DC-RDA Archives

DC-RDA@JISCMAIL.AC.UK


Subject: Re: datasets for testing rda at scale
From: Karen Coyle <[log in to unmask]>
Reply-To: List for discussion on Resource Description and Access (RDA)
Date: Fri, 13 Feb 2009 06:46:37 -0800


Alistair,

I did start an analysis of RDA and MARC, but didn't get very far. I'll 
take that up again. What I was mainly finding is that there are a lot of 
RDA elements that are listed for more than one MARC element, e.g.

$a Personal name = 9.2.2 Preferred Name for the Person
$b Numeration    = 9.2.2 Preferred Name for the Person

There are ones that go the other way, as well, where RDA is more 
specific than MARC. It made me wonder how it is that we use the specific 
MARC elements: Are they needed for display? Do they help input? Are they 
arbitrary?
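
One way to surface these many-to-one cases is to invert the MARC-to-RDA table and keep only the RDA elements that more than one MARC subfield points to. The sketch below is illustrative only: the table holds just the two rows quoted above, and the names `MARC_TO_RDA` and `shared_rda_elements` are made up for this example.

```python
from collections import defaultdict

# Illustrative table: only the two rows quoted above, not a full mapping.
MARC_TO_RDA = {
    "$a Personal name": "9.2.2 Preferred Name for the Person",
    "$b Numeration": "9.2.2 Preferred Name for the Person",
}


def shared_rda_elements(marc_to_rda):
    """Return the RDA elements that more than one MARC subfield maps to."""
    by_rda = defaultdict(list)
    for marc_subfield, rda_element in marc_to_rda.items():
        by_rda[rda_element].append(marc_subfield)
    # Keep only the many-to-one cases.
    return {rda: marcs for rda, marcs in by_rda.items() if len(marcs) > 1}


for rda, marcs in shared_rda_elements(MARC_TO_RDA).items():
    print(rda, "<-", ", ".join(marcs))
```

Running the same inversion in the other direction (RDA element as key, MARC subfield as value) would list the cases where RDA is more specific than MARC.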

I haven't looked at MODS, though, and no mapping between MODS and RDA 
has been provided. I'll think about that as well.

kc

Alistair Miles wrote:
> Hi all,
>
> This is just an update to say that I've converted the LOC/scriblio
> data to marc xml and from there to mods xml. My next step is to do
> some analysis of the loc data in mods xml to get an overview of the
> elements used, then to try to design at least a partial mapping from
> mods xml to RDF using the RDA and FRBR schemas.
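
An overview of the elements used in a MODS XML file can be had by streaming through it and tallying element names. This is a minimal sketch rather than the analysis actually run on the LOC data; `count_elements` is a name invented here, and it counts local names only, ignoring namespaces and nesting.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Tally element local names in an XML file (path or file-like object)."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Tags look like "{namespace}localName"; keep only the local name.
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
        # Children were already counted at their own end events,
        # so their content can be released here.
        elem.clear()
    return counts
```

Calling `count_elements(path).most_common(20)` on one split would show the twenty most frequent elements, which is a reasonable starting point for deciding which parts of MODS a partial mapping needs to cover first.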
>
> FYI the marc xml and mods xml versions of the LOC/scriblio data can be
> downloaded from the links below...
>
> http://dcmi-rda.s3.amazonaws.com/locdata/part01-marcxml.tar.gz
> http://dcmi-rda.s3.amazonaws.com/locdata/part01-modsxml.tar.gz
> http://dcmi-rda.s3.amazonaws.com/locdata/part02-marcxml.tar.gz
> http://dcmi-rda.s3.amazonaws.com/locdata/part02-modsxml.tar.gz
> [...]
> http://dcmi-rda.s3.amazonaws.com/locdata/part29-marcxml.tar.gz
> http://dcmi-rda.s3.amazonaws.com/locdata/part29-modsxml.tar.gz
>
> Each download is a gzipped tar containing a *set* of up to 25 xml
> files. Each of these files is a 10,000 record split of the data in the
> corresponding part. I broke each part into 10,000 record splits so I
> could process the transformations more easily.
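
The splitting step can be sketched as a streaming pass over a MARCXML file that hands back records in groups of 10,000. This is not the script used to build the downloads above; the function names are hypothetical, and the only fixed value is the standard MARCXML namespace.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
RECORD_TAG = f"{{{MARC_NS}}}record"


def iter_record_chunks(source, chunk_size=10_000):
    """Yield lists of up to chunk_size <record> elements from a MARCXML stream."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    chunk = []
    for event, elem in context:
        if event == "end" and elem.tag == RECORD_TAG:
            chunk.append(elem)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
                root.clear()  # detach records already handed out
    if chunk:
        yield chunk  # final, possibly shorter, split


def write_chunk(records, destination):
    """Write one chunk as a standalone MARCXML <collection> document."""
    ET.register_namespace("", MARC_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{MARC_NS}}}collection")
    root.extend(records)
    ET.ElementTree(root).write(destination, encoding="utf-8", xml_declaration=True)
```

Each chunk has to be written out before the next one is requested, because the generator detaches the previous records from the parsed tree when it resumes.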
>
> N.B. there is a bug in part 13 split 25: for some reason the marc xml
> output was incomplete, so up to 10,000 records could be missing.
>
> FWIW I initially tried the conversions without splitting each
> part. I.e. I converted each original marc file into a single marc xml
> file, then tried to transform that to a mods xml file via
> xsltproc. However I found you need more than 7GB ram to do the marcxml
> to modsxml transform on a whole part (I tried it on a large ec2
> instance), so that's when I decided to split each part into smaller
> chunks, which I figured would be faster to process and more amenable
> to parallel processing (transforming all the splits from marcxml to
> modsxml took a couple of hours on a c1.xlarge ec2 instance, running up
> to 10 transformations in parallel; it can also be done on a laptop,
> but takes ~10 times longer).
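
Running the per-split transforms in parallel could look roughly like the following, which starts one `xsltproc` process per split with at most ten running at a time. It is a sketch under assumptions, not the script used on EC2: the function names, the output layout (same file name in a separate directory), and the injectable `run` parameter are all choices made for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def xsltproc_command(stylesheet, marcxml_path, out_dir):
    """Build the xsltproc command for one split; output keeps the input name."""
    out_path = Path(out_dir) / Path(marcxml_path).name
    return ["xsltproc", "-o", str(out_path), str(stylesheet), str(marcxml_path)]


def transform_all(stylesheet, marcxml_paths, out_dir, workers=10, run=subprocess.run):
    """Run one xsltproc process per split, at most `workers` at a time."""
    commands = [xsltproc_command(stylesheet, p, out_dir) for p in marcxml_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread just waits on a child process, so threads are enough;
        # check=True raises if any transform exits with an error.
        return list(pool.map(lambda cmd: run(cmd, check=True), commands))
```

Because each split is small, peak memory is bounded by roughly `workers` times the footprint of one 10,000-record transform, rather than by the size of a whole part.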
>
> Btw if anyone else has experience of the marcxml->modsxml transform on
> a file of similar size do let me know, I don't do a lot of xslt-ing so
> may be missing some tricks for making it work on smaller computers.
>
> Cheers,
>
> Alistair
>
>
> On Mon, Dec 22, 2008 at 03:31:50PM -0500, Ed Summers wrote:
>   
>> Hey Alistair:
>>
>> On Mon, Dec 22, 2008 at 1:16 PM, Alistair Miles
>> <[log in to unmask]> wrote:
>>     
>>> Any tips for how I could turn these data into RDF?
>>>       
>> If you want to work specifically with that dataset you could download
>> the different parts Karen pointed you to, and convert to MARCXML using
>> an efficient tool like yaz-marcdump [1]. yaz-marcdump is nice because
>> it will also convert from MARC-8 to UTF-8.
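
As a rough illustration of that conversion step, the wrapper below shells out to yaz-marcdump with its `-f`/`-t` character-set options and `-o marcxml` output format, and redirects standard output to a file. The function names and the file names in the usage note are hypothetical, and it assumes the input really is MARC-8 encoded.

```python
import subprocess


def marcdump_command(marc_path):
    """Build a yaz-marcdump call: MARC-8 in, UTF-8 MARCXML on stdout."""
    return ["yaz-marcdump", "-f", "MARC-8", "-t", "UTF-8", "-o", "marcxml", str(marc_path)]


def marc_to_marcxml(marc_path, xml_path, run=subprocess.run):
    """Convert one binary MARC file to MARCXML, writing the result to xml_path."""
    with open(xml_path, "wb") as out:
        # yaz-marcdump writes to stdout, so redirect it into the output file.
        run(marcdump_command(marc_path), stdout=out, check=True)
```

For example, `marc_to_marcxml("part01.mrc", "part01-marcxml.xml")` would convert one part, with both file names standing in for whatever the downloaded files are actually called.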
>>
>> Once you've got it in MARCXML you could then use a stylesheet like
>> LC's [2] to convert to DublinCore flavored RDF. This might be kinda
>> lossy for your RDA work though, so you might want MARCXML->MODS [3],
>> and then use the MODS->RDF conversion that the Simile folks created
>> (which Karen also pointed you to) [4].
>>
>> In fact Simile used that stylesheet on their own MIT Library Catalog
>> MARC data (Barton) and still seem to have the result online [5]. So
>> perhaps just using the Barton data is the quickest way to begin
>> playing with what once was MARC data as RDF? To my knowledge Stefano
>> Mazzocchi simply created an RDF vocabulary that mirrors the  MODS XML
>> Schema, but I haven't looked at it in a while.
>>
>> Another thing worth checking out might be Rob Styles' work [6] with
>> other people at Talis on converting MARC to RDF with full fidelity.
>> Perhaps he has some tools (or data) at his disposal? Rob, you are on
>> here, right?
>>
>> I'd be willing to lend a hand with some of this if necessary, so just
>> let me know if you think I can help.
>>
>> //Ed
>>
>> [1] http://www.indexdata.com/yaz/doc/yaz-marcdump.tkl
>> [2] http://www.loc.gov/standards/marcxml/xslt/MARC21slim2RDFDC.xsl
>> [3] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl
>> [4] http://simile.mit.edu/wiki/MARC/MODS_RDFizer
>> [5] http://simile.mit.edu/wiki/Dataset:_Barton
>> [6] http://events.linkeddata.org/ldow2008/papers/02-styles-ayers-semantic-marc.pdf
>>     
>
>   

-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
[log in to unmask] http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------
