I have some questions about the relationship between the TEI XML data that you can download from Perseus, and the (various!) URL schemes that exist on the site. I’m hoping that someone can enlighten me a little. I’m not planning on putting this text through ‘hopper’ but using it as the data in another series of programs, which I will write in Python, JavaScript, and HTML5.
For the sake of an example, I will refer in all cases to Caesar's De Bello Gallico.
The first URL is the CTS catalogue URL on catalog.perseus.org: http://catalog.perseus.org/catalog/urn:cts:latinLit:phi0448.phi001 - There are three Latin editions and one translation. Let’s pick the T. Rice Holmes Latin text edition, because it looks like it’s the one presented in the main Perseus interface as the ‘Latin Text’. http://catalog.perseus.org/catalog/urn:cts:latinLit:phi0448.phi001.perseus-lat1
Now, on the Perseus website the links are usually of this form - http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.02.0002 - which for the sake of a label I’ll call the ‘standard’ Perseus URI.
On the right hand side there’s a section Data/Identifiers which has URLs that look more like the CTS ones, e.g.
Text URI - http://data.perseus.org/texts/urn:cts:latinLit:phi0448.phi001.perseus-lat1
Citation URI - http://data.perseus.org/citations/urn:cts:latinLit:phi0448.phi001.perseus-lat1:1.1
Catalog Record URI - http://data.perseus.org/catalog/urn:cts:latinLit:phi0448.phi001.perseus-lat1
The ‘Catalog’ URI redirects to the catalog URI I pasted above.
The ‘Text’ URI redirects to this URI - http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:latinLit:phi0448.phi001.perseus-lat1 - which otherwise looks like the standard Perseus page, positioned at the start of the text, i.e. B.G. 1.1. If you select a link over on the left to a different part of the text, or the ‘next chunk’ arrow, you revert to the ‘standard’ URI scheme.
The ‘Citation’ URI redirects to, e.g. - http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:latinLit:phi0448.phi001.perseus-lat1:1.2 - I made it link to B.G. 1.2 so you can see it’s linked to the specific chunk of text. This is targetable, apparently, to the smallest chunk of text, e.g. http://data.perseus.org/citations/urn:cts:latinLit:phi0448.phi001.perseus-lat1:1.2.4
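To make the pattern concrete, here is a minimal sketch of how I would build these data.perseus.org URIs from a CTS URN. It only reproduces the three URL shapes listed above; the function name perseus_uris and the dictionary keys are my own labels, not anything defined by Perseus.

```python
DATA_BASE = "http://data.perseus.org"


def perseus_uris(urn, passage=None):
    """Build data.perseus.org URIs for a CTS URN (hypothetical helper).

    The URL shapes are copied from the Data/Identifiers box on the
    Perseus page; nothing here checks that the URIs actually resolve.
    """
    uris = {
        "text": f"{DATA_BASE}/texts/{urn}",
        "catalog": f"{DATA_BASE}/catalog/{urn}",
    }
    if passage:
        # A passage such as "1.2" or "1.2.4" is appended after a colon.
        uris["citation"] = f"{DATA_BASE}/citations/{urn}:{passage}"
    return uris


if __name__ == "__main__":
    urn = "urn:cts:latinLit:phi0448.phi001.perseus-lat1"
    for label, uri in perseus_uris(urn, "1.2.4").items():
        print(label, uri)
```

So going from a URN to a link is trivial; the hard part is knowing which URN belongs to which downloaded file.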
Oh, at the bottom of each chunk is a little XML button that points to an old-school Perseus URI, e.g. http://www.perseus.tufts.edu/hopper/xmlchunk?doc=Perseus%3Atext%3A1999.02.0002%3Abook%3D1%3Achapter%3D1 - but I believe we’ve had a conversation about this ‘xmlchunk’ being on the way out.
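For reference, the percent-encoded 'doc' parameter in that xmlchunk URL unpacks into a document ID plus book/chapter pairs. A small sketch using only the standard library; parse_hopper_doc is a hypothetical name, and I am assuming the colon-separated 'key=value' layout seen in this one URL holds for other ‘standard’ URIs as well.

```python
from urllib.parse import urlparse, parse_qs


def parse_hopper_doc(url):
    """Split a hopper 'doc' parameter into (document id, location dict).

    Assumes the form 'Perseus:text:<id>:key=value:key=value', as in the
    xmlchunk URL quoted above. Hypothetical helper, not a Perseus API.
    """
    # parse_qs percent-decodes the value, so %3A becomes ':' and %3D '='.
    doc = parse_qs(urlparse(url).query)["doc"][0]
    parts = doc.split(":")
    doc_id = ":".join(parts[:3])
    location = dict(p.split("=", 1) for p in parts[3:] if "=" in p)
    return doc_id, location


if __name__ == "__main__":
    url = ("http://www.perseus.tufts.edu/hopper/xmlchunk"
           "?doc=Perseus%3Atext%3A1999.02.0002%3Abook%3D1%3Achapter%3D1")
    print(parse_hopper_doc(url))
```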
So far though, this is all logical and great and awesome.
And then I downloaded the XML data.
Inside the archive the files are laid out like this (the list is produced with 'tar ztvf hopper-texts-GreekRoman.tar.gz | grep Caesar'):
drwxrwxr-x 0 balmas01 balmas01 0 21 May 2011 Classics/Caesar/
drwxrwxr-x 0 balmas01 balmas01 0 21 May 2011 Classics/Caesar/opensource/
-rwxrwxr-x 0 balmas01 balmas01 1256105 23 May 2011 Classics/Caesar/opensource/ag.caes.bg_eng.xml
-rwxrwxr-x 0 balmas01 balmas01 440451 23 May 2011 Classics/Caesar/opensource/caes.bc_eng.xml
-rwxrwxr-x 0 balmas01 balmas01 468947 23 May 2011 Classics/Caesar/opensource/caes.bg_lat.xml
-rw-rw-r-- 0 balmas01 balmas01 297282 23 May 2011 Classics/Caesar/opensource/caes.bc_lat.xml
-rw-rw-r-- 0 balmas01 balmas01 800730 23 May 2011 Classics/Caesar/opensource/caes.bg_eng.xml
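Since the rest of my tooling is in Python, the same listing can be had with the standard tarfile module. This is just a sketch; list_author_files is a name I made up, and it assumes the 'Classics/<Author>/...' directory layout shown in the listing.

```python
import tarfile


def list_author_files(archive_path, author):
    """Return the XML file names under '/<author>/' in a .tar.gz archive.

    Hypothetical helper; assumes the 'Classics/<Author>/...' layout
    seen in hopper-texts-GreekRoman.tar.gz.
    """
    needle = f"/{author}/"
    with tarfile.open(archive_path, "r:gz") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.isfile()
            and needle in member.name
            and member.name.endswith(".xml")
        ]


if __name__ == "__main__":
    for name in list_author_files("hopper-texts-GreekRoman.tar.gz", "Caesar"):
        print(name)
```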
So, there’s nothing in the file name which links it to any URL scheme seen so far. Looking in the file, there’s a TEI header with some useful metadata, including a revision history, but nothing giving the CTS catalogue entry or any of the other URIs, not even the old ‘standard’ ones, although plainly, it is the text in question. Similarly, there’s nothing in the CTS catalogue linking it to these files either.
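The kind of check I mean can be automated across the whole archive: scan each file’s raw text for anything that looks like a CTS URN or a ‘standard’ Perseus document ID. A rough sketch follows; the regular expression is my own guess at the two identifier shapes based on the URLs above, and find_identifiers is a hypothetical name. I scan the text rather than parse the XML so that it does not matter how the TEI files declare their entities.

```python
import re

# Guessed patterns for the two identifier shapes seen in the URLs above.
ID_PATTERN = re.compile(
    r"urn:cts:[A-Za-z0-9:._-]+"            # e.g. urn:cts:latinLit:phi0448.phi001.perseus-lat1
    r"|Perseus:text:\d{4}\.\d{2}\.\d{4}"   # e.g. Perseus:text:1999.02.0002
)


def find_identifiers(xml_text):
    """Return the sorted, de-duplicated identifiers found in a string."""
    # Strip trailing punctuation that the URN pattern may have swallowed.
    found = {match.rstrip(".:") for match in ID_PATTERN.findall(xml_text)}
    return sorted(found)


if __name__ == "__main__":
    # Assumes the file has been extracted from the archive first.
    with open("Classics/Caesar/opensource/caes.bg_lat.xml",
              encoding="utf-8", errors="replace") as handle:
        print(find_identifiers(handle.read()))
```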
While ‘Caesar’ is pretty straightforward, in that there’s one each of Latin and English for B.G. and B.C., in some other texts Perseus has multiple Latin/Greek editions available and multiple translations too. Given I wish my application to possess links back to the Perseus mothership, as it were, is it possible to programmatically reconstruct the linkages between the file names in the archive and the CTS catalogue URIs? If I were only dealing with one text, or a handful, perhaps that would be a moot point, as I could construct the catalogue links by hand. But as I want my software tool to work across _all_ texts, I need a programmatic method.
Does anyone have one, short of capturing the XML by crawling the website’s URLs? I don’t want to do that, obviously.
Thanks
Scot.
--
Scot Mcphee
Computer Programmer, Classics PhD.
p +61 412 957414
e [log in to unmask]
http://autonomous.org/