Hi Hamish,

Perseus focuses on XML now. Eventually all Latin texts in Perseus will
be CTS- and EpiDoc-compliant XML, but at the moment only some of them
are.

You can find the repository of canonical Latin texts here:
https://github.com/PerseusDL/canonical-latinLit/
If you look at the progress file canonical-latinLit.tracking.json, you
will see that Pliny is not yet compliant:
"urn:cts:latinLit:phi0978.phi001.perseus-lat1": {
    "epidoc_compliant": false,
    "fully_unicode": true,
    "git_repo": "canonical-latinLit",
    "has_cts_metadata": false,
    "has_cts_refsDecl": false,
    "id": "1999.02.0138",
    "last_editor": "",
    "note": "",
    "src": "texts/sdl/Latin/PlinyTheElder/PlinyNH.xml",
    "status": "migrated",
    "target": "canonical-latinLit/data/phi0978/phi001/phi0978.phi001.perseus-lat1.xml",
    "valid_xml": false
}
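To see at a glance which texts are still waiting for conversion, the tracking file can be filtered on its "epidoc_compliant" flag. This is a minimal sketch, assuming the file's top level is a JSON object keyed by CTS URN, as the fragment above suggests; the function names are hypothetical.

```python
import json


def load_tracking(path: str) -> dict:
    """Read a tracking file such as canonical-latinLit.tracking.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def not_yet_compliant(tracking: dict) -> list:
    """Return, sorted, the URNs whose record is not marked epidoc_compliant.

    Assumes each value is a record like the Pliny entry above; a record
    without the key is treated as not compliant.
    """
    return sorted(
        urn
        for urn, record in tracking.items()
        if not record.get("epidoc_compliant", False)
    )
```

For example, not_yet_compliant(load_tracking("canonical-latinLit.tracking.json")) would list the Pliny URN among the others still marked false.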

You can, however, get the XML from the Perseus webpage directly:  
http://www.perseus.tufts.edu/hopper/dltext?doc=Perseus%3Atext%3A1999.02.0138  
and then strip the XML tags.
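Stripping the tags can be done with Python's standard library. This is a rough sketch, not a Perseus tool: it assumes the download is well-formed XML that ElementTree can parse (a DOCTYPE with undefined entities would raise a ParseError and need preprocessing), and that the TEI header is a direct child of the root element. Joining text nodes with spaces keeps separate paragraphs apart, at the cost of splitting a word if a tag happens to fall inside it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# The Perseus download link quoted above (Pliny, Natural History).
PERSEUS_URL = (
    "http://www.perseus.tufts.edu/hopper/dltext"
    "?doc=Perseus%3Atext%3A1999.02.0138"
)


def fetch_xml(url: str = PERSEUS_URL) -> bytes:
    """Download the raw XML bytes (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def strip_tags(xml_data) -> str:
    """Return the plain text of an XML document, minus any TEI header."""
    root = ET.fromstring(xml_data)
    # Drop the metadata header (namespaced or not) so only the text remains.
    for child in list(root):
        if child.tag.split("}")[-1] == "teiHeader":
            root.remove(child)
    # Join all text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(root.itertext()).split())
```

Calling strip_tags(fetch_xml()) and writing the result to a .txt file gives a single plain-text blob suitable for Voyant, but without any citable units.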

This may not seem straightforward, but for tasks like topic modelling in
particular you will need the text divided into citable units (in the
olden days, e.g. by manually inserting a line break after each unit).
With CTS and the CTS API you will be able to import the text together
with the identifiers for its citable units, which will make tasks like
topic modelling a lot easier.
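Each citable unit is addressed by a CTS URN: the work-level URN from the tracking file plus a passage reference, so that each unit can become one "document" for a topic model. The sketch below parses such URNs and turns (URN, text) pairs into tokenised documents. It assumes the units have already been retrieved somehow (no CTS API call is shown), ignores any exemplar level beyond textgroup.work.version, and uses a hypothetical passage reference "1.1" purely for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class CtsUrn(NamedTuple):
    namespace: str
    textgroup: str
    work: str
    version: str
    passage: str


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split urn:cts:namespace:textgroup.work.version[:passage]."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 else ""
    # Pad missing work/version levels with empty strings.
    pieces = parts[3].split(".") + ["", ""]
    textgroup, work, version = pieces[:3]
    return CtsUrn(parts[2], textgroup, work, version, passage)


def units_to_documents(units: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Turn (urn, text) pairs into (passage, lower-cased tokens) documents."""
    return [(parse_cts_urn(urn).passage, text.lower().split()) for urn, text in units]
```

Keeping the passage reference next to each document means topic results can be traced straight back to the cited chapter rather than to an arbitrary chunk of text.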

Here is a first topic-modelling result for Thucydides, using CTS-compliant
texts, the CTS API, and good old Morpheus to guesstimate the most likely
dictionary form of each word (to normalise the text a little more):
http://thomask81.github.io/Greek_vis/#topic=4&lambda=1&term= (I like
to call topic 4 the "sea warfare" topic).
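A morphological parser such as Morpheus can return several possible dictionary forms for one word (Latin "est" can come from sum or from edo), so some choice has to be made. The sketch below shows one simple heuristic, picking the candidate lemma that is most frequent overall; it is not necessarily the method used for the Thucydides result. The analyses table and the frequency counts are hypothetical stand-ins for real parser output, and no Morpheus interface is called.

```python
from collections import Counter
from typing import Dict, List


def most_likely_lemma(form: str, analyses: Dict[str, List[str]], lemma_freq: Counter) -> str:
    """Pick the most frequent candidate lemma; keep the form if unanalysed."""
    candidates = analyses.get(form)
    if not candidates:
        return form
    # Counter returns 0 for unseen lemmas; ties go to the first candidate.
    return max(candidates, key=lambda lemma: lemma_freq[lemma])


def normalise(tokens: List[str], analyses: Dict[str, List[str]], lemma_freq: Counter) -> List[str]:
    """Lower-case each token and replace it with its most likely lemma."""
    return [most_likely_lemma(t.lower(), analyses, lemma_freq) for t in tokens]
```

For instance, with analyses = {"est": ["edo", "sum"]} and counts favouring sum, every "est" is normalised to "sum", which merges inflected forms into one term for the topic model.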

You can find drafts of the R script and libraries I use for topic
modelling Latin, Greek, Arabic, and English texts here:
https://github.com/ThomasK81/TopicModellingR

In due course there will be a CTS-API-based release of an open-source,
open-data web app for topic-modelling Greek and Latin.

Cheers,

Thomas

Quoting Hamish Cameron:

> Dear List,
>
> I'm interested in playing around with some statistical analysis of Pliny the
> Elder using Voyant and perhaps some topic modeling tools. I seem to remember
> being able to download the complete Natural History as a txt file from
> Perseus earlier in the year, but now I can't work out how I did it. Was this
> functionality removed, or am I just missing it today?
>
> (If anyone has already done such a thing, published or otherwise, I'd also
> be interested to hear about it.)
>
> Thanks in advance!
>
> Hamish
>
>   _____
>
> Hamish Cameron
>
> Visiting Assistant Professor
>
> University of Cincinnati, Department of Classics
>
> Blegen 301 (P.O.Box 210226)