Hi Hamish,
Perseus now concentrates on XML. Eventually all Latin texts in Perseus
will be CTS- and EpiDoc-compliant XML, but at the moment only some of
them are.
You can find the repository of canonical Latin texts here:
https://github.com/PerseusDL/canonical-latinLit/
If you have a look at the progress file
canonical-latinLit.tracking.json, you will see that Pliny is not yet
compliant:
"urn:cts:latinLit:phi0978.phi001.perseus-lat1": {
  "epidoc_compliant": false,
  "fully_unicode": true,
  "git_repo": "canonical-latinLit",
  "has_cts_metadata": false,
  "has_cts_refsDecl": false,
  "id": "1999.02.0138",
  "last_editor": "",
  "note": "",
  "src": "texts/sdl/Latin/PlinyTheElder/PlinyNH.xml",
  "status": "migrated",
  "target": "canonical-latinLit/data/phi0978/phi001/phi0978.phi001.perseus-lat1.xml",
  "valid_xml": false
}
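If you want to check other works the same way, here is a minimal Python sketch. It assumes the tracking file is a JSON object keyed by URN, as the excerpt above suggests. The choice of which flags count as "compliant" (REQUIRED_FLAGS) is my own assumption for illustration, not something the file defines.

```python
import json

# Excerpt of the tracking entry quoted above (a subset of its fields).
TRACKING_EXCERPT = """
{
  "urn:cts:latinLit:phi0978.phi001.perseus-lat1": {
    "epidoc_compliant": false,
    "fully_unicode": true,
    "has_cts_metadata": false,
    "has_cts_refsDecl": false,
    "status": "migrated",
    "valid_xml": false
  }
}
"""

# Assumption: these four flags together decide whether a text is usable via CTS.
REQUIRED_FLAGS = ("epidoc_compliant", "has_cts_metadata",
                  "has_cts_refsDecl", "valid_xml")


def missing_flags(entry):
    """Return the required flags that are absent or false for one entry."""
    return [flag for flag in REQUIRED_FLAGS if not entry.get(flag, False)]


def report(tracking):
    """Map each URN to the list of flags it still fails."""
    return {urn: missing_flags(entry) for urn, entry in tracking.items()}


tracking = json.loads(TRACKING_EXCERPT)
result = report(tracking)
for urn, missing in result.items():
    status = "compliant" if not missing else "not yet: " + ", ".join(missing)
    print(urn, "->", status)
```

To run it against the real file you would replace TRACKING_EXCERPT with the contents of canonical-latinLit.tracking.json, provided its top-level layout matches the assumption above.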
You can, however, get the XML from the Perseus webpage directly:
http://www.perseus.tufts.edu/hopper/dltext?doc=Perseus%3Atext%3A1999.02.0138
and then strip the XML tags.
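A rough Python sketch of the tag stripping, using only the standard library. SAMPLE is an invented placeholder, not Pliny's text, and the regex fallback is my own assumption for files that do not parse (the tracking entry says valid_xml is false for this text). Note that this keeps all text content, including anything in the header, so you may want to cut that out separately.

```python
import re
import xml.etree.ElementTree as ET


def strip_tags(xml_string):
    """Return the text content of an XML string with whitespace collapsed."""
    try:
        # Preferred route: parse properly and join all text nodes.
        text = "".join(ET.fromstring(xml_string).itertext())
    except ET.ParseError:
        # Crude fallback for files that are not well-formed:
        # drop anything that looks like a tag.
        text = re.sub(r"<[^>]+>", " ", xml_string)
    return re.sub(r"\s+", " ", text).strip()


# Invented placeholder markup, only to show the behaviour.
SAMPLE = ("<text><body><div1 type='book' n='1'>"
          "<p>Lorem <hi>ipsum</hi>\n dolor.</p></div1></body></text>")

plain = strip_tags(SAMPLE)
print(plain)
```

For the real file you would read the downloaded XML into a string and pass it to strip_tags.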
This may not seem straightforward, but especially for tasks like topic
modelling you will need the text in citable units (in the olden days
this was done by manually inserting a line break after each unit).
With CTS and the CTS API you will be able to import the text together
with the identifiers for the citable units, which will make tasks like
topic modelling a lot easier.
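To illustrate what "citable units" buys you, here is a small Python sketch that keys each unit by a CTS passage URN, so that every document in a topic model can be traced back to its citation. The unit texts are placeholders, the citations "1.1" and "1.2" are assumed for illustration, and the helper names are hypothetical. Nothing here calls the CTS API itself.

```python
# Work-level URN taken from the tracking entry for Pliny.
WORK_URN = "urn:cts:latinLit:phi0978.phi001.perseus-lat1"


def passage_urn(work_urn, citation):
    """Build a passage URN by appending the citation after a colon."""
    return f"{work_urn}:{citation}"


def to_documents(units, work_urn=WORK_URN):
    """Turn (citation, text) pairs into {passage URN: list of tokens}."""
    return {
        passage_urn(work_urn, citation): text.lower().split()
        for citation, text in units
    }


# Placeholder units; real ones would come from the citable text.
units = [
    ("1.1", "Placeholder text of the first unit"),
    ("1.2", "Placeholder text of the second unit"),
]

docs = to_documents(units)
for urn, tokens in docs.items():
    print(urn, len(tokens))
```

Each entry in docs is then one "document" for the topic model, and its key tells you exactly which passage it came from.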
Here is a first topic-modelling result for Thucydides, using
CTS-compliant texts, the CTS API, and good old Morpheus to guesstimate
the most likely dictionary form (to normalise the text a tiny bit more):
http://thomask81.github.io/Greek_vis/#topic=4&lambda=1&term= (I like
to call topic 4 the "sea warfare" topic).
You can find drafts of the R script and libraries I use for topic
modelling Latin, Greek, Arabic, and English texts here:
https://github.com/ThomasK81/TopicModellingR
A CTS-API-based release of an open-source, open-data web app for topic
modelling Greek and Latin will follow in due course.
Cheers,
Thomas
Quoting Hamish Cameron:
> Dear List,
>
> I'm interested in playing around with some statistical analysis of Pliny the
> Elder using Voyant and perhaps some topic modeling tools. I seem to remember
> being able to download the complete Natural History as a txt file from
> Perseus earlier in the year, but now I can't work out how I did it. Was this
> functionality removed, or am I just missing it today?
>
> (If anyone has already done such a thing, published or otherwise, I'd also
> be interested to hear about it.)
>
> Thanks in advance!
>
> Hamish
>
> _____
> Hamish Cameron
> Visiting Assistant Professor
> University of Cincinnati, Department of Classics
> Blegen 301 (P.O. Box 210226)