JISCMail - DIGITALCLASSICIST Archives

Some folks on this list know that I’m involved in a project to metrically tag all of Latin and Greek verse. I’m some way into this, but I’ve also been working on producing POS-tagged texts for use in my teaching. I’m at a point where I’d like to make sure the two endeavours can work together. And so a lazy decision (surprise) has come back to haunt me: my verse texts use a simple hierarchy: line > speaker > word > syllable. This means that when a spoken syllable contains parts of two words (as often happens with elision in Greek), I tag the two as a single word. This isn’t good enough for POS-tagging, which needs each word tagged separately.
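To make the problem concrete, here is a rough Python sketch of that strict nesting. It is not my actual markup: the class names (`Syllable`, `Word`) and the helper `missing_pos_slots` are made up for illustration, and the Greek is just a small elision example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    text: str


@dataclass
class Word:
    # Hypothetical model: a word owns its syllables, as in word > syllable.
    syllables: List[Syllable]
    pos: Optional[str] = None  # only one part-of-speech slot per word element

    @property
    def text(self) -> str:
        return "".join(s.text for s in self.syllables)


# Strict nesting: every syllable has exactly one parent word, so the spoken
# syllable "δ’ἔ" forces the elided δ’ and ἔπος into a single word element.
metrical_words = [
    Word([Syllable("δ’ἔ"), Syllable("πος")]),
    Word([Syllable("τοι")]),
]

# What POS-tagging needs instead: three separate lexical tokens.
lexical_tokens = ["δ’", "ἔπος", "τοι"]


def missing_pos_slots(words: List[Word], tokens: List[str]) -> int:
    """Count the lexical tokens that have no word element of their own."""
    return len(tokens) - len(words)


if __name__ == "__main__":
    print([w.text for w in metrical_words])
    print(missing_pos_slots(metrical_words, lexical_tokens))
```

The sketch only shows the mismatch: the metrical layer has two word elements where the POS layer needs three, so one token has nowhere to carry its tag.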

I’m not aware of previous work on this, and I have a solution in mind, but I wonder if someone has in fact dealt with this before. The proposed solution is below, but please bear in mind a couple of things before commenting:

- I’m not aiming for TEI compliance (though I have checked to see if there is a TEI-based solution - did I miss one?). I do, however, want to be sure the results can be easily converted to TEI-compliant markup by anyone who cares to do so (especially anyone who might want to use the data in a TEI-compliant database).

- I am aiming for maximal structural/semantic clarity. Broadly speaking, tagged items should rely as little as possible on information to be found in other tagged items. When they do so, those other items should be parents/grandparents etc. (e.g. inheriting is OK; “before” and “after” type tags are not).

δ’

ἔ

πος

τοι

</div>

P.S. I’ve thought about trying to adapt this to/from text-to-speech XML (e.g. SSML: https://console.bluemix.net/docs/services/text-to-speech/SSML.html#ssml), but I am trying to learn how to stop taking on unreasonably large projects.