> This certainly won't be appropriate for every corpus or digital edition,
> but for several epigraphic corpora that I've worked on, it has been.
For small corpora this could be done semi-automatically in the way you
described. For large corpora, such as the 70,000+ Latin inscriptions in
the Epigraphic Database Heidelberg (or, in the long run, all Latin
inscriptions in EAGLE), some other workflow is needed. My experience with
TreeTagger on Latin inscriptions was mixed. I think the results could be
improved by preprocessing the tokens and/or by training it on inscription
texts rather than on classical Latin texts. If you also take into account
that a considerable number of those inscriptions follow formulaic
patterns, it should be possible to get better results for automatic
parsing and lemmatization.
Doing this could be a nice little project of its own...
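
To make the preprocessing idea a bit more concrete, here is a minimal
sketch in Python. It assumes the inscription text uses Leiden-style
markup (round brackets for resolved abbreviations, square brackets for
restorations, "/" for line breaks); the function names are hypothetical
and the rules are only a starting point, not a tested pipeline.

```python
import re

def normalize_token(token: str) -> str:
    """Strip a few Leiden-style editorial marks from a single token."""
    # Resolve abbreviations: "Aug(usto)" -> "Augusto"
    token = re.sub(r"\(([^()]*)\)", r"\1", token)
    # Keep restored text but drop the brackets: "[Caes]ari" -> "Caesari"
    token = re.sub(r"[\[\]]", "", token)
    return token

def preprocess(line: str) -> list[str]:
    """Split on whitespace, normalize each token, and drop tokens that
    contain no letters (line-break slashes, lacunae such as "[---]")."""
    tokens = (normalize_token(t) for t in line.split())
    return [t for t in tokens if any(c.isalpha() for c in t)]

if __name__ == "__main__":
    # Prints one cleaned token per line
    print("\n".join(preprocess("D(is) M(anibus) / [---] Aug(usto)")))
```

The cleaned tokens, one per line, could then be handed to the tagger
instead of the raw edition text.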
Best,
Frank