Il 14/01/2016 13:16, Roberto Rosselli Del Turco ha scritto:
> Il 12/01/2016 15:14, Gabriel BODARD ha scritto:
>> I'd be interested to hear what you end up doing, Paolo (and for that
>> matter--what some of the other possibilities on offer are...).
>
> I'd be interested in that as well, thanks for setting up the wiki page
> guys. Also Paolo wrote
>
>  >>> In fact, rather than an on-the-fly parser available to the end user,
>  >>> what I want is to perform lemmatization/morphological analysis once
>  >>> through the text and hard-code the results in my XML/TEI.
>
> Is there any particular reason why one would prefer to hard-code the
> results in the TEI document? At first glance, that doesn't sound so
> appealing when compared to an on-the-fly parser, but maybe I'm missing
> something here.

I'm working on a digital scholarly edition where the text is encoded at 
different layers (graphematic, alphabetic, linguistic).

This model was theorized by Tito Orlandi (among other works, in the 
book "Informatica testuale", Rome 2010). I'm trying to apply it to a 
real edition.

At the linguistic layer, words should not be identified by character 
strings (a regularized Latin spelling), but by a combination of lemma 
and morphological analysis (e.g. the genitive singular of "homo, 
-inis"), regardless of the possible spellings.
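
To give a rough idea, a minimal sketch (not my actual markup, and the 
category names are just placeholders) using TEI's @lemma and @ana 
attributes on <w> could look like this:

   <!-- linguistic identity in the attributes, surface form as content -->
   <w lemma="homo" ana="#gen.sg">hominis</w>
   <!-- a variant spelling: the same word at the linguistic layer -->
   <w lemma="homo" ana="#gen.sg">ominis</w>

where #gen.sg would point to a declared analysis (an <interp> or <fs>) 
for 'genitive singular'.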

So lemmatization and POS (part-of-speech, i.e. morphological) tagging 
are not 'optional' in my model, but a constituent part of the edition. 
Also, I have to choose, for each word, the 'right' lemma/POS tag. Not 
an easy task.
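
Just to illustrate the kind of choice involved (a made-up example, not 
from my text): a form like "legi" could be analysed as the perfect of 
"lego, -ere" or as the dative of "lex, legis", and hard-coding means 
committing to one reading:

   <w lemma="lego" ana="#v.perf.ind.1sg">legi</w>
   <!-- rather than: <w lemma="lex" ana="#n.dat.sg">legi</w> -->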

I'll post more to the list soon about how my quest is going.

Paolo