I can't speak for Paolo, but for me, the advantage of lemmatizing once and
then hard-coding the results is that, in my publication workflow, I can (all
figures below are completely invented; even if they're out by an order of
magnitude, my point stands):
1) dynamically lemmatize 100,000 tokens (let's say 40,000 unique forms,
10,000 desired lemmata) using a morphology service;
2) receive back 9,750 lemmata (leaving a bunch of obscure forms to be
hand-identified), of which 20% are ambiguous (needing to be
hand-disambiguated) and a further 5% are just wrong (and I just
hope I spot them);
3) fix all the problems in (2) above, hard-code the fixes in my texts, and
generate indices, search results, etc. from the corrected results.
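For anyone curious what steps 1 to 3 might look like in practice, here is a minimal Python sketch under loudly stated assumptions. No real morphology service is called: the hypothetical `SERVICE_RESULTS` dict stands in for its answers, `CORRECTIONS` stands in for the hand fixes, and `triage`, `hard_code` and `lemma_index` are illustrative names, not anyone's actual tooling. The one real convention used is TEI's `<w>` element with a `lemma` attribute for recording the hard-coded lemma.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

# HYPOTHETICAL stand-in for the morphology service (step 1):
# form -> candidate lemmata. Absent form = unidentified; >1 = ambiguous.
SERVICE_RESULTS = {
    "Dis": ["deus", "Dis"],
    "Manibus": ["manes", "manus"],
    "sacrum": ["sacer"],
}

# HYPOTHETICAL hand fixes (step 3): they override the service's answer,
# so they also cover lemmata that came back unambiguous but wrong.
CORRECTIONS = {"Dis": "deus", "Manibus": "manes", "Zmaragdo": "Zmaragdus"}


def triage(tokens, results):
    """Step 2: sort unique forms into resolved, ambiguous and unidentified."""
    resolved, ambiguous, unknown = {}, {}, []
    for form in dict.fromkeys(tokens):  # unique forms, original order
        candidates = results.get(form, [])
        if not candidates:
            unknown.append(form)
        elif len(candidates) > 1:
            ambiguous[form] = candidates
        else:
            resolved[form] = candidates[0]
    return resolved, ambiguous, unknown


def hard_code(tokens, resolved, corrections):
    """Step 3: emit one TEI <w lemma="..."> per token, corrections first."""
    ab = ET.Element(f"{{{TEI_NS}}}ab")
    for form in tokens:
        w = ET.SubElement(ab, f"{{{TEI_NS}}}w")
        lemma = corrections.get(form, resolved.get(form))
        if lemma is not None:  # leave @lemma off anything still unresolved
            w.set("lemma", lemma)
        w.text = form
    return ab


def lemma_index(ab):
    """Generate an index (lemma -> forms) from the hard-coded markup only."""
    index = {}
    for w in ab.iter(f"{{{TEI_NS}}}w"):
        lemma = w.get("lemma")
        if lemma:
            index.setdefault(lemma, []).append(w.text)
    return index


if __name__ == "__main__":
    tokens = ["Dis", "Manibus", "sacrum", "Zmaragdo"]
    resolved, ambiguous, unknown = triage(tokens, SERVICE_RESULTS)
    print("to disambiguate:", ambiguous)
    print("to identify by hand:", unknown)
    ab = hard_code(tokens, resolved, CORRECTIONS)
    print(ET.tostring(ab, encoding="unicode"))
    print(lemma_index(ab))
```

The design point this is meant to illustrate is that the index is built from the corrected `@lemma` values in the markup, not by re-querying the service, so the published indices reflect the hand-checked results.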
This certainly won't be appropriate for every corpus or digital edition,
but for several epigraphic corpora that I've worked on, it has been.
Other editions may prefer the less labor-intensive, more flexible,
extensible and updatable practice of using an (internal or external)
dynamic parsing service, which gives less accurate and sometimes ambiguous
results, though obviously presented with a caveat usor.
Best,
Gabby
On 14/01/2016 12:16, Roberto Rosselli Del Turco wrote:
> On 12/01/2016 15:14, Gabriel BODARD wrote:
>> I'd be interested to hear what you end up doing, Paolo (and, for that
>> matter, what some of the other possibilities on offer are...).
>
> I'd be interested in that as well; thanks for setting up the wiki page,
> guys. Also, Paolo wrote:
>
> >>> In fact, rather than an on-the-fly parser available to the end user,
> >>> what I want is to perform lemmatization/morphological analysis once
> >>> through the text and hard-code the results in my XML/TEI.
>
> Is there any particular reason why one would prefer to hard-code the
> results in the TEI document? At first glance, that doesn't sound so
> appealing when compared to an on-the-fly parser, but maybe I'm missing
> something here.
>
> All best,
>
> R
>
--
Dr Gabriel BODARD
Reader in Digital Classics
Institute of Classical Studies
University of London
Senate House
Malet Street
London WC1E 7HU
T: +44 (0)20 78628752
http://digitalclassicist.org/