Dear Notis,
A quick response; apologies for only a couple of hasty citations.
In the original post by Gabby Bodard there is a link to Bruce Robertson's work here:
http://www.heml.org/RobertsonGreekOCR/
Bruce, along with our collaborator from the CNR, Federico Boschetti, has been working on the Greek OCR issue for a number of years as part of the Dynamic Variorum Editions project. Greg was referring to this work.
A couple of relevant publications Perseus has on-line:
What Did We Do With A Million Books: Rediscovering the Greco-Ancient world and reinventing the Humanities. In White Paper Submitted to the NEH, National Endowment for the Humanities, 2011.
http://hdl.handle.net/10427/75558
Improving OCR Accuracy for Classical Critical Editions. In Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2009), pages 156-167, Corfu, Greece: Springer Verlag, 2009-09.
http://dl.tufts.edu/catalog/tufts:PB.001.011.00001
All the best,
Lisa
On 3/15/13 4:37 AM, Notis Toufexis • Νότης Τουφεξής wrote:
This is good news. I am just wondering if this is too technical for some users.
I have done some experimenting in the past with ABBYY FineReader; after some training the results were acceptable. There is also Anagnostis, an OCR solution produced in Greece, which is rather expensive and, if I am to judge from the demo, not very accurate.
I remember hearing Greg Crane talking about OCR at an event in London; it was about OCR with commercial products, stripping accents and putting them back again with the use of scripts -- I might have some notes somewhere.
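As a rough illustration of the strip-and-restore idea (a minimal sketch only, not the scripts mentioned in that talk): the accents are removed by Unicode decomposition, and put back by looking each bare word up in a list of correctly accented forms. The function names and the tiny word list are hypothetical, and a word is restored only when exactly one accented form matches.

```python
import unicodedata


def strip_accents(text):
    """Remove accents, breathings and iota subscripts from Greek text."""
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class.
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def build_lookup(accented_words):
    """Map each stripped form to the set of accented forms that produce it."""
    lookup = {}
    for word in accented_words:
        lookup.setdefault(strip_accents(word), set()).add(word)
    return lookup


def restore_accents(bare_words, lookup):
    """Restore accents where the lookup is unambiguous; otherwise keep the bare word."""
    restored = []
    for word in bare_words:
        candidates = lookup.get(word, set())
        if len(candidates) == 1:
            restored.append(next(iter(candidates)))
        else:
            restored.append(word)
    return restored


# Hypothetical three-word lexicon, for illustration only.
lexicon = ["μῆνιν", "ἄειδε", "θεά"]
lookup = build_lookup(lexicon)
print(restore_accents(["μηνιν", "αειδε", "θεα", "λογος"], lookup))
```

Words missing from the list, or with more than one possible accentuation, are left bare, so they would still need hand correction.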
All the best,
Notis
On 13 March 2013 18:41, Nick White wrote:
I've done quite a bit of work recently on getting the Tesseract OCR
engine to cope well with Ancient Greek, as part of the ERC
project Living Poets [1]. The 'training' file
resulting from that is now available from their website. I wrote an
article on how I went about it and some of the issues involved,
which is available at [2], with associated code and whathaveyou at
[3].
This should certainly be pointed to on the Digital Classicist wiki
page. I'm happy to add something, but won't get a chance to until
next week - if anyone else wants to do it for me go right ahead!
As to the question of when OCR is appropriate as opposed to hand
keying, I'd say that the quality of OCR output is now good enough
that in general OCRing and then correcting the result is going to be
the best option.
There is certainly scope for more tools to make such hand correction
faster and easier, for example configuring Tesseract to highlight
words / characters it is least certain about, but that would require
a little programming.
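A minimal sketch of what such highlighting might look like, assuming per-word confidence scores on a 0-100 scale have already been obtained from the OCR engine. The function name, the threshold of 60, the "??" marker and the sample scores are all hypothetical choices for illustration, not Tesseract settings.

```python
def mark_uncertain(words, threshold=60.0, marker="??"):
    """Wrap words whose confidence falls below the threshold in a visible marker.

    `words` is a list of (word, confidence) pairs; how they are obtained
    from the OCR engine is outside the scope of this sketch.
    """
    marked = []
    for word, confidence in words:
        if confidence < threshold:
            marked.append(f"{marker}{word}{marker}")
        else:
            marked.append(word)
    return " ".join(marked)


# Hypothetical OCR output with invented confidence values.
sample = [("μῆνιν", 91.0), ("ἄειδε", 42.5), ("θεά", 88.0)]
print(mark_uncertain(sample))
```

A corrector could then search for the marker and check only the flagged words first.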
Hope this is useful,
Nick White
1. http://www.dur.ac.uk/classics/livingpoetsproject/ - there will be
a proper website very soon!
2. http://www.eutypon.gr/eutypon/pdf/e2012-29/e29-a01.pdf
3. http://www.dur.ac.uk/nick.white/grctraining/
--
Dr. Notis Toufexis • Nότης Τουφεξής
http://www.toufexis.info
http://www.toufexis.gr
http://www.early-modern-greek.org
--
Lisa M. Cerrato
Managing Editor
Perseus Digital Library