Dear Nick (and all),
Just in case you haven't seen this already, Federico Boschetti has also made the training sets he used for Tesseract available on his website <http://www.himeros.eu/> (see section "Ancient Greek OCR Trainings"). I think it's great that a body of distributed, openly available training datasets for OCR engines is starting to emerge.
If I have the chance next week, I will try to add this to the wiki page, perhaps together with the links and publications provided by Lisa (thanks!).
Cheers,
Matteo
On Fri, Mar 15, 2013 at 09:37:41AM +0100, Notis Toufexis • Νότης Τουφεξής wrote:
> This is good news. I am just wondering if this is too technical for some users.
I wouldn't say Tesseract was too technical for many users, really.
There are several different GUIs which wrap around the engine and
make it easy to use; there is a list at:
http://code.google.com/p/tesseract-ocr/wiki/3rdParty
Admittedly it isn't as geared to desktop users as something like
ABBYY, but it shouldn't be too much work to figure out.
> I remember hearing Greg Crane talking about OCR at an event in London; it was
> about OCR with commercial products, stripping accents and putting them back
> again with the use of scripts -- I might have some notes somewhere.
Sounds like an interesting (if rather unpleasant ;)) idea. I suspect
just providing a good training file and tweaking OCR engine
parameters to ensure things like good line segmentation would be
preferable, though.
Nick White
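The thread doesn't say how the accent-stripping scripts mentioned above actually worked, so what follows is only a hypothetical sketch of the general strip-and-restore idea, not the method Greg Crane described. It assumes accents can be removed by Unicode NFD decomposition (dropping combining marks such as breathings, accents and iota subscripts), and that they can be put back by looking each bare word up in a word list of accented forms. The function names and the tiny lexicon are invented for illustration; a real lexicon would need to be far larger, and ambiguous forms are simply left unaccented.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining diacritics (accents, breathings, iota subscripts)
    by decomposing to NFD, removing combining marks, then recomposing."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def build_restore_table(wordlist):
    """Hypothetical helper: map each bare (unaccented) form to every
    accented word in the list that reduces to it."""
    table = {}
    for word in wordlist:
        table.setdefault(strip_accents(word), []).append(word)
    return table


def restore_accents(text, table):
    """Hypothetical helper: re-accent each whitespace-separated token
    whose bare form has exactly one candidate; ambiguous or unknown
    tokens are left unchanged. Lookup is case-sensitive."""
    out = []
    for token in text.split():
        candidates = table.get(token, [])
        out.append(candidates[0] if len(candidates) == 1 else token)
    return " ".join(out)


if __name__ == "__main__":
    # Toy lexicon (an assumption); "ἡ" and "ἤ" both reduce to "η".
    lexicon = ["λόγος", "ἀνθρώπου", "ἡ", "ἤ"]
    table = build_restore_table(lexicon)
    print(restore_accents("λογος η ανθρωπου", table))
    # prints: λόγος η ἀνθρώπου  (the ambiguous "η" is left bare)
```

The lookup step is where such an approach would struggle with polytonic Greek, since many common words differ only in their diacritics; that is one argument for training the OCR engine on accented text directly instead.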