On Fri, Mar 15, 2013 at 6:26 PM, Nick White wrote:

On Fri, Mar 15, 2013 at 09:37:41AM +0100, Notis Toufexis • Νότης Τουφεξής wrote:
> This is good news. I am just wondering if this is too technical for some users.

I wouldn't say Tesseract was too technical for many users, really.
There are several different GUIs which wrap around the engine and
make it easy to use; there is a list at:
http://code.google.com/p/tesseract-ocr/wiki/3rdParty

Admittedly it isn't as geared to desktop users as something like
ABBYY, but it shouldn't be too much work to figure out.

> I remember hearing Greg Crane talking about OCR at an event in London. It was
> about OCR with commercial products, stripping accents and putting them back
> again with the use of scripts -- I might have some notes somewhere.

Sounds like an interesting (if rather unpleasant ;)) idea. I suspect
that providing a good training file and tweaking the OCR engine
parameters to ensure things like good line segmentation would be
preferable, though.
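
For illustration only, the strip-and-restore idea could be sketched
roughly as below. This is not the method described at the London event,
whose details are not given here. The function names and the
lexicon-lookup way of putting accents back are assumptions made for this
sketch, which uses only Python's standard unicodedata module.

```python
import unicodedata


def strip_accents(text):
    """Decompose to NFD and drop combining marks (accents, breathings)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_lookup(lexicon):
    """Map each stripped form to the accented words that reduce to it.

    `lexicon` is a hypothetical list of correctly accented words.
    """
    lookup = {}
    for word in lexicon:
        lookup.setdefault(strip_accents(word), []).append(word)
    return lookup


def restore_accents(words, lookup):
    """Put accents back only where the lexicon gives exactly one candidate."""
    restored = []
    for word in words:
        candidates = lookup.get(word, [])
        # Ambiguous or unknown words are left unaccented for manual review.
        restored.append(candidates[0] if len(candidates) == 1 else word)
    return restored
```

Words with more than one accented candidate are deliberately left
alone, which is part of why this approach is less pleasant than
training the engine on the accented characters directly.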

Nick White