Hello Wendy, long time no speak.

I've worked with OCR software many times over the last few years - it's fair
to say that it can give mixed results.

OCR cannot effectively read handwriting. Yes, there have been projects to
read handwritten records, but these have typically been based around
standard responses, handwriting in capitals, or forms where ticks in boxes
can be recognised.

OCR will sometimes read records perfectly; at other times, depending simply
on the typeface, it will produce complete rubbish. The other problem you can
get, especially with older typefaces, is that the software can read l as 1,
m as ni or S as 5, etc. The software will also often sprinkle the document
with capitals, reading an s as an S, etc. Whilst a spell-checker will sort
some of these out, sometimes the change can go unnoticed as it creates a
real word. Also, if you are using scientific terms or Latin names, a
spell-checker will be useless anyway. Finally, OCR software can give
variable results with columns.
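
As a rough illustration of catching the kind of misreads described above
(l as 1, S as 5, stray capitals) before they slip past a spell-checker, here
is a minimal Python sketch. It is not part of any OCR package; the function
name and the two rules are assumptions made for this example. It flags words
that mix letters with digits or that have a capital straight after a
lower-case letter.

```python
import re

# Hypothetical helper, not taken from any OCR product: it flags words in
# OCR output that look like typical misreads so a person can check them.
HAS_LETTER_AND_DIGIT = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")
CAPITAL_AFTER_LOWER = re.compile(r"[a-z][A-Z]")


def suspicious_tokens(text):
    """Return words that mix letters and digits or contain a mid-word capital."""
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if HAS_LETTER_AND_DIGIT.search(token) or CAPITAL_AFTER_LOWER.search(token):
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    sample = "The 5ilver bowl was found in 1887 at LeweS"
    print(suspicious_tokens(sample))
```

A check like this will also flag genuine codes that mix letters and digits,
such as accession numbers, and it cannot spot a misread that happens to
produce a real word, so it only narrows down what needs checking by eye.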

There are specialist providers out there, but they can be expensive.

Don't think I am being negative - more... realistic. OCR works best where
the input documents are simple, predictable and consistent. Where there is
variation or anything unusual, it starts to run into problems. Even the best
OCR software will only ever be 95%-99% accurate - this sounds good, but at
the lower end it means one in every 20 characters is wrong. The end result
will be a lot more manual input to check and correct the digitised data -
this to a degree defeats the whole object of OCR'ing in the first place.
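
To put the accuracy figures into numbers, here is a minimal Python sketch of
the arithmetic. The 300 characters per card is an assumed figure for
illustration only, and the calculation assumes every character is equally
likely to be misread, independently of the others, which real OCR errors are
not.

```python
def expected_errors(chars, accuracy):
    """Expected number of wrong characters at a given per-character accuracy."""
    return chars * (1.0 - accuracy)


def chance_of_clean_record(chars, accuracy):
    """Probability that every character is right, assuming independent errors."""
    return accuracy ** chars


if __name__ == "__main__":
    chars_per_card = 300  # assumed figure, not measured from real cards
    for accuracy in (0.95, 0.99):
        errors = expected_errors(chars_per_card, accuracy)
        clean = chance_of_clean_record(chars_per_card, accuracy)
        print(f"{accuracy:.0%} accurate: about {errors:.0f} errors per card, "
              f"{clean:.1%} chance of a card with no errors")
```

On those assumptions, 95% accuracy works out at about 15 wrong characters on
a 300-character card and 99% at about 3, so nearly every card would still
need checking by hand.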

Sometimes it is cheaper and easier to hire in a professional typist to bash
the details in. The rate of throughput, accuracy and cost can often give OCR
a serious run for its money.

Regards

Chris Meaney (AIMC)
Managing Director

 ========================================================================
Harvard Consultancy Services Ltd, Bexin House, 2/3 St. Andrews Place
Southover Road, Lewes, East Sussex, BN7 1UP
Tel: 01273 897517, Fax: 01273 471929, E-Mail: [log in to unmask]

Registered in England & Wales no. 3766540
Registered Office: 50 Harvard Close, Malling, Lewes, East Sussex, BN7 2EJ.


-----Original Message-----
From: Museums Computer Group [mailto:[log in to unmask]]On Behalf Of Ian
Morrison
Sent: 20 February 2002 09:00
To: [log in to unmask]
Subject: Re: Advice on OCR software please


On Wed, 20 Feb 2002, Wendy Sudbury wrote:

> Could colleagues share their experience of successful use of OCR
> software?  We have A5 index cards, typed in assorted fonts,  4 or 5
> columns across, with a few handwritten markings (numbering and ticks in
> pencil or biro).  I'm looking to migrate the information to a table in
> Word whenever we have to update a card.  We update modest quantities,
> maybe 40 a month.  It seems to make sense to scan the old card first.

I have done some experiments with various kinds of museum records over
the years, latterly using an evaluation version of PageGenie 98 with some
success. I don't know if it is still available for free download but, if
so, it might be worth giving it a try. The success will probably depend
on the consistency of the original typing - electronic typewriters
generally give the best results. So far I haven't found any OCR software
that will make much sense of the average curator's handwriting, sadly,
nor do I know of any museum that has used OCR successfully to digitise
records in any numbers.

Hope this helps.

-------------------------------------------------------------
Ian O. Morrison, Scottish Museums Documentation Officer
http://ianmorrison.topcities.com/index.htm
Hostes alienigeni me abduxerunt