Our HLF-sponsored cataloguing project, "Shropshire's Past Unfolded", is
pursuing various methods of importing existing catalogue text into our CALM
2000 PLUS database.
We have some substantial catalogues of quite good modern typescript (and
some of only middling quality!) which we wish to scan into text files using
Optical Character Recognition, then mark up with EAD-type record and field
tags for importation into CALM. We imagine that where the fields contain
long passages of text, e.g. the dreaded calendar entries of deeds, this
would be very much faster than re-keying.
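To illustrate the mark-up step, here is a minimal sketch of wrapping one OCR'd entry in EAD-style tags using Python's standard library. The element names (c, did, unitid, unittitle, unitdate, scopecontent) come from the EAD tag set. The function name, the sample reference number and the sample text are hypothetical. The fields CALM actually expects on import are not known here and would need checking against its documentation.

```python
import xml.etree.ElementTree as ET

def entry_to_ead(ref, title, date, description):
    """Wrap one catalogue entry in EAD-style component markup.

    Assumption: one <c> element per entry, with the long calendar
    text placed in <scopecontent>. The real import layout may differ.
    """
    c = ET.Element("c", level="item")
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unitid").text = ref
    ET.SubElement(did, "unittitle").text = title
    ET.SubElement(did, "unitdate").text = date
    scope = ET.SubElement(c, "scopecontent")
    ET.SubElement(scope, "p").text = description
    return c

# Hypothetical sample entry, not taken from any real catalogue.
entry = entry_to_ead("123/4", "Conveyance", "1 May 1782",
                     "Messuage & land in Wem")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Building the tags with a library rather than by pasting strings means characters such as "&" in the OCR'd text are escaped automatically.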
Our early experiments with scanning were encouraging in one respect:
scanning took only 35 seconds per A4 page. So far we have used standard
office OCR software, several years old (TEXTBRIDGE and OMNIPAGE), and we
found a disappointingly high error rate. Many of these errors were,
however, repetitive - e.g. it couldn't see the 'g's in one font - and we
are encouraged to think that really up-to-date software could do a lot
better.
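Because many of the errors repeat, some of them could probably be cleaned up after scanning with a simple substitution table, whichever OCR package is used. The following is only a sketch under stated assumptions: the misreadings in the table are invented for illustration (a 'g' read as a 'q'), and a real table would have to be built by proofreading sample pages.

```python
import re

# Hypothetical misreadings, as might be collected by proofreading a
# sample page. These are illustrative, not observed OCR output.
CORRECTIONS = {
    "messuaqe": "messuage",
    "mortqaqe": "mortgage",
}

# One pattern matching any listed misreading as a whole word.
_pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, CORRECTIONS)) + r")\b"
)

def correct(text):
    """Replace each known misreading with its corrected form."""
    return _pattern.sub(lambda m: CORRECTIONS[m.group(1)], text)

print(correct("a messuaqe subject to a mortqaqe"))
```

Matching whole words only keeps the table from altering text that merely contains one of the misread strings, and the match is case-sensitive as written.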
Can anyone recommend a suitable OCR package on the basis of their
experience? Ideally one that can be customised to deal with particular
types of page layouts and fonts, and taught to recognise odd or difficult
characters?
If you prefer, please respond off-list.
Many thanks in anticipation.
David Jones