Hi,
In case they are useful, people may also want to look into the following:
Scanning:
*Decapod* is a project focused on building a low-cost digitization
solution that will allow for rare materials, materials held in
collections without large budgets, and other scholarly content to be
digitized into a high-quality PDF format. This project will work to
incorporate the off-the-shelf hardware and open source software
necessary to accomplish this goal.
http://wiki.fluidproject.org/display/fluid/Decapod
Further Processing:
I do not know it very well, but there is a Wikipedia sister project called
*Wikisource* that transcribes and translates older documents:
http://en.wikisource.org/wiki/Main_Page (Help page at
http://en.wikisource.org/wiki/Help:Contents)
Documents can be uploaded, and their automatic OCR output can then be
checked, validated and even translated by volunteers.
Joe
On 11/08/11 11:40, Ed I Bremner wrote:
> Hi Trevor,
>
> Sorry to take a day to get back to you on this one.
>
> The collaborative correction system mentioned is called CONCERT and has been
> developed within the IMPACT Project by IBM. Its availability and licensing
> have been quite a hot issue, so I checked with the IMPACT project office
> for the latest news, and they sent me the following:
>
>
> "Thank you for your interest in our project and the CONCERT tool.
>
> Software tools developed by IMPACT partners will be owned by the partner
> that developed them. Some of the tools will be freely available for
> research; for commercial use, however, specific terms and conditions will
> apply, which are likely to be subject to negotiation (according to the
> intended type of use, the number of users, the content to be accessed,
> the functionality required, etc.).
>
> The CONCERT tool is developed by IBM so will fall under their commercial
> licensing specifications.
>
> However, the IMPACT Centre of Competence, which will be launched at the
> conference in October, will then act as the central point for getting
> access to software, services and support from the IMPACT project.
>
> In the meantime, you can find more information on our blog
> (http://impactocr.wordpress.com/), which covers IMPACT training events and
> features live demonstrations of some of the initial IMPACT tools. This
> recent presentation has more information about the CONCERT tool
> specifically:
>
> http://impactocr.wordpress.com/2011/07/12/the-ocr-process-clemens-neudecker/
>
>
> Please feel free to get back to me with any further questions."
>
>
> I have seen it in operation and it certainly seems to work very well.
>
> Best wishes
>
> eib
>
> Ed Bremner - IMPACT Project
> UKOLN
> [log in to unmask]
> SKYPE: ed.bremner
>
> ******************************
> Ed I Bremner
> Consultant and Trainer in Digital Media
> BremWeb Imaging
> www.bremweb.co.uk
> [log in to unmask]
> 07973 335509
> ******************************
>
>
> -----Original Message-----
> From: Museums Computer Group [mailto:[log in to unmask]] On Behalf Of
> REYNOLDS, Trevor
> Sent: 10 August 2011 12:00
> To: [log in to unmask]
> Subject: Re: Software for digitising magazines & IMPACT Conference
>
> Thanks Ed. That looks interesting, I might come along if I can find some
> spare holiday.
>
> I see that the IMPACT web site mentions "A full web-based collaborative
> correction system: this web-based platform, suitable for massive volunteer
> participation, validates and corrects OCR results. In this way, it enables
> the general public to help with large scale digitisation efforts" as one of
> the tools it is developing. Do you know when this is likely to be
> available (and where from)?
>
> Trevor Reynolds
> Collections Registrar, English Heritage
> tel: +44 (0) 1904 601905. 37 Tanner Row, York, YO1 6WP
>
> -----Original Message-----
> From: Museums Computer Group [mailto:[log in to unmask]] On Behalf Of Ed I
> Bremner
> Sent: 10 August 2011 10:24
> To: REYNOLDS, Trevor
> Subject: Re: Software for digitising magazines & IMPACT Conference
>
> Dear All,
>
> MCG members interested in the cutting edge of OCR and the digitisation of
> historic text (including magazines) may well want to consider coming to the
> IMPACT Conference at the British Library on 24-25 October 2011.
>
> This event will showcase the results from the IMPACT project and launch the
> IMPACT Centre of Competence.
>
> IMPACT is a European project that has been developing new tools to improve
> the mass digitisation and OCR of historic text.
> See: http://www.impact-project.eu/
>
> Details of the conference are below, with a full programme at:
> http://www.impact-project.eu/news/ic2011/conference-programme/
>
>
>
> *********************************************************
>
> With this email we would like to invite you to the final conference of the
> IMPACT project, "Digitisation & OCR: Better, faster, cheaper. Solutions of
> the IMPACT Centre of Competence and future challenges" that will take place
> on 24-25 October 2011 at the British Library in London. At this conference
> IMPACT will present the final project results, along with related research
> in the field of OCR and language technology.
>
> This event will also mark the official launch of the IMPACT Centre of
> Competence. This Centre is focused on making digitisation of historical
> printed text in Europe better, faster, cheaper by sharing expertise and
> providing access to tools for all parts of the digitisation workflow, as
> well as tools, services and facilities for further advancement of the State
> of the Art in this field.
>
> The programme for the conference is now online on the conference webpage.
> Highlights include:
>
> - Khalil Rouhana (European Commission, Director for digital content
>   and cognitive systems in DG Information Society and Media): "The EC
>   Digital Agenda and official launch of the IMPACT Centre of Competence"
> - Michael Fuchs (ABBYY Europe): "ABBYY FineReader: IMPACT improvements"
> - Paul Fogel (California Digital Library): "Experiences in mass
>   digitisation: examining OCR quality"
> - Clemens Neudecker (National Library of the Netherlands): "The IMPACT
>   Framework and what you can do with it"
> - Asaf Tzadok (IBM Haifa Research Lab): "IBM Adaptive OCR engine and
>   CONCERT Cooperative Correction"
> - Majlis Bremer-Laamanen (National Library of Finland): "Crowdsourcing
>   for OCR correction: Experiences with Digitalkoot"
> - Katrien Depuydt (INL) and Klaus Schulz (University of Munich):
>   "Language work in IMPACT"
> - Stephen Krauwer (CLARIN coordinator, University of Utrecht):
>   "Related language work in CLARIN"
> - Parallel sessions on state-of-the-art research tools for document
>   analysis and OCR; IMPACT language tools & resources; and digitisation
>   tips (Meet the expert).
>
> More programme updates will be announced through
> http://www.impact-project.eu/news/ic2011/conference-programme/ and Twitter
> (hashtag: #impactconf2011). Registration is now possible at the regular fee
> of 120 GBP. To register, please go to this BL ticket website and click
> October. More information is also available from the attached flyer.
>
>
> *********************************************************************
>
> Best Wishes
>
>
>
> -----Original Message-----
> From: Museums Computer Group [mailto:[log in to unmask]] On Behalf Of Adam
> Waterton
> Sent: 10 August 2011 09:41
> To: [log in to unmask]
> Subject: Re: Software for digitising magazines
>
> Hi Trevor,
> We recently undertook a project to digitise and create machine-readable
> versions of a series of Royal Academy of Arts exhibition catalogues
> (1870-1913). We tried a few OCR packages and also found that ABBYY
> FineReader (http://finereader.abbyy.com/) gave good results. However, the
> resulting text files were still very inaccurate and required an enormous
> amount of manual tidying up to make them accurate enough for consistent
> searching. Also, ABBYY is not cheap, and the costs will mount up if you need
> a separate ABBYY licence for each of your volunteers.
>
> The results of our digitisation project can be seen here:
> http://www.racollection.org.uk/ixbin/indexplus?_IXACTION_=file&_IXFILE_=templates/pages/exhibition_list.html
>
> Regards,
> Adam.
>
> Adam Waterton
> Head of Library Services
> Royal Academy of Arts
> Burlington House
> Piccadilly
> London
> W1V 0DS
>
> T: 020 7300 5740 | F: 020 7300 5765 | E: [log in to unmask]
>
> The Royal Academy of Arts Collection Online: www.racollection.org.uk
>
> -----Original Message-----
> From: Museums Computer Group [mailto:[log in to unmask]] On Behalf Of
> Howell, Alan
> Sent: 09 August 2011 10:09
> To: [log in to unmask]
> Subject: Re: Software for digitising magazines
>
> Hi Trevor
>
> I have used ABBYY FineReader for some projects at home and found it
> very effective at this sort of thing.
>
> Kind regards
>
> Alan Howell
> Guernsey Museums & Galleries
> SSDDI +44 (0) 1481 709736
>
>
> -----Original Message-----
> From: Museums Computer Group [mailto:[log in to unmask]] On Behalf Of
> REYNOLDS, Trevor
> Sent: 06 August 2011 09:49
> To: [log in to unmask]
> Subject: Software for digitising magazines
>
> Dear all
>
> A volunteer run charity I'm involved with wants to digitise the back issues
> of its periodicals.
>
> What they want to end up with is PDF/A format documents with a scanned image
> of each page with searchable text underneath the image. Many of the early
> issues have poor quality text and any OCRed text will probably need heavy
> editing.
>
> Can you recommend software which will enable this to be done? They are
> intending to split the work between a number of volunteers who will be
> working at home on their own computers so low cost, easy to use solutions
> would be welcome!
>
> Trevor Reynolds
> Collections Registrar, English Heritage
> 37 Tanner Row, York, YO1 6WP tel: 01904 601905
>
> Portico: your gateway to information on sites in the National Heritage
> Collection; have a look and tell us what you think.
> http://www.english-heritage.org.uk/professional/archives-and-collections/portico/
>
> ****************************************************************
> website: http://museumscomputergroup.org.uk/
> Twitter: http://www.twitter.com/ukmcg
> Facebook: http://www.facebook.com/museumscomputergroup
> [un]subscribe: http://museumscomputergroup.org.uk/email-list/
> ****************************************************************
> This e-mail (including attachments) may contain sensitive and/or privileged
> information. If received in error, its use by you is not authorised and may
> be unlawful. Please notify the sender and delete all copies immediately.
> E-mails may be subject to error, interference and virus and no liability is
> accepted for loss or damage however it arises and whether direct or
> indirect. Service of legal proceedings by e-mail may not be accepted.
>
> E-mails may be monitored for compliance purposes. All documents are subject
> to copyright.
>
> The Royal Academy of Arts is a registered charity under Registered Charity
> Number 1125383 and is also registered as a company limited by guarantee in
> England and Wales under Company Number 6298947. Registered office:
> Burlington House, Piccadilly, London, W1J 0BD.
>
--
*Joseph Padfield*
Conservation Scientist
Scientific Department
The National Gallery
Trafalgar Square
London WC2N 5DN
+44 (0)20 7747 2553
Email: [log in to unmask]
Follow JoePadfield on Twitter: http://www.twitter.com/JoePadfield
http://cima.ng-london.org.uk
http://research.ng-london.org.uk
----------------------------------------------------------------
Devotion by Design: Italian Altarpieces before 1500
6 July - 2 October 2011, Admission free
Find out more:
http://www.nationalgallery.org.uk/devotion-by-design
Leonardo da Vinci: Painter at the Court of Milan
9 November 2011 - 5 February 2012
Book now:
http://www.nationalgallery.org.uk/leonardo-exhibition
Sign up for news, offers and exclusive competitions from the
National Gallery:
http://www.nationalgallery.org.uk/what/news/subscribe.htm