Clearly, improving the underlying scanned text would improve information
retrieval performance, so one would hope that enlightened self-interest
might suggest applying this course of action to the source data.
Richard
In message <[log in to unmask]>, "Ottevanger, Jeremy"
<[log in to unmask]> writes
>Ditto, though we just have to see what the quality is like. Extracting
>text from scans via OCR may produce output that's useful for feeding
>into a search engine but pretty much unreadable to humans. Even that
>should be freed up, I agree, but we can't assume that just because the
>"digitised text" exists then there are millions of articles in raw text
>ready to be reused as-is.
>
>Tidying them up sounds like a job for crowd-sourcing. Richard mentions
>Project Gutenberg, and doubtless Frankie will have other tips on
>crowd-sourcing. A big job whichever way you look at it!
>
>Cheers, Jeremy
>
>
>
>Jeremy Ottevanger
>Web Developer, Museum Systems Team
>Museum of London
>46 Eagle Wharf Road
>London. N1 7ED
>Tel: 020 7410 2207
>Fax: 020 7600 1058
>Email: [log in to unmask]
>www.museumoflondon.org.uk
>
>Spectacular new £20 million Galleries of Modern London opening at
>Museum of London in spring 2010.
>
>Find out more at www.museumoflondon.org.uk
>
>Before printing, please think about the environment
>
>
>
>-----Original Message-----
>From: Museums Computer Group [mailto:[log in to unmask]] On Behalf Of
>Andy Powell
>Sent: 18 June 2009 12:27
>To: [log in to unmask]
>Subject: Re: [MCG] BL Newspapers and open content
>
>Well, since you asked... :-)
>
>I very strongly agree with Richard that opening up the underlying
>content/data should always be seen as a high priority and have made
>similar points to the BL previously, e.g.
>
>http://efoundations.typepad.com/efoundations/2008/03/hiding-magna-ca.html
>
>Andy
>
>________________________________
>
>Andy Powell
>Research Programme Director
>Eduserv
>
>[log in to unmask]
>01225 474319 / 07989 476710
>www.eduserv.org.uk
>efoundations.typepad.com
>twitter.com/andypowe11
>-----Original Message-----
>From: Museums Computer Group [mailto:[log in to unmask]] On Behalf Of
>Alastair Dunning
>Sent: 18 June 2009 12:22
>To: [log in to unmask]
>Subject: BL Newspapers and open content
>
>...
>
>If there is a general feeling amongst the MCG that this open data is a
>key part of making such content accessible, I'm happy to take these
>comments back to the BL's project board for newspapers. And as paying
>customers (another interesting issue) it's the kind of thing you might
>want to let the BL know about directly.
>
>...
>
>****************************************************************
>For mcg information visit the mcg website at
>http://www.museumscomputergroup.org.uk.
>To manage your subscription to this email list visit
>http://www.museumscomputergroup.org.uk/email.shtml
>****************************************************************
--
Richard Light