John Robert Gardner wrote:
>
> What sorts of databases are people using for their larger-scale
> digitization projects? Particularly, is anyone having to integrate keyed
> data from--say--journals, with page images, accessible through MARC or
> other Z39.50 library type info portal?
>
> If so, what tool(s) do you use?
>
> ________________________________________
The full text collection of the Historic Pittsburgh project
digital.library.pitt.edu/pittsburgh/
follows the model of Making of America (MOA) at the University of Michigan. We
are SGML-based, use OpenText's PAT 5.0 search engine, and have adapted CGI
scripts from MOA for our Web interface. Because we are digitizing 19th-century
works held by us and our partner, we have extracted MARC records from
the respective databases and have a program, marc2sgml, that converts the
MARC data into the matching TEI header format so that it is searchable
within our full text database. We retain the description of the original,
subject headings, and other controlled access headings. The searchable text
that underlies our page image displays is derived from batch OCR text, which
we edit minimally. We use PrimeRecognition OCR software and are currently
running 5 OCR engines. We also have a program that takes the structural
metadata recorded in spreadsheets for each title and automatically encodes the
text in SGML tags, roughly faithful to TEI conventions. The SGML encoding
allows us to offer more specific searching within the full text database.
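
As a rough illustration of the spreadsheet-driven encoding described above,
the following Python sketch reads one row per page from a CSV export and wraps
the OCR text in TEI-like SGML tags. The column names (page, feature, heading),
the choice of DIV1/HEAD/PB/P tags, and the encode_title function are all
assumptions made for this example; it is not the program we actually run.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def encode_title(metadata_csv, ocr_pages):
    """Wrap per-page OCR text in TEI-like SGML using structural metadata.

    metadata_csv: text of a CSV export with the hypothetical columns
                  page, feature, heading
    ocr_pages:    dict mapping page number (as a string) to OCR text
    """
    out = ["<TEXT>", "<BODY>"]
    div_open = False
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        page = row["page"]
        if row["feature"] == "chapter":
            # A row flagged as a chapter start closes the previous
            # division (if any) and opens a new one with its heading.
            if div_open:
                out.append("</DIV1>")
            out.append('<DIV1 TYPE="chapter">')
            out.append("<HEAD>%s</HEAD>" % escape(row["heading"]))
            div_open = True
        elif not div_open:
            # Pages before the first flagged chapter go in an untyped division.
            out.append("<DIV1>")
            div_open = True
        # Page break marker (an empty element in SGML), then the page text.
        out.append("<PB N=%s>" % quoteattr(page))
        out.append("<P>%s</P>" % escape(ocr_pages.get(page, "")))
    if div_open:
        out.append("</DIV1>")
    out.extend(["</BODY>", "</TEXT>"])
    return "\n".join(out)


# Made-up sample data, for illustration only.
sample_csv = (
    "page,feature,heading\n"
    "1,chapter,Early Settlement\n"
    "2,,\n"
    "3,chapter,The Iron Trade\n"
)
sample_ocr = {
    "1": "Fort Pitt & its environs.",
    "2": "Continued text.",
    "3": "Furnaces and forges.",
}
sgml = encode_title(sample_csv, sample_ocr)
print(sgml)
```

Keeping the structure in a spreadsheet means the person recording page-level
features never has to touch the markup; the tags are generated in one pass.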
We are in the process of outsourcing the cataloging of our electronic
editions so that we may add the records to OCLC as well as to our OPAC.
Because we have a bookmarkable URL for each title, users can reach a specific
title from the 856 field in our library's catalog when there is a MARC record
for the electronic edition.
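
To make the catalog link concrete, here is a small Python sketch of how an 856
field pointing at a title's bookmarkable URL might be assembled in display
form. The title identifier, the example.org URL pattern, and the make_856
helper are hypothetical; only the field tag 856, the indicators (first
indicator 4 = HTTP, second indicator 0 = the resource itself), and the
subfields $u (URI) and $z (public note) come from MARC 21.

```python
def make_856(url, note=None):
    """Return a display-form MARC 856 field for an electronic resource.

    Indicators "40": access method is HTTP, and the URL is the resource
    described by the record itself.
    """
    field = "856 40 $u " + url
    if note:
        # $z carries a public note shown alongside the link.
        field += " $z " + note
    return field


# Hypothetical identifier and URL pattern, for illustration only.
title_id = "example-title-id"
url = "http://example.org/titles/" + title_id
field = make_856(url, note="Electronic edition")
print(field)
```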
Our Website gives more detail on the "how to" aspects of our project. The
$3000 annual membership fee for the SGML Server Program (SSP) at the
University of Michigan is a bargain. We avoided the lengthy development
process and were up and running with decent content in about 18 months. (Of
course, we were lucky to have on staff a person who had worked with MOA before
coming to Pitt, so this time estimate may not hold for other
institutions.)
--
Doris Hayashikawa, Coordinator
Digital Research Library, University Library System
University of Pittsburgh
G20R Hillman Library
Pittsburgh, PA 15260
Phone: (412) 648-7765