medieval-religion: Scholarly discussions of medieval religion and culture
From: Chris Laning
> 3) create or re-create a file in Word, PageMaker, or a number of other
> programs and export it directly as a PDF file. If there's text in your
> original, that text will be searchable in the resulting PDF file.
> The point being that you can only search text in a PDF if the text is in the
> file _as text_ in the first place.
gosh, i wish i'd said that.
>Most Acrobat PDFs you find on the web are, in fact, searchable, since they
>were usually created by option (3).
"most" here doesn't apply to the masses of .pdfs on the Gallica site; nor
those on the Ecole des Chartes site; nor innumerable ones on gubbermint sites
which are just scans of various documents.
only .pdfs which were originally created in some sort of wordprocessor fashion
will be searchable, unless someone has done the work of OCRing the scans of the
pages, thereby creating a text file.
>Acrobat files from other sources may or may not be searchable, depending on
>how they were created.
yep.
except for the *fact* that you can go to the http://gallica.bnf.fr site and do
a "recherche" on a bit of text and you'll get hits of files where that bit of
text appears.
now, my question is, *how* did they do that?
the .pdf files are definitely *not* searchable, they're just graphics.
and yet the database of millions of pages there is searchable.
'splain that to me, will you, Chris?
c
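
one way a site could pull this off (an assumption; nothing in this thread confirms how gallica actually does it): OCR each scanned page once, keep the recognized text in a separate index keyed by page, and run the "recherche" against that index, while the .pdf you download stays a plain image. a minimal python sketch of that idea; the page ids, the sample text, and the function names are all hypothetical, made up for illustration:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(ocr_pages):
    """Map each word to the set of page ids whose OCR text contains it.

    ocr_pages: dict of page id -> OCR text kept alongside (not inside)
    the scanned image for that page.
    """
    index = defaultdict(set)
    for page_id, text in ocr_pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids whose OCR text contains every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# hypothetical data: image file names paired with their OCR output
ocr_pages = {
    "scan-0001.png": "Vita sancti Martini episcopi",
    "scan-0002.png": "Miracula sancti Benedicti",
}
index = build_index(ocr_pages)
print(search(index, "sancti martini"))  # prints ['scan-0001.png']
```

the point of the sketch is that the search never opens the image files at all: the hits come from the side index, so the database can be searchable even though each delivered file is just graphics.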