I think Falk replied to Leslie somewhat at cross purposes.
PDFs are certainly a bad choice for data, which should be made available
in raw formats and/or appropriate interchange formats. However, that says
nothing against making the PDFs of articles available! We should do both.
I would argue that, at present, PDF is the most openly accessible format
for textual documents where layout/presentation is at least somewhat
important. Good, free PDF viewers are available and easy to install, so by
making a PDF openly available you make a document available to the entire
internet-connected research community. (As for alternative formats, I
think the US NLM/NCBI approach of keeping source documents in XML
conforming to the NLM DTD and rendering them into XHTML or PDF on demand
is an exemplar. However, as there aren't good authoring tools for that
format, this isn't an option for self-archiving.)
An amusing note on data mining: At arXiv.org we have collected TeX source
of research papers for many years. In recent years, a number of projects
have done work on automatic citation extraction (including Citebase) and
other data mining on arXiv documents -- most have opted to work from the
processed PDF of papers (essentially "page scraping") rather than from the
raw TeX source. The reason is that in some sense there is more uniformity
in the PDF output (which tends to adhere to our presentation norms) than
in the TeX source (which supports a great many alternative ways of doing
things and, in general, requires a full TeX engine to understand).
Cheers,
Simeon
On Sun, 10 Dec 2006, Falk Huettmann wrote:
> Dear Leslie et al,
>
> sure, I agree in concept, but not in reality.
>
> PDFs are used to lock data away, to make them unusable.
> Many data sets, e.g. text files, exist already in good digital form, and
> become unusable once presented
> as PDFs. So it's truly a step backwards.
>
> PDFs just support the concept of FEAR ("uh, somebody dares to use my
> information...").
> Whereas we all know and support: "The value of data lies in its use"!
>
> So we could, and should, use the raw data and text Files instead.
>
> In addition, we then get these clumsy PDFs that crash all the time.
> PDFs are NOT an option, and should not be used any further.
>
> They just support the notion of 'change for no change' (yeah, it's digital
> and online, but...).
>
> Creating PDFs costs money, too, and the funds should be invested more wisely
> instead.
>
> I have been a user of public online data for over 10 years, and we have
> that problem frequently.
> We even re-digitized major PDF documents with raw data tables, into useable
> datasets and put them online.
> And there is software out there that does exactly that.
> Isn't that somewhat silly? Are we re-inventing the wheel here?
>
> Anyways, let's see where we go with it.
>
> Kind regards
>
> F.
>
> Falk Huettmann PhD, Assistant Professor
> -EWHALE lab- Biology and Wildlife Dept., Institute of Arctic Biology
> 419 IRVING I, University of Alaska Fairbanks AK 99775-7000 USA
> Phone 907 474 7882 Fax 907 474 6716
>
>
> _____
>
> From: Leslie Carr
> Sent: Sunday, December 10, 2006 12:29 PM
> To: Falk Huettmann
> Subject: Re: PLoS business models, global village
>
>
> On 10 Dec 2006, at 08:27, Falk Huettmann wrote:
>
>
>
> > Am I correct to say that PDFs are not part of true OpenAccess (raw data,
> > shared analysis) and should be fully abandoned/replaced ASAP?
>
> "True Open Access" is a hitherto unidentified specialisation of "Open
> Access". The latter simply requires research outputs to be accessible to
> everyone, without let or hindrance, now or in the future.
>
> Perhaps you are suggesting that PDFs are not an optimal information exchange
> vehicle - and many people (data miners) would agree with you. However, PDF
> files are the majority means of dissemination, and while we await the Next
> Great interoperability format (presumably based on XML) together with the
> easy-to-use tools to go with it, we should continue making PDFs open access
> with all our energy.
> --
> Les Carr
>
>