There is a "best of both worlds" alternative that the community might want
to consider.

This is a combination of an OpenOffice Writer document (which conforms to
the new Open Document Format standard; see
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office) and a
PDF, packaged as a zip file both for efficiency and to bind the two
renderings firmly together.
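
As a rough illustration only, such a package could be built with Python's
standard zipfile module. The function names and file names below are
hypothetical and are not part of any existing tool or standard; this is a
sketch of the idea, not a proposed format.

```python
import os
import zipfile


def bundle_renderings(odt_path, pdf_path, bundle_path):
    """Package an ODF document and its PDF rendering into one zip file.

    All three paths are hypothetical examples supplied by the caller.
    """
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # Store each rendering under its bare file name so the archive
        # does not record the local directory layout.
        bundle.write(odt_path, arcname=os.path.basename(odt_path))
        bundle.write(pdf_path, arcname=os.path.basename(pdf_path))
    return bundle_path


def list_renderings(bundle_path):
    """Return the sorted member names found in a bundle."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return sorted(bundle.namelist())


# Example use (assumes paper.odt and paper.pdf exist in the current directory):
#   bundle_renderings("paper.odt", "paper.pdf", "paper.zip")
#   list_renderings("paper.zip")
```

Keeping both renderings in one archive means a repository handles a single
file, and the editable source cannot become separated from the fixed-layout
copy.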

Best wishes, Henry
 
H.M. Gladney, Ph.D.   http://home.pacbell.net/hgladney   

-----Original Message-----
From: Repositories discussion list [mailto:[log in to unmask]]
On Behalf Of Simeon Warner
Sent: Sunday, December 10, 2006 6:48 AM
To: [log in to unmask]
Subject: Re: PDFs lock data away RE: PLoS business models, global village

I think Falk replied to Leslie somewhat at cross purposes.

PDFs are certainly a bad choice for data which should be made available in
raw formats and/or appropriate interchange formats. However, that says
nothing against making the PDFs of articles available! We should do both.

I would argue that, at present, PDF is the most openly accessible format for
textual documents where layout/presentation is at least somewhat important.
Good, free PDF viewers are available and easy to install, so by making a PDF
openly available you make a document available to the entire
internet-connected research community. (Among alternative formats, I think
the US NLM/NCBI approach of keeping source documents in XML according to the
NLM DTD and rendering them into XHTML or PDF on demand is an exemplar.
However, as there aren't good authoring tools, this isn't an option for
self-archiving.)

An amusing note on data mining: At arXiv.org we have collected TeX source of
research papers for many years. In recent years, a number of projects have
done work on automatic citation extraction (including Citebase) and other
data mining on arXiv documents -- most have opted to work from the processed
PDF of papers (essentially "page scraping") rather than from the raw TeX
source. The reason is that in some sense there is more uniformity in the PDF
output (which tends to adhere to our presentation norms) than in the TeX
source (which supports a great many alternative ways of doing things and, in
general, requires a full TeX engine to understand).

Cheers,
Simeon


On Sun, 10 Dec 2006, Falk Huettmann wrote:
> Dear Leslie et al,
>
> sure, I agree in concept, but not in reality.
>
> PDFs are used to lock data away, to make them unusable.
> Many data sets, e.g. text files, exist already in good digital form, and
> become unusable once presented
> as PDFs. So it's truly a step backwards.
>
> PDFs just support the concept of FEAR ("uh, somebody dares to use my
> information...").
> Whereas we all know and support: "The value of data lies in its use"!
>
> So we could, and should, use the raw data and text files instead.
>
> In addition, we then get these clumsy PDFs that crash all the time.
> PDFs are NOT an option, and should not be used any further.
>
> They just support the notion of 'change for no change' (yeah, it's digital
> and online, but...).
>
> Creating PDFs costs money, too, and the funds should be invested more
> wisely instead.
>
> I have been a user of public online data for over 10 years, and we have
> that problem frequently.
> We even re-digitized major PDF documents with raw data tables into usable
> datasets and put them online.
> And there is software out there that does exactly that.
> Isn't that somewhat silly? Are we re-inventing the wheel here?
>
> Anyways, let's see where we go with it.
>
> Kind regards
>
>     F.
>
> Falk Huettmann PhD, Assistant Professor
> -EWHALE lab- Biology and Wildlife Dept., Institute of Arctic Biology
> 419 IRVING I, University of Alaska Fairbanks AK 99775-7000 USA
> Email [log in to unmask]  Phone 907 474 7882 Fax 907 474 6716
>
>
>  _____
>
> From: Leslie Carr [mailto:[log in to unmask]]
> Sent: Sunday, December 10, 2006 12:29 PM
> To: Falk Huettmann
> Cc: [log in to unmask]
> Subject: Re: PLoS business models, global village
>
>
> On 10 Dec 2006, at 08:27, Falk Huettmann wrote:
>
>
>
> Am I correct to say that PDFs are not part of true OpenAccess (raw data,
> shared analysis) and should be fully abandoned/replaced ASAP?
>
> "True Open Access" is a hitherto unidentified specialisation of "Open
> Access". The latter simply requires research outputs to be accessible to
> everyone, without let or hindrance, now or in the future.
>
> Perhaps you are suggesting that PDFs are not an optimal information
> exchange vehicle - and many people (data miners) would agree with you.
> However, PDF files are the majority means of dissemination, and while we
> await the Next Great interoperability format (presumably based on XML)
> together with the easy-to-use tools to go with it, we should continue
> making PDFs open access with all our energy.
> --
> Les Carr
>
>