There seems to be an unwritten assumption in this discussion that the
submission format and the dissemination format should be the same. Although
requiring PDF as the deposit format will reduce the effort needed to prepare
an item for distribution, there are likely to be long-term problems in
converting the content to some other format. Although I share the common
dislike of Microsoft, the tools that convert MS Word to Open Document produce
more reliable results than those that convert from PDF. Instead, as Steve
suggests, a distinction should be made between the source format (the format
the author used to create their research data), the archival format
intended for preservation, and the dissemination format intended for
distribution to the wider community.
I agree with Steve Hitchcock's earlier e-mail encouraging the submission of
the source format for preservation purposes. In addition to the preservation
perspective, there may be a convincing argument that it would simplify the
submission process and *potentially* encourage more authors to deposit their
work. PDF is often listed as the only format accepted by an IR, or at the
very least as the preferred format. To comply with these requirements,
the author must locate a suitable conversion tool to change their MS Word
document into a PDF before giving it to the repository - a process that
requires time and technical knowledge that the author may not have. This
hurdle is removed if the author is asked to submit their research data in
the file format in which it was created.
Gareth
--
Gareth Knight
Digital Preservation Officer
Arts and Humanities Data Service
email: [log in to unmask]
phone: 0207 848 1979
http://www.sherpadp.org.uk/
-----Original Message-----
From: Repositories discussion list
[mailto:[log in to unmask]] On Behalf Of Stephen Downes
Sent: 12 December 2006 14:37
To: [log in to unmask]
Subject: Re: PLoS business models, global village
Hiya,
>I would say that the primary objective is to get it into the repository in
>*some* format, under a license that would allow it to be converted to other
>formats as needed. This provides the greatest ease for the author, and
>allows the greatest flexibility for the reader, as well as ensuring the
>greatest possible chance of being able to access the content in the long
>term.
>Yes, PDF is a miserable format, not as bad as MS-Word, but miserable,
>especially for those of us who read documents online. But a PDF is a whole
>lot better than nothing, and a PDF that can (eventually) be converted to
>a much more friendly format is even better.
-- Stephen
Peter Crowther wrote:
From: Steve Hitchcock
The critical point for repositories is to obtain the *source*
copy of the deposited item, exactly as the author created it.
I'm not entirely sure I agree. File formats change - rapidly! - and
some of the more obscure tools are not commonly installed. For a simple
example, consider a Microsoft Word document with embedded Microsoft
Visio diagrams. There are comparatively few machines on which that
document could be viewed intact, and there is comparatively little
chance that repository software could satisfactorily transform the
source document.
For a more complex example, and admittedly not one to do with OA, check
out the preservation of the Domesday Project, where the source form of
the data is now almost unreadable.
- Peter
--
Stephen Downes ~ Research Officer ~ National Research Council Canada
http://www.downes.ca ~ [log in to unmask] __\|/__ Free Learning