Hello all,
From an archival perspective, Brian's approach of depositing an article both in its original format and as a distribution copy seems a reasonable one. The MS Word file may provide insight into the creation process (tools used, changes made), while the PDF copy provides the benefit of a consistent layout. Capturing both copies (as well as earlier versions) would seem to be useful for understanding the content's provenance.
During my time on the SHERPA project in 2003/2004, I recall we consulted the partners to establish if they would be willing to publish articles in their originally created format and as a published PDF. At the time, concern was expressed that this may serve as a barrier for deposit and use. Dredging my memory, I seem to recall three factors being mentioned:
(1) A desire to establish a simple message for depositors (We will publish in PDF. You can provide content in that format or we'll convert it for you)
(2) A desire to minimise potential confusion for end users when they encounter similarly labelled items (which is the authentic copy that I should read?)
(3) A desire to avoid the perceived preservation costs of maintaining a large number of formats in the repository (caused, in part, by a lack of automated conversion tools)
I suspect that the risks posed by (1) and (2) have resolved themselves to some degree through user education. Researchers are increasingly familiar with web-based publication systems for hosting their papers. Although there continues to be confusion with regard to file formats and versioning, this is being addressed through improved tools for labelling semantic relationships between different types of content. The third issue remains a concern, but has not prevented the rise of research data repositories in recent years.
Returning to Brian's email, I notice that no one has replied to his question on policy documents that reflect practice. There has been a lot of work on evaluating preservation formats in a repository (building upon the Todd report in 2009, the PLANETS Decision Tree approach, and others). However, this does not seem to have filtered through to the policy documents that support the IR. Now that we are providing granular deposit methods (deposit of files copied to a virtual drive, automated format conversion, etc.), I wonder if we should re-assess the deposit formats that the repository is willing to accept. This would, in turn, simplify the submission process for the depositor.
Gareth
--
Gareth Knight
Research Data Management Project Manager
London School of Hygiene & Tropical Medicine (LSHTM)
Keppel Street, London, WC1E 7HT
Tel: (+44) 020 7927 2564
>>> Panyarak Ngamsritragul <[log in to unmask]> 26/07/2012 09:44 >>>
One critical problem of word processor formats, whether MS Word's docx or
OpenOffice's or LibreOffice's open document formats, is that most of them
are not truly backward compatible. I am sure you could face some trouble in
opening your MS Word documents created a few years ago. This sort of
trouble is not found with PDF readers, if I am not wrong.
Though MS Word is now claiming that they (try to) support open document
formats, this is still far from being perfect and it is still quite
doubtful whether MS is willing to comply with the open document formats.
It is sad to know that MS Word format could become a common format. If
you are familiar with TeX, you should know that you can still work with
the TeX files you created more than 20 years ago...
Panyarak Ngamsritragul
Department of Mechanical Engineering
Prince of Songkla University.
On Thu, 26 Jul 2012, David Groenewegen wrote:
> I think the other "problem" with Word comes from the word processor wars of
> the early 90s, when it was unclear what would be the most common format
> (remember WordPerfect? WordStar? MacWrite?). It made people nervous about its
> longevity.
>
> But Word has been the default standard for creating documents for a longish
> time now - there can't be many people or companies who don't rely on it (and
> even if you don't I bet you still have the capacity to deal with it). If the
> ability to access the billions of Word documents out there disappeared
> tomorrow through some bizarre circumstance where every single one of the
> hundreds of millions of copies of Word
> (<http://blogs.technet.com/b/office2010/archive/2009/10/07/new-ways-to-try-and-buy-microsoft-office-2010.aspx>)
> and all the various compatible tools (<https://docs.google.com/>) stopped
> working, someone would have to invent a way of overcoming this pretty quick
> smart.
>
> Please note: I'm not saying that Word is perfect, or that I'm thrilled with
> this outcome, or that Word is better than <insert your favourite here>, or
> that it isn't the result of Microsoft exploiting its market share.
>
> What I am saying is that a Word document is probably the last format we need
> to worry about for preservation purposes for the foreseeable future. Except
> maybe PDF.
>
> D
>
> On 26/07/2012 2:33 AM, Chris Eaker wrote:
>> Thanks for pointing this out, Leslie. I did not know this about Docx
>> files. I can see how this would be a better format for preservation of
>> not only content, but also formatting.
>>
>> On Wed, Jul 25, 2012 at 7:57 AM, Leslie Carr <[log in to unmask]> wrote:
>>
>> I'd like to point out that Word files (docx) are an XML-based open
>> standard format, and that our prejudice against them is probably
>> rooted in historic antipathy towards previous proprietary formats
>> rather than any genuine problem with the format itself.
>>
>> PDF, on the other hand, is also an open standard, but it makes reuse
>> very difficult. 10 years ago we thought that was a good thing. Now
>> we believe the opposite.
>>
>> Sent from my iPhone
>>
>> On 25 Jul 2012, at 14:43, "Chris Eaker" <[log in to unmask]> wrote:
>>
>> Sorry if I'm asking novice questions (but that's what I am), are you
>> most interested in saving the content or the formatting or both? If
>> the content is the most important thing to preserve, then why not
>> just save the file as PDF and archive that as the master so you have
>> a copy with all formatting intact, but then save a txt for an
>> editable version that maintains content (assuming you need to edit
>> in the future)? I'm wary of archiving *.DOC/X files because they may
>> not be readable for the long-term.
>>
>> On Wed, Jul 25, 2012 at 4:49 AM, Brian Kelly <[log in to unmask]> wrote:
>> I've always deposited an MS Word copy of my papers in my local
>> repository, together with a PDF copy. I've done this because I've
>> been told of the importance of preserving the master copy of a
>> resource, rather than a lossy derivative version, such as PDF. As
>> I've experience in having to recreate an MS Word file from a PDF
>> copy I know this can be a cumbersome process. I assume some authors
>> may prefer to deposit a PDF copy as this may be regarded as
>> providing a form of DRM by making it slightly more difficult to
>> process the file.
>>
>> What policies and practices do people have in place related to this?
>> A Google search for "Policies on depositing MS Word files" suggests
>> that PDFs are the norm. Since the MS Office format has been an ISO
>> standard since 2007 I assume the proprietary versus open standard
>> format for deposits argument is not as strong as it was (subject to
>> caveats about support for ISO/IEC 29500 Strict
>> and the arguments about the validity of the standardisation process
>> which I don't want to go into).
>>
>> Thanks
>>
>> Brian
>>
>>
>> --
>> --------------------------------------------------------
>> Brian Kelly
>> Innovation Support Centre, UKOLN, University of Bath, Bath, UK, BA2 7AY
>> Phone: 01225 383943
>> Email: [log in to unmask]
>> Blog: http://ukwebfocus.wordpress.com/
>> Twitter: http://twitter.com/briankelly
>> Web: http://isc.ukoln.ac.uk/
>>
>>
>>
>> --
>> Christopher Eaker, P.E.
>> Graduate Research Assistant
>> Data Curation Education in Research Centers
>> University of Tennessee, Knoxville
>>
>
> --
> David Groenewegen
> Director, Research Data
> Australian National Data Service
> Physical Address: 680 Blackburn Road, Clayton, Victoria
> Postal Address: c/o Monash University VIC 3800
> AUSTRALIA
>
> Ph: +61 3 9902 0570
> Fx: +61 3 9902 0599
> Mb: +61 (0) 409 969 658
> [log in to unmask]
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>