Even better, the zip file contains a media directory with all the images embedded in the document. It's a godsend for third-party copyright checking.
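For anyone who wants to pull those images out in bulk, here is a minimal Python sketch. It assumes the images sit under word/media/ inside the package, which is where Word normally puts them; the function name and the destination directory are made up for illustration.

```python
import os
import zipfile

MEDIA_PREFIX = "word/media/"  # assumption: Word's usual location for embedded images

def extract_media(docx_path, dest_dir):
    """Copy every embedded media file out of a DOCX package into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Skip everything outside the media folder, and directory entries.
            if not name.startswith(MEDIA_PREFIX) or name.endswith("/"):
                continue
            # Keep only the file name so nothing is written outside dest_dir.
            target = os.path.join(dest_dir, os.path.basename(name))
            with open(target, "wb") as out:
                out.write(package.read(name))
            written.append(target)
    return written
```

Calling extract_media("paper.docx", "paper_media") (both names hypothetical) returns the list of files written, which can then be checked one by one.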
On 25 Jul 2012, at 15:16, "Talat Chaudhri" <[log in to unmask]> wrote:
I was once given this tip for reading DOCX files and judging whether they are usable in the long term, which is quite an interesting practical test that anyone can do: rename the file with a .zip extension and then unzip it. Then check through the contents, which contain quite a lot of metadata, machine-readable packaging, formatting information and the actual content. On that basis, it seems fair to say that it will be fairly easily readable and processable in the future, even if current software platforms become unusable or unavailable. I second Les' remark that it's an entirely different thing to earlier DOC formats, which were proprietary and technically difficult to re-use. It's probably fair to say that DOCX is not all that bad from a preservation perspective.
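The same test can be run without renaming anything, since a DOCX file is already a zip archive. A minimal Python sketch, using only the standard library (the function name is made up for illustration):

```python
import zipfile

def list_docx_parts(docx_path):
    """Return (name, uncompressed size in bytes) for every part in the package."""
    with zipfile.ZipFile(docx_path) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]
```

For a typical Word document, one would expect the listing to include parts such as [Content_Types].xml and word/document.xml, though the exact set depends on the file.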
I might add that ePub seems to take a very similar approach. The metadata it contains can in principle be very extensive although in practice it's far more restricted. By comparison, there is often a lot that can be extracted from DOCX files (really archives), though not in a standard metadata format. But there is much more than just traditional metadata, so I wouldn't like to restrict the debate just to that.
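As one illustration of what can be extracted, here is a rough Python sketch that reads the descriptive properties from a DOCX package. It assumes they live in a part named docProps/core.xml, which is where Word conventionally writes them; other producers may place the part elsewhere, and the function name is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"  # assumption: conventional location used by Word

def read_core_properties(docx_path):
    """Return the package's core properties as a {name: text} dictionary."""
    with zipfile.ZipFile(docx_path) as package:
        if CORE_PART not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CORE_PART))
    properties = {}
    for child in root:
        # Tags look like "{namespace-uri}title"; keep only the local name.
        local_name = child.tag.rsplit("}", 1)[-1]
        properties[local_name] = (child.text or "").strip()
    return properties
```

The result is typically a handful of fields such as title, creator and modification date, which fits the point that in practice the metadata is much thinner than the container would allow.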
I'd be interested to know if people agree or disagree with this position on technical grounds. It really is worth taking a look for yourself.
Talat
On 25/07/2012 14:32, Chris Eaker wrote:
Sorry if I'm asking novice questions (but that's what I am): are you most interested in saving the content, the formatting, or both? If the content is the most important thing to preserve, then why not save the file as PDF and archive that as the master, so you have a copy with all formatting intact, and also save a plain-text (.txt) copy as an editable version that keeps the content (assuming you need to edit in the future)? I'm wary of archiving *.DOC/X files because they may not remain readable in the long term.
On Wed, Jul 25, 2012 at 4:49 AM, Brian Kelly <[log in to unmask]> wrote:
I've always deposited an MS Word copy of my papers in my local repository, together with a PDF copy. I've done this because I've been told of the importance of preserving the master copy of a resource, rather than a lossy derivative version, such as PDF. Having had to recreate an MS Word file from a PDF copy, I know this can be a cumbersome process. I assume some authors may prefer to deposit a PDF copy, as this may be regarded as providing a form of DRM by making it slightly more difficult to process the file.
What policies and practices do people have in place related to this? A Google search for "Policies on depositing MS Word files" suggests that PDFs are the norm. Since the MS Office format has been an ISO standard since 2007 I assume the proprietary versus open standard format for deposits argument is not as strong as it was (subject to caveats about support for ISO/IEC 29500 Strict and the arguments about the validity of the standardisation process which I don't want to go into).
Thanks
Brian
--
--------------------------------------------------------
Brian Kelly
Innovation Support Centre, UKOLN, University of Bath, Bath, UK, BA2 7AY
Phone: 01225 383943
Email: [log in to unmask]
Blog: http://ukwebfocus.wordpress.com/
Twitter: http://twitter.com/briankelly
Web: http://isc.ukoln.ac.uk/
--
Christopher Eaker, P.E.
Graduate Research Assistant
Data Curation Education in Research Centers
University of Tennessee, Knoxville
--
Dr Talat Chaudhri
------------------------------------------------------------
Research Officer
Innovation Support Centre
UKOLN
University of Bath
Telephone: +44 (0)1970 626206 Fax: +44 (0)1225 386838
E-mail: [log in to unmask] Skype: talat.chaudhri
Web: http://www.ukoln.ac.uk/ukoln/staff/t.chaudhri/
------------------------------------------------------------