To members of the PREMIS Initiative.

I have given a great deal of attention to the draft documents listed below
over the past few weeks, and greatly appreciate this opportunity to provide
feedback and suggestions.

Thanks so much for the considerable effort that went into creating these
documents.  A very important need is a large step closer to being filled
due to the efforts of the PREMIS Initiative.  The model and set of semantic
units you have developed should prove to be an excellent starting point for
institutions that have been wanting to implement the OAIS Reference Model,
but didn't know where to start.

Firstly, I'm sure there will be continued discussion regarding the
data-modeling issues brought up on the DC-Architecture and Libraries
lists.  These issues shouldn't pose too much of a problem for the PREMIS
initiative unless the Preservation WG begins work on an Application Profile
combining PREMIS semantic units with DC elements.  I don't see too much
utility in doing so, given the vastly different scopes and applications of
the two.

Beyond that, I have a number of comments on both the Final Report and the
Data Dictionary, discussed below:

Final Report -

* There are a few terms used frequently in both documents that should
probably be included in the glossary:
    1- Inhibitor - it was not immediately clear to me what was meant by
this semantic unit
    2- Semantic Unit - although defined in several places throughout the
report, it would be useful to have it here for easy referencing while reading
the rest of the report.
    3- Actionable - only used in context of an event outcome, so could
either be defined here or under that semantic unit (Data Dictionary page 46)

* In the Relationships section of the data model discussion (pp. 14-16), it
would be very helpful if some attention were given to the distinction
between relatedEvents and linkingEvents. It was not immediately clear to me
that a relatedEvent only exists where there is a relationship between
objects that is somehow connected to the event.  Perhaps a subsection under
Relationships that talks about Linking Events and essentially says: "an
event can be associated with an object in one of two ways.  The event can
be tied to an object's relationship to another object, or the event can be
connected to the object, but not in such a way that a second object is
generated or involved in the event.  These are modeled differently in the
hierarchical structure of Object Entity descriptions."

* The use of semantic unit for a particular entity, versus semantic unit
further down the hierarchy, is a bit confusing.  On page 17, at the end of
the 1:1 principle discussion, there is a reference to the "relationship
semantic unit associated with the Object entity."  This initially led me to
think in terms of a Relationship entity, although I recognize that this is
not what's intended.  Perhaps it would be useful to be more clear in the
definition of semantic units, either with a glossary entry or a note in the
first paragraph of Section 7 - Intro to Data Dictionary.  This could easily be
accomplished by revising the second sentence to read "The Data Dictionary
includes a group of hierarchical semantic units for the following types of
entities..."


Element Set -

* fixity - Is the date of creation (or perhaps last successful validation
date) another useful component of the fixity Semantic Unit?  When dealing
with MD5s in a file based environment, I can determine the date it was
created from the file system timestamp.  If the hash is contained in a
database or other implementation system, there's no way to know when the
hash was generated.  When a fixity check comes up invalid, knowing the date
that the check was last valid is necessary for restoring a valid object
from backup.  If there is a linkingEntity for a previous successful fixity
check, this date can be easily ascertained, but otherwise it's
unknown.  I'm undecided as to whether recommended practice for a date
component would be to update the date each time a fixity check is run and
linked.
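
To make the question concrete, here is a minimal sketch in Python of a
fixity record that carries its own dates.  The class and field names
(FixityRecord, date_created, date_last_valid) are my own placeholders and
are not semantic units from the Data Dictionary; MD5 is used only because it
is the example above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FixityRecord:
    """Hypothetical fixity record; field names are illustrative only."""
    algorithm: str
    digest: str
    date_created: datetime      # when the digest was first generated
    date_last_valid: datetime   # last time a check matched the digest


def compute_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def create_record(data: bytes, algorithm: str = "md5",
                  now: Optional[datetime] = None) -> FixityRecord:
    """Generate a digest and stamp both dates with the creation time."""
    now = now or datetime.now(timezone.utc)
    return FixityRecord(algorithm, compute_digest(data, algorithm), now, now)


def check_fixity(record: FixityRecord, data: bytes,
                 now: Optional[datetime] = None) -> bool:
    """Re-compute the digest; advance date_last_valid only on success."""
    now = now or datetime.now(timezone.utc)
    valid = compute_digest(data, record.algorithm) == record.digest
    if valid:
        record.date_last_valid = now
    return valid
```

With this approach, a failed check leaves date_last_valid untouched, so it
still names the most recent point at which a backup is known to hold a valid
copy.  Whether the date should be updated on every successful check is
exactly the open question above.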

* creatingApplication: I'm a bit confused by the Usage notes under
creatingApplication (p 18).   When discussing the repeatability, it says
that "a file could have been created by MSWord and later turned into a PDF
using Adobe Photoshop. Details of both ... could be recorded."  However,
the section of the Final Report on the 1:1 principle clearly states that
the object in question should be described as two objects.  Perhaps the
usage note should clarify that the description of the PDF may trace the
history of processing applications and processes that led to the creation
of the file, even if the process was through intermediary files, and that
if the intermediary file exists in the preservation archive, it should be
described as a related object.  Implementing this becomes challenging when
looking at the sub-units, particularly the dateCreatedByApplication.  Do
you record the date that the original Word document in the above example
was created?  This seems to be a description of the Word document, not the
PDF.  But, if there's a CreatingApplication entry for each application, how
would the presence of a date for Adobe's entry and not for Word's be
interpreted?

* Dependency:  The usage notes indicate that this is for additional objects
that are _necessary_ to render.  What about files / objects that are
recommended or useful?  For example, a georeferenced TIFF has a World File
that provides information about the projection and coordinates
represented.  This is necessary for a GIS application to make effective use
of the file, but is not necessary for displaying the file itself.  Would
this be dealt with by having two separate environment units, with
environmentCharacteristics of 'minimum' versus 'recommended'?

*swOtherInformation:  I have concerns about using a URI to point to the
software documentation.  Aren't there issues with persistence of software
vendors' pages as well as whether the link will be pointing to the version
referenced in the environment or to the latest available version?  Perhaps
it would be best to have a recommendation saying that links to external
documentation should include a note indicating that special attention
should be paid to version numbers.

*swType:  The notes mention fonts or stylesheets under ancillary
software.  Why would these not be recorded as dependencies of the object
rather than as another layer of software?

*signatureInformation: Similar to my comment regarding fixity, unless it's
always recommended to record additional signature information as a
linkingEvent, it would probably be good to have a sub-element for dateSigned.

* A general question - Why do some elements recommend the use of controlled
vocabularies and / or term lists, while others recommend using numerical
codes that are used to represent a list of possible values (e.g. EventOutcome)?

Thanks again for the opportunity to provide feedback in this way.  I
recognize that there are a lot of question marks in my comments
above.  Please don't consider these questions that I expect you to answer
on list.  They're just questions that I think you may wish to take into
consideration when revising this document prior to public release.

Hopefully the feedback I have provided proves useful to your continued
efforts in this area.

Sincerely,

Corey A Harper
Metadata Librarian - CMET Team Leader
Metadata and Digital Library Services
University of Oregon
541/346.1854
[log in to unmask]





At 02:30 PM 2/1/2005, you wrote:
>To members of DC-Preservation Working Group:
>
>We welcome any comments on these documents. Send them to this list
>(DC-Preservation) or to me.
>
>Rebecca
>
>---------- Forwarded message ----------
>Date: Tue, 1 Feb 2005 17:27:37 -0500 (EST)
>From: Rebecca S. Guenther <[log in to unmask]>
>To: [log in to unmask]
>Subject: PREMIS final report and data dictionary
>
>We are pleased to announce the initial completion of the PREMIS final
>report and data dictionary. These documents are for review by the PREMIS
>Working Group, Advisory Committee, and other invited experts before a more
>general world-wide review. The documents still require final editing and
>formatting. There are 3 documents:
>
>1. PREMIS Final Report
>This narrative gives a summary of the activities of the PREMIS working
>group and is an introduction to the PREMIS data dictionary. It explains
>many of the decisions that the Core elements subgroup made when developing
>the data dictionary, details the data model used, and defines
>terminology. It should be read prior to reviewing the data dictionary.
>
>2. PREMIS Data Dictionary
>This document is a complete specification of data elements (called
>"semantic units") that make up this core preservation metadata element
>set. It includes element names, definitions, applicability, repeatability,
>obligation, examples, creation/maintenance notes, and usage notes. The
>data dictionary is organized by type of entity, as explained in the data
>model.
>
>3. Examples
>There will be selected full examples for various types of digital objects.
>This section is not yet completed, but will be shortly. We will announce
>the availability of the examples in a separate message.
>
>To access the documents go to:
>http://premis.lib.uchicago.edu:8888/premis
>This site is password protected.
>Login: premis
>Password: promise
>Then click on Review Draft of PREMIS Final Report
>
>How to send comments
>Comments may be sent to [log in to unmask]  If you prefer, you may
>send personally to Rebecca Guenther ([log in to unmask]) and Priscilla Caplan
>([log in to unmask]).
>
>Deadline
>Please send all comments within 3 weeks, by Feb. 21.
>
>Thank you all for your participation.
>
>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>^^  Rebecca S. Guenther                                   ^^
>^^  Senior Networking and Standards Specialist            ^^
>^^  Network Development and MARC Standards Office         ^^
>^^  1st and Independence Ave. SE                          ^^
>^^  Library of Congress                                   ^^
>^^  Washington, DC 20540-4402                             ^^
>^^  (202) 707-5092 (voice)    (202) 707-0115 (FAX)        ^^
>^^  [log in to unmask]                                          ^^
>^^                                                        ^^
>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^