
To members of the PREMIS Initiative.

I have given a great deal of attention to the draft documents listed below over the past few weeks, and greatly appreciate this opportunity to provide feedback and suggestions.

Thanks so much for the considerable effort that went into creating these documents. Through the efforts of the PREMIS Initiative, a very important need is now a large step closer to being met. The model and set of semantic units you have developed should prove to be an excellent starting point for institutions that have wanted to implement the OAIS Reference Model but didn't know where to start.

First, I'm sure there will be continued discussion of the data-modeling issues raised on the DC-Architecture and Libraries lists. These issues shouldn't pose much of a problem for the PREMIS Initiative unless the Preservation WG begins work on an Application Profile that combines PREMIS semantic units with DC elements. I don't see much utility in doing so, given the vastly different scopes and applications of the two.

Beyond that, I have a number of comments on both the Final Report and the Data Dictionary, discussed below:

Final Report -

* There are a few terms used frequently in both documents that should probably be included in the glossary:
   1- Inhibitor - it was not immediately clear to me what was meant by this semantic unit.
   2- Semantic Unit - although it is defined in several places throughout the report, it would be useful to have it here for easy reference while reading the rest of the report.
   3- Actionable - only used in the context of an event outcome, so it could be defined either here or under that semantic unit (Data Dictionary page 46).

* In the data model discussion of Relationships (pp. 14-16), it would be very helpful to give some attention to the distinction between relatedEvents and linkingEvents. It was not immediately clear to me that a relatedEvent only exists where there is a relationship between objects that is somehow connected to the event. Perhaps a subsection under Relationships could discuss Linking Events and say, in essence: "An event can be associated with an object in one of two ways. The event can be tied to the object's relationship to another object, or the event can be connected to the object alone, without a second object being generated or involved in the event. These two cases are modeled differently in the hierarchical structure of Object entity descriptions."

* Applying the term "semantic unit" both at the level of a particular entity and to units further down its hierarchy is a bit confusing. On page 17, at the end of the 1:1 principle discussion, there is a reference to the "relationship semantic unit associated with the Object entity." This initially led me to think in terms of a Relationship entity, although I recognize that this is not what's intended. Perhaps the definition of semantic units could be made clearer, either with a glossary entry or with a note in the first paragraph of Section 7 (Introduction to the Data Dictionary). This could easily be accomplished by revising the second sentence to read "The Data Dictionary includes a group of hierarchical semantic units for the following types of entities..."
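To illustrate the relatedEvent/linkingEvent distinction raised above, here is a minimal sketch in Python. It is a simplified, hypothetical model: the class and field names are my own assumptions, loosely following the draft semantic units, and are not the PREMIS schema itself. An event tied to an object-to-object relationship hangs off the relationship, while an event involving only the object hangs off the object directly.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model; names are illustrative assumptions,
# not the PREMIS schema.


@dataclass
class Relationship:
    related_object_id: str
    # Events tied to this object-to-object relationship (relatedEvent)
    related_event_ids: List[str] = field(default_factory=list)


@dataclass
class PremisObject:
    object_id: str
    relationships: List[Relationship] = field(default_factory=list)
    # Events that involve only this object (linkingEvent)
    linking_event_ids: List[str] = field(default_factory=list)

    def all_event_ids(self) -> List[str]:
        """Collect every event reachable from this object, either way."""
        ids = list(self.linking_event_ids)
        for rel in self.relationships:
            ids.extend(rel.related_event_ids)
        return ids


# A PDF migrated from a Word file: the migration event belongs to the
# relationship between the two objects, while a later fixity check
# involves only the PDF.
pdf = PremisObject(
    object_id="obj-pdf-001",
    relationships=[Relationship("obj-doc-001", ["evt-migration-01"])],
    linking_event_ids=["evt-fixity-07"],
)

print(pdf.all_event_ids())  # ['evt-fixity-07', 'evt-migration-01']
```

The point of the sketch is only that the two kinds of event sit at different levels of the Object description, which is what a subsection on Linking Events could make explicit.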


Element Set -

* fixity - Is the date of creation (or perhaps the date of last successful validation) another useful component of the fixity semantic unit? When dealing with MD5s in a file-based environment, I can determine the date a hash was created from the file system timestamp. If the hash is stored in a database or other implementation system, there's no way to know when it was generated. When a fixity check comes up invalid, knowing the date on which the check was last valid is necessary for restoring a valid object from backup. If there is a linkingEntity for a previous successful fixity check, this date can be easily ascertained, but otherwise it's unknown. I'm undecided as to whether recommended practice for a date component would be to update the date each time a fixity check is run and linked.

* creatingApplication: I'm a bit confused by the usage notes under creatingApplication (p. 18). When discussing repeatability, they say that "a file could have been created by MSWord and later turned into a PDF using Adobe Photoshop. Details of both ... could be recorded." However, the section of the Final Report on the 1:1 principle clearly states that the object in question should be described as two objects. Perhaps the usage note should clarify that the description of the PDF may trace the history of applications and processes that led to the creation of the file, even if that processing went through intermediary files, and that if an intermediary file exists in the preservation archive, it should be described as a related object. Implementing this becomes challenging when looking at the sub-units, particularly dateCreatedByApplication. Do you record the date that the original Word document in the above example was created? That seems to describe the Word document, not the PDF. But if there's a creatingApplication entry for each application, how would the presence of a date for Adobe's entry and not for Word's be interpreted?

* Dependency: The usage notes indicate that this is for additional objects that are _necessary_ to render the object. What about files or objects that are recommended or useful? For example, a georeferenced TIFF has a World File that provides information about the projection and coordinates represented. This is necessary for a GIS application to make effective use of the file, but not for displaying the image itself. Would this be handled by having two separate environment units, with environmentCharacteristics of 'minimum' versus 'recommended'?

* swOtherInformation: I have concerns about using a URI to point to the software documentation. Aren't there issues with the persistence of software vendors' pages, as well as with whether the link will point to the version referenced in the environment or to the latest available version? Perhaps it would be best to include a recommendation that links to external documentation be accompanied by a note indicating that special attention should be paid to version numbers.

* swType: The notes mention fonts or stylesheets under ancillary software. Why would these not be recorded as dependencies of the object rather than as another layer of software?

* signatureInformation: Similar to my comment regarding fixity, unless it's always recommended to record additional signature information as a linkingEvent, it would probably be good to have a sub-element for dateSigned.

* A general question: why do some elements recommend the use of controlled vocabularies and/or term lists, while others recommend numerical codes to represent a list of possible values (e.g., EventOutcome)?
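To make the fixity date suggestion above concrete, here is a minimal Python sketch. The Fixity record, its field names, and the date_last_validated component are hypothetical assumptions for illustration, not semantic units from the draft Data Dictionary. It adopts one of the options I'm undecided about: the date is updated only when a check succeeds, so a failed check leaves the last known-good date in place for restoring from backup.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fixity:
    # Field names are illustrative, not taken from the Data Dictionary.
    digest_algorithm: str
    digest: str
    # Hypothetical extra component: when this digest last matched the file.
    date_last_validated: Optional[datetime] = None


def md5_hex(data: bytes) -> str:
    """Hex-encoded MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def check_fixity(fixity: Fixity, data: bytes, checked_at: datetime) -> bool:
    """Recompute the digest; update the date only when the check passes,
    so a failed check leaves the last known-good date in place."""
    ok = md5_hex(data) == fixity.digest
    if ok:
        fixity.date_last_validated = checked_at
    return ok


content = b"sample file content"
fx = Fixity("MD5", md5_hex(content))

first = datetime(2005, 2, 1, tzinfo=timezone.utc)
second = datetime(2005, 2, 15, tzinfo=timezone.utc)

passed = check_fixity(fx, content, first)              # True
failed = check_fixity(fx, b"corrupted bytes", second)  # False
print(passed, failed, fx.date_last_validated.date())   # True False 2005-02-01
```

The alternative, recording each check as a linked event, would keep the full history rather than a single date; the sketch only shows that a single date component is enough to answer "when was this last valid?" without one.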

Thanks again for the opportunity to provide feedback in this way. I recognize that there are a lot of question marks in my comments above. Please don't take these as questions I expect you to answer on the list; they're simply points you may wish to take into consideration when revising these documents prior to public release.

I hope the feedback I have provided proves useful to your continued efforts in this area.

Sincerely,

Corey A Harper
Metadata Librarian - CMET Team Leader
Metadata and Digital Library Services
University of Oregon
541/346.1854
[log in to unmask] edu




At 02:30 PM 2/1/2005, you wrote:
To members of DC-Preservation Working Group:

We welcome any comments on these documents. Send them to this list
(DC-Preservation) or to me.

Rebecca

---------- Forwarded message ----------
Date: Tue, 1 Feb 2005 17:27:37 -0500 (EST)
From: Rebecca S. Guenther <[log in to unmask]>
To: [log in to unmask]
Subject: PREMIS final report and data dictionary

We are pleased to announce the initial completion of the PREMIS final
report and data dictionary. These documents are for review by the PREMIS
Working Group, Advisory Committee, and other invited experts before a more
general world-wide review. The documents still require final editing and
formatting. There are 3 documents:

1. PREMIS Final Report
This narrative gives a summary of the activities of the PREMIS working
group and is an introduction to the PREMIS data dictionary. It explains
many of the decisions that the Core elements subgroup made when developing
the data dictionary, details the data model used, and defines
terminology. It should be read prior to reviewing the data dictionary.

2. PREMIS Data Dictionary
This document is a complete specification of data elements (called
"semantic units") that make up this core preservation metadata element
set. It includes element names, definitions, applicability, repeatability,
obligation, examples, creation/maintenance notes, and usage notes. The
data dictionary is organized by type of entity, as explained in the data
model.

3. Examples
There will be selected full examples for various types of digital objects.
This section is not yet completed, but will be shortly. We will announce
the availability of the examples in a separate message.

To access the documents go to:
http://premis.lib.uchicago.edu:8888/premis
This site is password protected.
Login: premis
Password: promise
Then click on Review Draft of PREMIS Final Report

How to send comments
Comments may be sent to [log in to unmask]  If you prefer, you may
send personally to Rebecca Guenther ([log in to unmask]) and Priscilla Caplan
([log in to unmask]).

Deadline
Please send all comments within 3 weeks, by Feb. 21.

Thank you all for your participation.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^  Rebecca S. Guenther                                   ^^
^^  Senior Networking and Standards Specialist            ^^
^^  Network Development and MARC Standards Office         ^^
^^  1st and Independence Ave. SE                          ^^
^^  Library of Congress                                   ^^
^^  Washington, DC 20540-4402                             ^^
^^  (202) 707-5092 (voice)    (202) 707-0115 (FAX)        ^^
^^  [log in to unmask]                                          ^^
^^                                                        ^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^