To members of the PREMIS Initiative.
I have given a great deal of attention to the draft documents listed
below over the past few weeks, and greatly appreciate this opportunity to
provide feedback and suggestions.
Thanks so much for the considerable effort that went into creating these
documents. A very important need is a large step closer to being
filled due to the efforts of the PREMIS Initiative. The model and
set of semantic units you have developed should prove to be an excellent
starting point for institutions that have been wanting to implement the
OAIS Reference Model, but didn't know where to start.
Firstly, I'm sure there will be continued discussion regarding the
data-modeling issues brought up on the DC-Architecture and Libraries
lists. These issues shouldn't pose too much of a problem for the
PREMIS initiative unless the Preservation WG begins work on an
Application Profile that combines PREMIS semantic units with DC elements.
I don't see too much utility in doing so, given the vastly different
scopes and applications of the two.
Beyond that, I have a number of comments on both the Final Report and the
Data Dictionary, discussed below:
Final Report -
* There are a few terms used frequently in both documents that should
probably be included in the glossary:
1- Inhibitor - it was not immediately clear to me what was
meant by this semantic unit
2- Semantic Unit - although defined in several places
throughout the report, it would be useful to have it here for easy
reference while reading the rest of the report.
3- Actionable - only used in context of an event outcome, so
could either be defined here or under that semantic unit (Data Dictionary
page 46)
* In the Relationships section of the data model discussion (p. 14-16),
it would be very helpful if some attention were given to the distinction
between relatedEvents and linkingEvents. It was not immediately clear to
me that a relatedEvent only exists where there is a relationship between
objects that is somehow connected to the event. Perhaps a
subsection under Relationships that talks about Linking Events and
essentially says: "an event can be associated with an object in one
of two ways. The event can be tied to an object's relationship to
another object, or the event can be connected to the object, but not in
such a way that a second object is generated or involved in the
event. These are modeled differently in the hierarchical structure
of Object Entity descriptions."
* The use of semantic unit for a particular entity, versus semantic unit
further down the hierarchy, is a bit confusing. On page 17, at the
end of the 1:1 principle discussion, there is a reference to the
"relationship semantic unit associated with the Object
entity." This initially led me to think in terms of a
Relationship entity, although I recognize that this is not what's
intended. Perhaps it would be useful to be clearer in the
definition of semantic units, either with a glossary entry or a note in
the first paragraph of Section 7 - Intro to Data Dictionary. This
could easily be accomplished by revising the second sentence to read
"The Data Dictionary includes a group of hierarchical semantic units
for the following types of entities..."
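
To make the relatedEvent / linkingEvent distinction above concrete, here is a
minimal sketch in Python. It is only an illustration under stated assumptions:
the class and field names (ObjectDescription, Relationship, related_event_ids,
linking_event_ids) and the identifiers in the example are hypothetical and are
not taken from the Data Dictionary.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Relationship:
    # Hypothetical stand-in for a relationship between two objects.
    relationship_type: str
    related_object_id: str
    # Events tied to this relationship (the "relatedEvent" case).
    related_event_ids: List[str] = field(default_factory=list)


@dataclass
class ObjectDescription:
    # Hypothetical stand-in for an Object entity description.
    object_id: str
    relationships: List[Relationship] = field(default_factory=list)
    # Events tied to the object itself, with no second object involved
    # (the "linkingEvent" case).
    linking_event_ids: List[str] = field(default_factory=list)


def events_for(obj: ObjectDescription) -> List[Tuple[str, str]]:
    """Return (event_id, how_attached) pairs for every event tied to obj."""
    pairs = [(event_id, "linking") for event_id in obj.linking_event_ids]
    for rel in obj.relationships:
        for event_id in rel.related_event_ids:
            pairs.append((event_id, "related via " + rel.related_object_id))
    return pairs


# Example: a PDF derived from a Word file (the event hangs off the
# relationship), plus a check run on the PDF alone (event hangs off the object).
pdf = ObjectDescription(
    object_id="pdf-1",
    relationships=[Relationship("derivation", "doc-1", ["evt-migrate"])],
    linking_event_ids=["evt-viruscheck"],
)
for event_id, how in events_for(pdf):
    print(event_id, "->", how)
```

In this sketch the two kinds of association sit at different levels of the
object description, which is the point the suggested subsection would need to
spell out.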
Element Set -
* fixity - Is the date of creation (or perhaps last successful validation
date) another useful component of the fixity Semantic Unit? When
dealing with MD5s in a file based environment, I can determine the date
it was created from the file system timestamp. If the hash is
contained in a database or other implementation system, there's no way to
know when the hash was generated. When a fixity check comes up
invalid, knowing the date that the check was last valid is necessary for
restoring a valid object from backup. If there is a linkingEntity
for a previous successful fixity check, this date can be easily
ascertained, but otherwise it's unknown. I'm undecided as to
whether recommended practice for a date component would be to update the
date each time a fixity check is run and linked.
* creatingApplication: I'm a bit confused by the Usage notes under
creatingApplication (p 18). When discussing the
repeatability, it says that "a file could have been created by
MSWord and later turned into a PDF using Adobe Photoshop. Details of both
... could be recorded." However, the section of the Final
Report on the 1:1 principle clearly states that the object in question
should be described as two objects. Perhaps the usage note should
clarify that the description of the PDF may trace the history of
processing applications and processes that led to the creation of the
file, even if the process was through intermediary files, and that if the
intermediary file exists in the preservation archive, it should be
described as a related object. Implementing this becomes
challenging when looking at the sub-units, particularly the
dateCreatedByApplication. Do you record the date that the original
Word document in the above example was created? This seems to be a
description of the Word document, not the PDF. But, if there's a
CreatingApplication entry for each application, how would the presence of
a date for Adobe's entry and not for Word's be interpreted?
* Dependency: The usage notes indicate that this is for additional
objects that are _necessary_ to render. What about files / objects
that are recommended or useful? For example, a georeferenced TIFF has
a World File that provides information about the projection and
coordinates represented. This is necessary for a GIS application to
make effective use of the file, but is not necessary for displaying the
file itself. Would this be dealt with by having two separate
environment units, with environmentCharacteristics of 'minimum' versus
'recommended'?
* swOtherInformation: I have concerns about using a URI to point to
the software documentation. Aren't there issues with persistence of
software vendors' pages as well as whether the link will be pointing to
the version referenced in the environment or to the latest available
version? Perhaps it would be best to have a recommendation saying
that links to external documentation should include a note indicating
that special attention should be paid to version numbers.
* swType: The notes mention fonts or stylesheets under ancillary
software. Why would these not be recorded as dependencies of the
object rather than as another layer of software?
* signatureInformation: Similar to my comment regarding fixity, unless
it's always recommended to record additional signature information as a
linkingEvent, it would probably be good to have a sub-element for
dateSigned.
* A general question - Why do some elements recommend the use of
controlled vocabularies and / or term lists, while others recommend using
numerical codes that are used to represent a list of possible values (e.g.,
EventOutcome)?
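
To illustrate the fixity comment above, here is a minimal sketch in Python of
a fixity record that carries its own dates. This is only a sketch under stated
assumptions: the names FixityRecord, date_created and last_validated are
hypothetical and are not semantic units from the Data Dictionary, and the
sketch takes the "update the date on each successful check" option purely for
illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FixityRecord:
    # Hypothetical structure; field names are illustrative only.
    algorithm: str
    digest: str
    date_created: datetime                      # when the hash was generated
    last_validated: Optional[datetime] = None   # last successful check


def create_fixity(data: bytes, now: datetime) -> FixityRecord:
    """Compute an MD5 digest and record when it was generated."""
    digest = hashlib.md5(data).hexdigest()
    return FixityRecord("MD5", digest, date_created=now, last_validated=now)


def check_fixity(record: FixityRecord, data: bytes, now: datetime) -> bool:
    """Re-compute the digest; move last_validated forward only on success."""
    ok = hashlib.md5(data).hexdigest() == record.digest
    if ok:
        record.last_validated = now
    return ok


created = datetime(2005, 2, 1, tzinfo=timezone.utc)
first_check = datetime(2005, 2, 10, tzinfo=timezone.utc)
second_check = datetime(2005, 2, 20, tzinfo=timezone.utc)

record = create_fixity(b"original content", created)
check_fixity(record, b"original content", first_check)    # passes
check_fixity(record, b"corrupted content", second_check)  # fails
print(record.last_validated)
```

After the failed check, last_validated still holds the date of the last
successful check, which is the date needed to pick a backup to restore from
without depending on file system timestamps.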
Thanks again for the opportunity to provide feedback in this way. I
recognize that there are a lot of question marks in my comments
above. Please don't consider these questions that I expect you to
answer on list. They're just questions that I think you may wish to
take into consideration when revising this document prior to public
release.
Hopefully the feedback I have provided proves useful to your continued
efforts in this area.
Sincerely,
Corey A Harper
Metadata Librarian - CMET Team Leader
Metadata and Digital Library Services
University of Oregon
541/346.1854
[log in to unmask]
At 02:30 PM 2/1/2005, you wrote:
To members of DC-Preservation Working Group:
We welcome any comments on these documents. Send them to this list
(DC-Preservation) or to me.
Rebecca
---------- Forwarded message ----------
Date: Tue, 1 Feb 2005 17:27:37 -0500 (EST)
From: Rebecca S. Guenther <[log in to unmask]>
To: [log in to unmask]
Subject: PREMIS final report and data dictionary
We are pleased to announce the initial completion of the PREMIS final
report and data dictionary. These documents are for review by the PREMIS
Working Group, Advisory Committee, and other invited experts before a more
general world-wide review. The documents still require final editing and
formatting. There are 3 documents:
1. PREMIS Final Report
This narrative gives a summary of the activities of the PREMIS working
group and is an introduction to the PREMIS data dictionary. It explains
many of the decisions that the Core elements subgroup made when developing
the data dictionary, details the data model used, and defines
terminology. It should be read prior to reviewing the data dictionary.
2. PREMIS Data Dictionary
This document is a complete specification of data elements (called
"semantic units") that make up this core preservation metadata element
set. It includes element names, definitions, applicability, repeatability,
obligation, examples, creation/maintenance notes, and usage notes. The
data dictionary is organized by type of entity, as explained in the data
model.
3. Examples
There will be selected full examples for various types of digital objects.
This section is not yet completed, but will be shortly. We will announce
the availability of the examples in a separate message.
To access the documents go to:
http://premis.lib.uchicago.edu:8888/premis
This site is password protected.
Login: premis
Password: promise
Then click on Review Draft of PREMIS Final Report
How to send comments
Comments may be sent to [log in to unmask]. If you prefer, you may
send them personally to Rebecca Guenther ([log in to unmask]) and
Priscilla Caplan ([log in to unmask]).
Deadline
Please send all comments within 3 weeks, by Feb. 21.
Thank you all for your participation.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^ Rebecca S. Guenther                                    ^^
^^ Senior Networking and Standards Specialist             ^^
^^ Network Development and MARC Standards Office          ^^
^^ 1st and Independence Ave. SE                           ^^
^^ Library of Congress                                    ^^
^^ Washington, DC 20540-4402                              ^^
^^ (202) 707-5092 (voice)    (202) 707-0115 (FAX)         ^^
^^ [log in to unmask]                                     ^^
^^                                                        ^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^