Hi John,
> At the risk of veering off topic a bit,
Veer away ;-) I changed the subject line - I think this process of
"augmenting" metadata is sometimes referred to - in DC circles at least
- as "smartening up" or "smarting up", to contrast with the mapping from
a richer form to Simple DC which is sometimes described as "dumbing
down".
> it can make a difference to a service provider how the record was
> generated - software generated records will have 'predictable' errors
> or gaps and can be more easily (automatically) 'transformed' if the
> offered records aren't up to the service provider requirements. For
> example (borrowed from NSDL) if records are using a standard
> vocabulary for an element but haven't supplied version details
> couldn't a service provider fill in the missing details through
> checking your schema registry?
While I reserve the right to change my mind (!), and speaking only for
myself, I tend to be cautious about the extent to which the IEMSR will
support this process.
Taking a DC-based example, as it's a bit simpler and it's what I'm more
familiar with...
If a data provider declares that they are using the "Simple DC" DC
Application Profile, then they are explicitly limiting the information
conveyed in their metadata to the very simple statements that can be
constructed using the 15 DC elements. If an occurrence of the dc:type
property in a Simple DC description has a literal value of "Text", then
that is the value of that property: there is no way of telling - either
from the metadata record or from the "Simple DC" AP - whether that
corresponds to a value in the DCMI Type Vocabulary or a value in My
Completely Different Type Vocabulary, or whether it is just an arbitrary
string. That's just the nature of Simple DC - it provides limited
"expressivity" but (hopefully) is widely "understood".
A data provider might declare that they are using the "Simple DC for
e-Prints" DC Application Profile [1]. That specification makes
recommendations for how the properties available within Simple DC are
used - particularly in terms of guidance on what values are provided.
So, for example, for dc:date:
> The 'last-modified' date of the eprint and/or the date of its
> accession into the archive.
>
> If necessary, repeat this element to provide both the last-modified
> date and the date of accession.
> The last-modified date will be assumed to be the more recent of the
> two dates. If only one date is provided, it will be assumed that the
> last-modified date and the date of accession are the same.
The current proposal for representing this information for the IEMSR
will support exactly this - the provision of a human-readable commentary
on how the values for dc:date are being created in the _specific_
context of that DCAP. The current model does _not_ support the capacity
to express in machine-readable form that (in this context only) the use
of dc:date is equivalent to the use of dcterms:modified (and/or
dcterms:dateSubmitted).
(I guess you could argue that since it is provided in a human-readable
form, and a human service provider can read it and program their
application to act on it, there might as well be a machine-processable
statement, but (to date, at least, and AFAIK!) that hasn't been
considered a requirement for a DC Application Profile.)
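As an illustration of what "program their application to act on it" might
look like, here is a minimal Python sketch of the dc:date guidance quoted
above. It is only one possible reading of that guidance. The function name
and the returned keys are invented for this example, and it assumes the
dc:date values arrive as YYYY-MM-DD strings, which the guidelines quoted
here do not state.

```python
from datetime import date


def interpret_eprint_dates(dc_dates):
    """Apply the 'Simple DC for e-Prints' reading of repeated dc:date values.

    Hypothetical helper: assumes each value is a YYYY-MM-DD string.
    """
    # The quoted guidance only covers one or two dates, so anything
    # else is rejected rather than guessed at.
    if not 1 <= len(dc_dates) <= 2:
        raise ValueError("expected one or two dc:date values")

    parsed = sorted(date.fromisoformat(value) for value in dc_dates)

    # The more recent date is taken as last-modified; with a single
    # date, last-modified and accession are assumed to be the same.
    return {"last_modified": parsed[-1], "accession": parsed[0]}


both = interpret_eprint_dates(["2004-05-12", "2004-03-01"])
single = interpret_eprint_dates(["2004-03-01"])
```

Note that this rule lives in the service provider's code, not in the
metadata or in the profile: the same records read under plain Simple DC
carry no such distinction.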
It seems to me that if a data provider wishes to express reliably to a
service provider that a relation between a resource and a date is that
of dcterms:modified or dcterms:dateSubmitted, rather than just dc:date
("err, it's a date"), or that a value is from a specified vocabulary,
then they must represent that information explicitly in their metadata
descriptions.
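To make the contrast concrete, here is a small Python sketch of the same
description expressed twice: once with Simple DC only, and once with the
relations and the vocabulary stated explicitly. The `Statement` class and
the helper functions are invented for illustration and are not an IEMSR
or DCMI data model; the property and vocabulary URIs are the published
DCMI ones, and the date values are made up.

```python
from dataclasses import dataclass
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


@dataclass(frozen=True)
class Statement:
    """One property/value pair, optionally tied to a vocabulary (hypothetical model)."""
    property_uri: str
    value: str
    vocabulary_uri: Optional[str] = None


# Simple DC: "err, it's a date", and "Text" is just a string.
simple = [
    Statement(DC + "date", "2004-05-12"),
    Statement(DC + "type", "Text"),
]

# Explicit: the data provider states which relation and which vocabulary.
explicit = [
    Statement(DCTERMS + "modified", "2004-05-12"),
    Statement(DCTERMS + "dateSubmitted", "2004-03-01"),
    Statement(DC + "type", "Text", vocabulary_uri=DCTERMS + "DCMIType"),
]


def modified_dates(statements):
    """Return only the values explicitly declared as dcterms:modified."""
    return [s.value for s in statements if s.property_uri == DCTERMS + "modified"]


def type_vocabularies(statements):
    """Return the vocabulary declared for each dc:type value (None if undeclared)."""
    return [s.vocabulary_uri for s in statements if s.property_uri == DC + "type"]
```

With the `simple` description, `modified_dates` finds nothing and
`type_vocabularies` yields only `None`: a service provider can guess, but
cannot know. With the `explicit` description, both answers come straight
from the metadata.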
But one of the difficulties with these discussions is that it seems to
me we have notions of what metadata application profiles are, and we've
refined those notions over the last few years (which is a Good Thing),
but - as Andy said in his last message - we (still) aren't quite clear
what we want to use a metadata application profile _for_ - what problems
we expect to solve, what real functions we expect to provide using this
information. (And as a consequence, the IEMSR project has been (I hope!)
relatively conservative in its work so far, emphasising disclosure,
enabling reuse of existing solutions, etc.)
Cheers
Pete
[1] http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/