> On Monday, December 01, 1997 4:09 PM, Dan Brickley
> > Is there perhaps a case for creating some simple 'utility' DC schemes
> > here, so we could be explicit about the separator character?
> I don't think we need a scheme here, but rather a convention ...
Yep, if we can agree on a standard separator character for DC, that'd be
great. If not, we need a machine-readable way of tagging our multi-value
content with information about how to normalise it. Otherwise we'll get
into problems with (a) building browsable views based on keywords, (b)
generating _rich_ RDF models.
It does sound as if we're going to have a problem. From what we've
heard in this thread, there are some special separator characters
which crop up in one or more of the standardised subject/keyword
vocabularies. LCSH uses this, MeSH uses that, XYZ uses
something else...
What this means is that we're unlikely to find a standard separator
character which we can honestly say will separate _any_ SCHEME adopted
for high-quality classification purposes, since we can't tell in advance
what those schemes will look like.
So... it may be more realistic to say that the magic
DC-splitter-character, whatever we decide upon, _only_ acts as a splitter
for elements that are not associated with a SCHEME. In other words, as
soon as a standard SCHEME is named, the 'splitter' character loses all
special significance. This would mean that anyone attempting a formal
definition of their (encoding of a) favourite SCHEME would have to address
the issue of representing multiple-values.
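
As a rough illustration of that rule, here is a minimal Python sketch. The
function name, the semicolon default and the sample values are all
assumptions made up for this example; nothing here is an agreed DC
convention.

```python
def split_values(content, scheme=None, splitter=";"):
    """Split a DC element's content into values, unless a SCHEME is named.

    Hypothetical helper: the name, the signature and the default splitter
    are illustrative assumptions, not part of any DC specification.
    """
    if scheme is not None:
        # A named SCHEME owns its own syntax, so the splitter character
        # has no special significance and the content is left untouched.
        return [content]
    # No SCHEME: treat the splitter as a separator and trim whitespace,
    # dropping any empty fragments left by stray separators.
    return [part.strip() for part in content.split(splitter) if part.strip()]


if __name__ == "__main__":
    print(split_values("metadata; RDF; Dublin Core"))
    print(split_values("Metadata--Standards; Cataloging", scheme="LCSH"))
```

With no SCHEME the string is split on the separator; with a SCHEME named
it comes back as a single value, and whoever defines the encoding of that
SCHEME has to say how multiple values are represented.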
The only alternatives to this approach I can think of are...
1. do a vast literature search on all possible formal schemes to find
an unproblematic character for use in splitting. '|' might be a good
bet.
2. establish a convention that all SCHEMEd text represented in DC must
respect the special role of our chosen splitter character. This could
force people to invent new physical encodings of schemes that clashed.
Both of which sound too much like hard work ;-)
> > (This takes us back to the 'what exactly are schemes and how are they
> > formally named' question)
> >
> >
> > Could someone clarify whether keywords are a special case here. Or
> > are there other circumstances in which a single DC entry can, as a
> > convenience, contain multiple values?
>
> My sense is that elements whose values are atomic (keywords are the
> best example) are easily grouped as delimited series in a single meta
> tag. Elements with parseable substructure might be better accommodated
> in separate tags (Creators with email addresses, affiliations, etc.)
Sounds reasonable. In practice it's the elements with shorter values that
are the most appealing candidates for repeating within one string; this
amounts to much the same thing though, I think.
We still need an 'official' line on which element repetitions can be
abbreviated in this way. This is important if we're going to be able to
automatically munge DC data into nice clean RDF models; we wouldn't want
to accidentally split a Description just because it contained a semicolon,
nor build a browsable listing which contained multi-value Creator fields.
We could, for example, say that (for sake of argument) Creator, Subject,
Date, Type, Format, Language were the 'splittable' elements. We would then
need to decide on the splitting character, and agree that our choice of
character was compatible with the choice of splittable fields.
E.g. if we choose comma, Creator should probably be dropped from this
list. And if we choose semicolon, it'll probably be OK to have Creator as
a splittable field. It doesn't really matter as long as we agree - this
multiple values business is only a syntactic convenience anyway.
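
To make the idea concrete, a small Python sketch of a 'splittable
elements' list follows. The element set is the for-sake-of-argument list
above, the semicolon is just one candidate character, and the names
SPLITTABLE, SPLITTER and normalise are hypothetical.

```python
# Illustrative only: the for-sake-of-argument list of splittable elements.
SPLITTABLE = {"Creator", "Subject", "Date", "Type", "Format", "Language"}

# Assumed splitter; semicolon is used because it rarely occurs inside
# a "Surname, Forename" style Creator value, whereas comma does.
SPLITTER = ";"


def normalise(element, content):
    """Return the list of values held in one DC element's content string."""
    if element not in SPLITTABLE:
        # E.g. Description: never split, even if it contains the splitter.
        return [content]
    return [v.strip() for v in content.split(SPLITTER) if v.strip()]


if __name__ == "__main__":
    print(normalise("Creator", "Smith, Jane; Jones, Bob"))
    print(normalise("Description", "A short report; with a semicolon"))
```

The Creator string yields two values, each keeping its internal comma,
while the Description is returned whole despite the semicolon.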
> Keep in mind that this is only a problem in the META kludge world. When
> RDF is deployed, it will afford a formal means for structuring data. So, while the
> problem of structural design of our data is, even as the poor, always
> with us, the difficulty of building that data into parseable structures
> is going to get easier.
Yep, thankfully. But the splitting thing could turn out to be a problem in
the RDF world too, if we fail to produce machine-processable metadata
while we're waiting for RDF to solidify. RDF gives us a great model for
unambiguously representing complex statements in DublinCorese. What it
won't give is any mechanism for automatically rescuing ambiguous metadata.
Dan
--
Research and Development Unit tel: +44(0)117 9288478
Institute for Learning and Research Technology http://www.ilrt.bris.ac.uk/
University of Bristol, Bristol BS8 1TN, UK. fax: +44(0)117 9288473