On Wed, 25 Feb 2009, Thomas Baker wrote:
> On Tue, Feb 24, 2009 at 03:16:30PM -0800, Stu Weibel wrote:
>>> One goal is to make it so that when
>>> someone fills out the fields of a standard, e.g., dc:keywords, they do
>>> so using appropriate controlled vocabularies, not just whatever words
>>> come to mind. Other ontology services (e.g., Watson [3]) collect a
>>> large range of ontologies, many of which include these concepts, but
>>> like Google there's lots of noise.
>>
>> I'd almost go so far as to say that without controlled vocabularies, the
>> value of the metadata is severely compromised and perhaps not worth the
>> effort to build it.
>
> I agree with Stu and would take this a step further: without
> URIs to identify their terms, the value of the controlled
> vocabularies themselves is compromised.
>
> URIs provide a way to cite the terms of controlled vocabularies
> without ambiguity across a variety of application contexts.
>
> Ed Summers spoke about this at DC-2008 [1]. On Slide 40 he
> shows a statement about a resource whose subject is "World
> Wide Web":
>
> dc:subject "World Wide Web" (a character string)
>
> and on the next slide he shows the same statement only using
> a URI to identify the concept "World Wide Web" (in this case,
> as defined in the Library of Congress Subject Headings):
>
> dc:subject "http://lcsh.info/95000541#concept" (a URI)
>
> Picture forty Web sites with resources about the subject "World
> Wide Web". It is easier and more accurate for an application
> to index those resources together on the basis of a URI for the
> concept "World Wide Web" than it is to index them on the
> basis of character strings such as "World Wide Web" (which
> may be present in variants such as "World-Wide Web"). [*]

The only problem I have with URIs is when they don't resolve to something
that tells us what they mean. (And, well, that one doesn't, as lcsh.info
had to shut down.) Without resolution, URIs are less useful than strings.

That being said, I'm personally against strings, because I've run into so
many polysemous terms. As there's a push in our field to make our data
more findable by other disciplines, we run into the problem where someone
sees a term and assumes it represents the same concept used in their own
field. A few years back, when I was new to these efforts, I may have been
single-handedly responsible for holding up a breakout group discussing
metadata registries for virtual observatories, because they were insisting
that we register every "data product", and in solar physics, that's a file.
I have tens of millions of "data products". But to everyone else, a
"data product" is a processed collection of items, and they only had
maybe a thousand among them all.

In some cases, I'm still trying to get people to define the objects that
we're supposed to be cataloging, so I can figure out which attributes we
need to track. (What is an 'observation'? Or an 'instrument'?)

-----
Joe Hourcle
Principal Software Engineer
Solar Data Analysis Center
Goddard Space Flight Center