Hi!
I've followed this intriguing discussion for a while now, and thought
I'd give my view.
Several interesting points have been made:
1. The definition of metadata should distinguish between human-assigned
information (=metadata) and information inherent in the described object
(not metadata).
2. The definition needs to exclude non-digital/physical objects (washing
machines).
3. Metadata is just "useful search criteria"
4. The tin can analogy
I believe we are being too broad in one sense, and too narrow in
another. Regarding the first point above: is the distinction meaningful,
and can you actually distinguish between the two? Is the information
about, say, the actors in a film human-assigned or not? It is certainly
not deducible from the film itself. Is "publication date" assigned or
not?
The argument was made that only "strict metadata" was standardizable.
Why, then, does Dublin Core contain "non-metadata" elements such as
"format" or "language" and many more? It seems these are not only
possible to standardize, but also very useful. And when both "strict
metadata" and "attributes" are used in the same context, why
differentiate between the two? What do we gain by doing so?
Regarding point number 2, excluding non-digital/physical objects is
interesting. Books, regarded as conceptual entities, can obviously have
metadata, but can a physical book have "real metadata"? FRBR is an
interesting example: it clearly shows a system that processes
information about both conceptual entities and their physical
manifestations, including where they are located and so on. So from a
practical point of view, the information seems to be of the same kind.
Look at imdb.com as another example. A film can have metadata, but how
come the information about the actors in the film is not metadata?
Making a distinction does not seem to lead to any net gain, as many
applications will process information of the two kinds anyway.
And again, can we differentiate between physical and non-physical
things? What is an "event"? Or a "competency", etc.
And by the way, if books can have metadata, why can't washing machines?
Point 3 is interesting, because it tries to define metadata by its
usage. I think we agree that not all metadata is interesting in a search
context, and we surely do not want a subjective notion of metadata
(which I don't think was ever really suggested, BTW). But I do believe
this points us in the right direction. There is something about
processability of information (for searching etc.) that is
characteristic of metadata.
This leads us to point 4, the tin can. The analogy is helpful in
explaining that metadata and the object it describes are distinct, and
that metadata gives you information about the object without needing to
access the object itself. I think the analogy fails in one important
aspect: tin can information is not machine-processable. (That is what
makes it just an analogy, I suppose).
I'm more and more inclined to use the following definition of metadata:
"Machine-processable information about resources"
I think this definition is meaningful and to the point.
* It excludes huge amounts of information. For example, pure text, video
etc are not metadata, as they do not refer to another resource *in a
machine processable way*. They may of course refer to persons, etc, in
an informal fashion, but that does not count as metadata.
* Pure information such as "640x480" is not metadata, but a
machine-processable association of that string with the property
"dimension" of an image, is metadata.
* It allows any kind of information, both extrinsic ("strict metadata")
and intrinsic ("attributes"), as long as they are made available for
machine-processing.
* The definition of resource is intentionally very broad, matching the
way we mix processing of different kinds of resources in our systems (as
in FRBR or imdb).
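
The "640x480" point can be illustrated with a minimal sketch in Python.
Everything in it is an assumption made for illustration only: the
Statement type, the values_of helper, the example.org URI and the
property names come from no particular standard.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One machine-processable statement about a resource (hypothetical type)."""
    resource: str  # identifier of the thing described (made-up URI below)
    prop: str      # name of the property, e.g. "dimension"
    value: str     # the value associated with that property


# On its own this is just a string; nothing says what it is about.
bare_value = "640x480"

# Hypothetical identifier for an image resource.
IMAGE = "http://example.org/images/sunset.jpg"

# Associating the string with a property of a resource is what makes it
# metadata in the sense of the definition above.
statements = [
    Statement(IMAGE, "dimension", bare_value),  # intrinsic ("attribute")
    Statement(IMAGE, "creator", "Jane Doe"),    # extrinsic ("strict metadata")
]


def values_of(statements, resource, prop):
    """Return every value recorded for the given property of a resource."""
    return [s.value for s in statements
            if s.resource == resource and s.prop == prop]


dimensions = values_of(statements, IMAGE, "dimension")
```

A program can answer "what are the dimensions of this image?" from the
statements without opening the image, and it treats the intrinsic and
the extrinsic property in exactly the same way.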
Machine-processability is really the key to metadata. It is actually
much less central to the definition of metadata what kinds of properties
we may want to use. Some will be standardized across domains (legal
metadata is maybe one such case), some will be community-specific, some
will be locally defined and some will necessarily apply to a single kind
of resource in a single context only. And we need our metadata
standards to *support* this heterogeneity.
<off-topic>
The way to do that, in my opinion, is to produce a useful abstract
framework for metadata, and then let different communities fill that
framework with their kind of metadata. Consensus building on metadata
terms is a bottom-up process. I think this is where LOM currently fails:
it tries to provide a framework, a set of metadata attributes and a
basic application profile all in one standard. That is untenable in the
long run. The three (framework, terms, application profile) need to be
separate. Dublin Core gets this part right, incomplete as it may be.
</off-topic>
That is my current thinking on the issue... It will be interesting to
see what gaping holes and glaring oversights you will (necessarily) find :-)
/Mikael
--
Plus ça change, plus c'est la même chose