Hi
I am in the process of creating a module for the Drupal content
management system (CMS) to allow Dublin Core metadata to be maintained
and presented for all nodes (pages).
What I have been told in the past about putting DC metadata into HTML
document heads and what I see "in the wild" tend to differ, so I am
after some clarification so that I can get my database schema correct
from the word go.
I am having considerable trouble getting my head round this, so any help
would be much appreciated.
1) Initially, I was going to store the metadata as triples: a
reference to the URI of the page being described, a predicate (scheme
plus term, e.g. DC.subject) and a value. I now realise that this is
not enough, as a language value should also be available (null where
it simply does not apply).
2) I notice that some elements are accompanied by a scheme property
(which, like lang, gets lost when delivered by the HTTP HEAD request).
I do not entirely understand the purpose of this: should it not be
implicit from the scheme/term as represented by, say, DC.type? If not,
should the DC.type name be used at all, rather than the
DCTERMS.DCMIType that appears in the scheme?
3) Dates. I was, until told that I was doing it wrong, suffixing
DC.date as DC.date.created and DC.date.modified. I am now using
DCTERMS.created and DCTERMS.modified. Is it incorrect to use DC.date at
all, or only incorrect to use the suffixed forms? In other words, if I
had just a single date, would I call it DC.date, and use the DCTERMS
versions only if I wanted to elaborate on it?
4) I am making the system open-ended, so that new terms (such as
DC.accessibility) and new schemes can be added. To keep things simple,
especially for users new to metadata, what would be considered to be a
good set of terms with which to begin? I considered:
language, format, type, title, creator, identifier, rights, date,
description, subject. Enough for starters?
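To make points 1 to 3 concrete, here is a minimal Python sketch of how
one stored row might be turned into a meta element. The row layout
(name, scheme, lang, content), the sample values and the function names
are all assumptions made for illustration; the scheme and lang
attributes are simply left out when the stored value is null. It shows
only the shape of the output and is not a statement of what counts as
correct DC usage.

```python
from html import escape

# Hypothetical rows: (name, scheme, lang, content).
# None stands for "not applicable" (NULL in the database).
ROWS = [
    ("DC.title", None, "en", "An example page"),
    ("DC.type", "DCTERMS.DCMIType", None, "Text"),
    ("DCTERMS.created", "DCTERMS.W3CDTF", None, "2007-05-01"),
]


def render_meta(name, scheme, lang, content):
    """Build one <meta> element, omitting attributes whose value is None."""
    attrs = [("name", name), ("scheme", scheme), ("lang", lang), ("content", content)]
    parts = [
        '%s="%s"' % (key, escape(value, quote=True))
        for key, value in attrs
        if value is not None
    ]
    return "<meta " + " ".join(parts) + " />"


def render_head(rows):
    """Declare the two prefixes, then emit one <meta> element per row."""
    lines = [
        '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />',
        '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />',
    ]
    lines.extend(render_meta(*row) for row in rows)
    return "\n".join(lines)


print(render_head(ROWS))
```

Seen this way, scheme and lang are two optional attributes on each
statement, which is what makes a plain three-column triple too narrow.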
I have already done some fairly heavy testing, using my original
triples-based schema. With a MySQL database, I created six million
URIs, each with nine pieces of metadata, and was able to return a URI
from searching the metadata in about 300 milliseconds. Not bad for a
laptop!
Performance is actually very important for me, as I am planning to use
almost the same database schema for another project (which will be
open-sourced), where the number of URIs to be processed could almost
reach that of my tests. I want, therefore, to keep things as simple as
possible, but without losing the DC spirit. Point 2 is the one that has
me the most concerned in this respect. I really don't want to have to
start adding more fields, tables or complications than I absolutely have to.
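As a rough illustration of the trade-off, the sketch below keeps
everything in a single table and treats scheme and language as two
nullable columns rather than extra tables. It uses Python's built-in
sqlite3 module as a stand-in for MySQL; the table name, column names,
index, sample rows and the find_uris helper are all assumptions, not
the schema used in the tests described above.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (
    uri    TEXT NOT NULL,  -- page being described
    term   TEXT NOT NULL,  -- e.g. DC.subject, DCTERMS.created
    scheme TEXT,           -- e.g. DCTERMS.W3CDTF; NULL when not applicable
    lang   TEXT,           -- e.g. en; NULL when not applicable
    value  TEXT NOT NULL
);
-- Searches go from (term, value) back to the URI, so index that pair.
CREATE INDEX metadata_term_value ON metadata (term, value);
""")

# Hypothetical sample data.
rows = [
    ("http://example.org/node/1", "DC.subject", None, "en", "metadata"),
    ("http://example.org/node/1", "DCTERMS.created", "DCTERMS.W3CDTF", None, "2007-05-01"),
    ("http://example.org/node/2", "DC.subject", None, "en", "metadata"),
    ("http://example.org/node/3", "DC.subject", None, "en", "drupal"),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?, ?)", rows)


def find_uris(connection, term, value):
    """Return the distinct URIs that carry the given term/value pair."""
    cursor = connection.execute(
        "SELECT DISTINCT uri FROM metadata"
        " WHERE term = ? AND value = ? ORDER BY uri",
        (term, value),
    )
    return [row[0] for row in cursor]


print(find_uris(conn, "DC.subject", "metadata"))
```

The search itself still touches only the term and value columns, so the
two extra columns would not change the lookup query; whether they
change the timings at six million URIs is something only a real test
against MySQL could show.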
Any hints or clues (you may already have guessed that I haven't got
one!) would be most appreciated.
Cheers
M
--
Matthew Smith
IT Consultancy & Web Application Development
Business: http://www.kbc.net.au/
Personal: http://www.smiffysplace.com/
LinkedIn: http://www.linkedin.com/in/smiffy