Rachel said:
> Returning from ECDL in Rome I have just got to this flurry of
> mail... I think we are all agreed there is 'additional' data
> required to make the DCMI Registry effective. The question is
> whether this data would best be declared in separate schemas
> to ensure 'unadulterated' DCMI schemas describing the terms
> in the various DCMI namespaces?
I think there was a question which came before this one — the one
Eric, Roland, Tom, Harry and I were circling around last week — which
was whether all of this data should be made available to the registry
in RDF-based forms, or whether, if it is specific to the registry
application, it might be provided in some other (unspecified) form(s)
(a proprietary API? secondary XML documents?).
The two classes of data which have been mentioned, as far as I can
recall, are:
- (in an off-list discussion) the "DCMI-specific" classes suggested for
typing DCMI terms as
elements/refinements/schemes/controlled-vocab-terms. I don't think there
was a resolution to the question of whether such additional typing was
necessary or desirable?
- "administrative" metadata about the DCMI term descriptions, some of
which, as Tom noted last week, may be (primarily at least) for DCMI
internal use. That's an interesting example: I think that data _could_
be made available in RDF-based forms (and indeed some of it may well be
useful to other applications, like non-DCMI-owned registries), but I'm
not sure whether that means it _should_ be?
> From a wider perspective it concerns me that 'putting data in
> separate files' should be thought necessary. Surely the
> Semantic Web cannot rely on people arbitrarily dividing their
> schemas into the 'correct' semantic chunks? Surely SW
> applications need to be able to use data they understand and
> ignore the rest? I cannot see what basis one can put real
> boundaries on a 'schema file'.... in effect all triples are
> part of one great big schema? It seems to me very arbitrary
> as to where one draws these boundaries.
If any/all of this "additional" data _is_ to be made available in
RDF-based forms, then there is this second question of how/whether to
partition it across physical files/instances/schemas. I seem to recall
there were/are (broadly) two points of view:
- that it was useful to physically separate out sets of "general"
statements made using widely understood properties and classes from more
"local", application-specific metadata, on the grounds (I think?) that
an external application would, at least in the first instance, be
presented with metadata it could reliably "grok" (to borrow Eric's
term);
- that it was preferable to expose/present all the metadata in one
instance/schema, and leave it to the application to manage/filter.
I agree with Rachel's point that to some extent such partitioning seems
arbitrary in the context of the larger Semantic Web, where of necessity
applications will work on "partial understanding". But it might still be
a useful practice to partition the metadata in some way?
I can see the appeal of presenting a basic, commonly understood subset
of the larger set of metadata about the terms in the first instance, and
including in that subset pointers to the extended, application-specific
metadata. I think separate files/instances/schemas _could_ be managed
effectively if that is useful/required (especially if the content is
being generated from a single source "management system").
As I think Eric suggested (apologies, I can't locate the message), how
to "chunk up" the metadata in instances/files comes down to a question
of "(Semantic Web) Good Practice", and I think I'd really appreciate the
guidance of those who are familiar with such practice "on the ground" in
other contexts.
Pete