On Wed, May 29, 2002 at 06:51:44AM +0100, Pete Johnston wrote:
> >http://www.gmd.de/People/Thomas.Baker/usage/terms/dc/) and the
> >RDF schema that Roland is working on at
> >http://www.mathematik.uni-osnabrueck.de/projects/dcqual/qual21.3.1/Schema/dctermsA.
> >
> >Specifically, I'm wondering whether there are significant
> >differences in the scope or content of the information the two
> >representations convey. If yes, need that be the case?
>
> I think I'd argue that if these are both "representations" of the complete
> set of information about the DCMI terms, then their scope and content
> should (must?) be the same; it is only the "form" which differs.
I almost agree...
Ideally, one representation should be a subset of the other.
For example, the term entries of an RDF schema might not need to
link back to (or declare themselves to be) uniquely defined
historical versions of a term, link back to uniquely identified
"decisions", declare that a particular term is an "Element
Refinement", or declare the status of a term as "Recommended" or
"Conformant", and the RDF schema would not necessarily need to
include all superseded versions of its terms (see my previous
messages comparing the RDF schema with the text document). On
the other hand, I see no reason _in principle_ why the RDF
schema could not contain all of the above.
Problems arise if one is not just a subset of the other but the
two express overlapping sets of attributes, because in that case
neither can be derived automatically from the other. For
example, suppose the RDF schema were to declare constructs of the
type "RangeOf:dcterms:foobar" in a way that goes beyond the
information contained in the flat text representation. That would
require manual intervention to maintain the RDF schema every time
a change is made to the attributes of a term.
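
To illustrate the subset condition, here is a minimal Python sketch.
The attribute names and the helper `derivable` are hypothetical,
chosen only for illustration; they are not the actual field lists of
either representation.

```python
# Hypothetical attribute sets for one term in each representation.
flat_text_attrs = {"Name", "Label", "Definition", "Comment", "Status", "Version"}
rdf_schema_attrs = {"Name", "Label", "Definition", "Comment"}


def derivable(target, source):
    """Return True if every attribute in `target` also exists in `source`."""
    return target <= source


# The RDF schema is a strict subset, so it can be generated automatically.
print(derivable(rdf_schema_attrs, flat_text_attrs))

# Adding an attribute that the flat text lacks makes the sets overlap,
# and automatic derivation is no longer possible.
overlapping = rdf_schema_attrs | {"RangeOf"}
print(derivable(overlapping, flat_text_attrs))

# These are the attributes someone would have to maintain by hand.
print(sorted(overlapping - flat_text_attrs))
```

The set difference on the last line is the part that would need manual
upkeep whenever a term changes.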
> > Over
> >the long run it would be madness to try keeping two separate
> >representations in synch.
>
> Agreed.
I believe this is a key point.
> > If no, then could one (eg, the RDF
> >schema) perhaps be generated from the other (eg, the flat text
> >documentation) -- or vice versa?
>
> My understanding of the long-term plan for a "Vocabulary Management System"
> was that all of the information content about the terms - including all the
> versioning data which you've done a fantastic job in collating! - would be
> maintained in a database (I had assumed that meant a relational database
> but I guess it doesn't have to be!) and then various representations would
> be exposed/exported from that database.
That is my understanding too. FWIW, before Tokyo I had always
assumed and hoped that the RDF schema would be the canonical
representation from which everything else derives.
And now, after working with these flat text files for a while and
finding them _very_ easy to maintain (once set up) -- you edit
them with a plain text editor, everyone can read them, I can
paste them into email messages, etc -- I am wondering
whether the flat text files could be the upstream origin of the
entire work-flow.
In other words, I would maintain the text files and generate a
few basic Web pages. Whenever these files were changed, a Perl
script would generate an RDF schema. Then the Registry would
ingest either the text file or the RDF schema in order to
provide certain searching capabilities, such as term definitions
across languages, terms matching certain strings, etc.
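
The generation step could be sketched roughly as follows. This uses
Python rather than Perl purely for illustration. It assumes a
"Field: value" record layout with a blank line between terms, a
hypothetical base URI (http://example.org/terms/), placeholder terms,
and a simplified mapping in which every term becomes an rdf:Property
with an rdfs:label and an rdfs:comment. None of these details are
taken from the actual files.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed flat text layout: "Field: value" lines, blank line between terms.
SAMPLE = """\
Name: foobar
Label: Foo Bar
Definition: A placeholder term used only for this sketch.

Name: bazqux
Label: Baz Qux
Definition: Another placeholder term.
"""


def parse_terms(text):
    """Split the flat text into one dict per term."""
    terms = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            # Split on the first colon only, so values may contain colons.
            field, _, value = line.partition(":")
            record[field.strip()] = value.strip()
        terms.append(record)
    return terms


def to_rdf(terms, base):
    """Build an RDF/XML string with one rdf:Property per term (a simplification)."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    root = ET.Element(f"{{{RDF}}}RDF")
    for term in terms:
        about = {f"{{{RDF}}}about": base + term["Name"]}
        prop = ET.SubElement(root, f"{{{RDF}}}Property", about)
        ET.SubElement(prop, f"{{{RDFS}}}label").text = term["Label"]
        ET.SubElement(prop, f"{{{RDFS}}}comment").text = term["Definition"]
    return ET.tostring(root, encoding="unicode")


terms = parse_terms(SAMPLE)
rdf_xml = to_rdf(terms, "http://example.org/terms/")
print(rdf_xml)
```

Because the RDF is produced entirely from the parsed records, it can
never carry information that the text file lacks.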
As far as I can tell, the pages I now maintain capture
everything that needs to be captured about what we have.
It is not clear to me what additional information a
VMS database would carry. According to Section 2.4 of
http://dublincore.org/usage/documents/2002/02/13/vocabulary-guidelines/,
for example, the term entries for approved Encoding Schemes
submitted by outsiders would include information above
and beyond that captured in the current term declarations.
However, this additional information could in principle be
captured in the flat text documents as well. Specifically:
| Full name of the scheme - "Label"
| Suggested abbreviated name (acronym) - "Name"
| Domain(s) and extent of usage - "Comment"??
| Additional information about the scheme - "Comment"??
| Associated element(s) or element qualifier(s) - "Terms Qlfd"
| Maintenance agency - new field
| Maintenance agency contact person - new field
| Maintenance agency contact email address - new field
| Submitter email address - new field
| Online access point (URL if applicable) - new field??
| Access information (URL or physical address) - new field??
> Those representations would vary by syntactic form (plain text, HTML,
> XHTML, RDF/XML, RDF in N3 etc etc etc), but they would also vary by
> scope/content - subsets of terms, or subsets of the "full" information on
> the full set of terms, and so on, depending on their intended use/audience.
Exactly -- as long as we are talking about subsets, and as long as
the subsets are somehow derived automatically from the superset.
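
As a rough Python sketch of what "derived automatically" could mean:
a subset is a projection (fewer fields) plus a filter (fewer records)
over the full data. The records, the "Version" identifiers, the
"Superseded" status value, and the helper `derive_subset` are all
hypothetical.

```python
# Hypothetical superset: the full record for each term version.
SUPERSET = [
    {"Name": "foobar", "Label": "Foo Bar", "Status": "Recommended", "Version": "foobar-002"},
    {"Name": "foobar", "Label": "Foo", "Status": "Superseded", "Version": "foobar-001"},
    {"Name": "bazqux", "Label": "Baz Qux", "Status": "Conformant", "Version": "bazqux-001"},
]


def derive_subset(records, keep_fields, keep_record=lambda record: True):
    """Keep only the selected records, reduced to the selected fields."""
    return [
        {field: record[field] for field in keep_fields}
        for record in records
        if keep_record(record)
    ]


# Example: current terms only, without versioning or status details.
current = derive_subset(
    SUPERSET,
    ["Name", "Label"],
    lambda record: record["Status"] != "Superseded",
)
print(current)
```

Since every subset is computed from the superset, only the superset
ever needs to be edited by hand.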
> I think (but I wasn't completely sure!) you are suggesting that a (set of?)
> text file(s) could fulfil that role for the "database"? I'm sure that's
I wasn't suggesting it before but I am now...!
> true - I guess it comes down to availability of the tools for managing the
> content, maintaining integrity of relationships etc.
Personally, I have a very strong bias for any application that
interfaces to my favorite text editor. RDF fits the bill in
principle, but I find the content _a lot_ easier to debug without
angle brackets cluttering my screen.
Tom
--
Dr. Thomas Baker
Institutszentrum Schloss Birlinghoven mobile +49-171-408-5784
Fraunhofer-Gesellschaft work +49-30-8109-9027
53754 Sankt Augustin, Germany fax +49-2241-14-2619