Hi,
we're currently working on extracting data from HTML based on RFC2731. The
main issues we encountered so far are:
- lack of test documents
- when processing the HEAD part, is ordering of LINK and META tags relevant?
Is it ok if the link element mapping the schema prefix to a URI occurs
*after* the META tag using it? What is the scope? Is it allowed to have
multiple link elements that map the same prefix to different URIs?
- uppercase/lowercase: similar to WebDAV (RFC2518), our system identifies
properties based on a namespace name (URI reference) and a local name (XML
element name). In WebDAV, case *is* relevant. What's the convention for DCMI
properties?
- assuming that I'd want to use RFC2731-style encoding to map *arbitrary*
properties into my own (case-sensitive) property schema -- is there any
reliable way to find out whether for a given scheme (such as
"http://purl.org/DC/elements/1.0/") case is relevant or not?
- versioning: do "http://purl.org/DC/elements/1.0/" (RFC2731),
"http://purl.org/dc/elements/1.1/"
(<http://www.ukoln.ac.uk/metadata/dcmi/dcq-html/>) and "http://purl.org/dc"
(example usage in the W3C home page) identify the same property schemas?
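
To make the ordering and case questions concrete, here is a minimal Python sketch (not our actual implementation) of one possible reading: collect all LINK and META elements first and resolve prefixes only afterwards, so the order does not matter. The class name `DCExtractor` is hypothetical, and treating the last mapping of a prefix as the winner and preserving case are assumptions on my side, not something I found in RFC2731.

```python
from html.parser import HTMLParser


class DCExtractor(HTMLParser):
    """Hypothetical sketch: collect schema LINKs and prefixed METAs,
    then resolve prefixes once the whole document has been seen."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> namespace URI
        self.metas = []    # (prefix, local name, content)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        a = dict(attrs)
        rel = a.get("rel") or ""
        name = a.get("name") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            # Assumption: a later mapping of the same prefix overrides an earlier one.
            self.schemas[rel[len("schema."):]] = a.get("href")
        elif tag == "meta" and "." in name:
            prefix, local = name.split(".", 1)
            self.metas.append((prefix, local, a.get("content")))

    def properties(self):
        # Resolve prefixes after parsing, so a LINK may follow the META using it.
        # Assumption: prefix and local name are compared case-sensitively.
        return [
            (self.schemas[prefix], local, content)
            for prefix, local, content in self.metas
            if prefix in self.schemas
        ]


# The LINK deliberately occurs *after* the META that uses its prefix.
doc = (
    '<head>'
    '<meta name="DC.Title" content="Test document">'
    '<link rel="schema.DC" href="http://purl.org/DC/elements/1.0/">'
    '</head>'
)
parser = DCExtractor()
parser.feed(doc)
parser.close()
print(parser.properties())
```

Under these assumptions the META is still resolved against "http://purl.org/DC/elements/1.0/", and "DC.Title" and "DC.title" would end up as two different properties.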
Apologies if these things are FAQs. Pointers are welcome.
Regards, Julian
--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760