This is something which wasn't discussed at the workshop, as far as I
recall, but it seems to be relevant once we start to think about the
uses to which that metadata might be put.
The question is how one might go about writing the metadata for a whole
WWW site in such a way as to minimize the number of network
transactions required to grab it, and perhaps minimize the bandwidth
used up in the grabbing.
Embedding the metadata related to an object within the object itself
effectively means that the object has to be snarfed in order to extract
the info - unless some process extracts it and stashes it in a safe
place, for the Web crawlers to find later.
Separate metadata (whether extracted from the original object,
generated by hand, or...) implies separate network transactions to
retrieve the parcel of metadata associated with each object.
What I don't recall us discussing was bundling all the metadata for
(say) a WWW site into a single object, even if only at a protocol level.
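To make the bundling idea concrete, here is a minimal sketch in Python.
It assumes the per-object metadata is held as simple dictionaries and
serialized as JSON, purely for illustration; it is not IAFA or SOIF, and
the function names and example URLs are hypothetical. The point is only
that one compressed object can stand in for many separate fetches.

```python
import gzip
import json


def bundle_site_metadata(records):
    """Pack every per-object metadata record into one gzip-compressed blob.

    JSON is an assumption made for this sketch, not a proposed format.
    """
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return gzip.compress(payload)


def unbundle_site_metadata(blob):
    """Recover the list of metadata records from a bundled blob."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical records for two objects on one site.
records = [
    {"url": "http://www.example.org/", "title": "Home page"},
    {"url": "http://www.example.org/about.html", "title": "About"},
]

# One object to fetch, rather than one network transaction per record.
blob = bundle_site_metadata(records)
```

A crawler would then need a single transaction to retrieve `blob` and
could unpack it locally with `unbundle_site_metadata`.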
This is the approach adopted by the likes of ALIWEB and Harvest, using
IAFA templates and SOIF respectively. Harvest actually has a neat
little protocol of its own which makes it possible to get just the
stuff you're interested in, with compression thrown in for good measure:

    HELLO <hostname>         - Friendly Greeting
    HELP                     - This message
    SEND-OBJECT <oid>        - Send an Object Description
    SEND-UPDATE <timestamp>  - Send all Object Descriptions that
                               have been changed/created since timestamp
    SET compression          - Enable GNU zip compressed transfers
    QUIT                     - Close session

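As a rough illustration of how a client might use these commands to
fetch only what has changed, the following Python sketch builds the
sequence of command lines for one session. The command names are taken
from the help text above; the ordering, the integer timestamp and the
function name are my assumptions, and nothing here covers the transport
or the server's replies.

```python
def build_update_session(client_hostname, since_timestamp, compress=True):
    """Return the command lines for one incremental-update session.

    Assumed ordering: greet first, optionally enable compression,
    request the changed descriptions, then close the session.
    """
    commands = [f"HELLO {client_hostname}"]
    if compress:
        # Spelled as in the help text; the exact argument is not verified.
        commands.append("SET compression")
    commands.append(f"SEND-UPDATE {since_timestamp}")
    commands.append("QUIT")
    return commands


# Hypothetical client name and timestamp.
session = build_update_session("crawler.example.org", 0)
```

Swapping the SEND-UPDATE line for SEND-OBJECT <oid> would request a
single description instead of everything changed since the timestamp.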
This might not seem relevant to our discussions, but now that the
Harvest software has been re-badged as the Netscape Catalog Server [1]
this will presumably generate a great deal of interest in the SOIF
format and the Harvest protocols. One might even go so far as to
suggest that SOIF be "fixed up" (if you believe this is necessary)
rather than a whole new metadata format being developed... ;-)
<ducks>
Martin
[1] <URL:http://home.netscape.com/newsref/pr/newsrelease97.html>