Martin Hamilton writes:
> The question is how one might go about writing the metadata for a whole
> WWW site in such a way as to minimize the number of network
> transactions required to grab it, and perhaps minimize the bandwidth
> used up in the grabbing.
This is a good idea, but I would amend your suggestion to apply to a
collection rather than to a site. A site, or server, or repository, is
more of an administrative unit than a semantic unit. A single site
could provide several unrelated collections, and often does. What one
typically wants when grabbing metadata is *not* everything at a site,
unless one is particularly interested in the site itself. Instead, one
wants everything in some related set or collection of resources.
(BTW, web crawlers will soon find it impractical to try grabbing
everything in the web, and will instead have to focus on selecting
related pages.)
> Embedding the metadata related to an object within the object itself
> effectively means that the object has to be snarfed in order to extract
> the info - unless some process extracts it and stashes it in a safe
> place, for the Web crawlers to find later.
This automatic extraction process should be supported by smart servers.
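As a rough illustration of what such an extraction step might look like,
here is a minimal sketch. It assumes the metadata is embedded as HTML
META tags with NAME and CONTENT attributes, which is only one possible
embedding; the names MetaExtractor and extract_metadata are hypothetical
and not part of any existing server.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects name/content pairs from <meta> tags (assumed embedding)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name and content is not None:
            self.metadata[name.lower()] = content


def extract_metadata(html_text):
    """Return a dict of the metadata embedded in one HTML document."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata
```

A server could run something like this once when a document changes and
stash the result, so that a crawler never has to fetch the full object
just to read its metadata.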
> What I don't recall us discussing was bundling all the metadata for
> (say) a WWW site into a single object, even if only at a protocol level
> ?
In addition to an object representing the bundling of all metadata for
everything in a collection, there might be a separate set of metadata
for the collection as a whole, and other metadata that is inherited by
every item in a collection (those are different things).
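To make the three kinds of metadata concrete, here is a minimal sketch
of one way they might be modelled. The class name Collection, its field
names, and the JSON bundle layout are all assumptions for illustration,
not a proposed format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Collection:
    about: dict      # metadata describing the collection as a whole
    inherited: dict  # defaults that every item in the collection inherits
    items: dict = field(default_factory=dict)  # item id -> item's own metadata

    def effective(self, item_id):
        """Metadata for one item: inherited values, overridden by its own."""
        merged = dict(self.inherited)
        merged.update(self.items[item_id])
        return merged

    def bundle(self):
        """Serialize all metadata into one object, fetchable in one request."""
        return json.dumps(
            {
                "collection": self.about,
                "inherited": self.inherited,
                "items": self.items,
            },
            sort_keys=True,
        )
```

Note that the collection's own title lives in `about` and is never
merged into an item, whereas anything in `inherited` applies to every
item unless that item overrides it. Keeping inherited values in one
place also keeps the bundle small, since they are not repeated per item.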
Daniel LaLiberte
National Center for Supercomputing Applications
http://union.ncsa.uiuc.edu/~liberte/