Is there any data on the relative usefulness of structured information (i.e.
metadata) in document discovery, as opposed to boolean searches across
Google's indexing of what is available on the web? If there isn't, then the
argument for one against the other is essentially sterile.
One thing I might throw into the pot is that Google caches only the first
100 KB of each document. Do we know that it indexes the contents beyond that
100 KB? If not, then anything over 100 KB is not completely indexed. I can
think of at least one item in ERA which is 80 MB.
Best,
Philip
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
----- Original Message -----
From: "Peter Cliff" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Wednesday, June 25, 2008 1:53 PM
Subject: Re: subject classification
> Hello!
>
> Stevan Harnad wrote:
>> Opposing hand-tagging. No view on usefulness of automated tagging (except
>> that I think boolean full-text search is in general far more powerful
>> than taxonomy search -- though of course any available taxonomy can
>> be covered by the boolean search). -- SH
>
> What if the full-text lacks certain keywords that are relevant for
> discovery/searching? I'm not convinced the content of a document is enough
> to accurately (and sustainably) describe what it is about. This email, for
> instance.
>
> That said, I'm not disagreeing that uptake of repository deposition has
> been slow - but I'm not sure we can blame that on just the need for
> metadata!
>
> Pete Cliff
> Research Officer, UKOLN
>
>