Frank Roos wrote...
>Actually, the *searcher* for a resource will give meaning to the DATE
>field. He decides, that he wants to discover a resource with a specific
>date. And as long as this date, whatever kind of date this is, is in the
>DC record, he will succeed in discovering it, and he will be satisfied.
True... at least in theory. But false in practice, unfortunately.
Including many dates in a record simply increases the number of false
hits. If you ask for a map of Melbourne in the 1990s, for example, you
might get maps drawn in the 1880s but digitized in 1995.
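
To make the Melbourne example concrete, here is a minimal sketch in Python. The record layout and the function name are invented for illustration and are not part of any DC specification; the only assumption is that a record carries several dates with nothing to say what kind of date each one is.

```python
# Hypothetical records: each carries several dates, with no indication
# of what kind of date (drawn, digitized, ...) each one is.
records = [
    {"title": "Map of Melbourne (drawn 1885, digitized 1995)",
     "dates": [1885, 1995]},
    {"title": "Map of Melbourne (drawn 1994)",
     "dates": [1994]},
]


def search_any_date(records, start, end):
    """Return every record in which ANY listed date falls in [start, end]."""
    return [r for r in records
            if any(start <= d <= end for d in r["dates"])]


# "A map of Melbourne in the 1990s"
hits = search_any_date(records, 1990, 1999)
for r in hits:
    print(r["title"])
```

Both records match, although only the second is what the user meant. The 1885 map is returned purely because of its digitization date.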
In the worst case the search engine may return the first N hits and,
if the resource doesn't occur in this list, the user won't find it.
More likely, however, the search engine will return results a page at a
time. The user often has to wade through pages of irrelevant resources
to find the answer to their question. They frequently give up before
reaching it. The problem of low precision can be seen (in spades!) on
the net at the moment.
There is also an important psychological problem with returning many
apparently irrelevant resources. If a naive user can't understand *why*
the search engine chose a resource, they usually judge the search
engine as poor and may cease to use it.
A major challenge of resource discovery is to build models of 'typical
users'. These models should be able to answer the question 'the user
has asked this, what are they likely to mean?' The search engine could
then return some answers that are likely to be relevant together with
suggestions as to ways of refining the query if the answers aren't
relevant. This models the way I've observed reference librarians working
with customers.
This is very relevant to the discussion of 'DATE' and 'COVERAGE'.
In my view, users are most unlikely to use (or understand) 'date' as
'date resource was made available electronically'.
A typical query I want answered (wearing my hat as a technical historian)
is 'Find me books about railway signalling published between 1900 and
1920' ...because I know that these books would describe current
practice c1910. I don't care in the slightest that the book was scanned
in the 1990s. The same argument applies to maps and data sets.
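
As a rough sketch of how the railway signalling query might be answered if each date in a record said what kind of date it was, consider the Python below. The titles, the 'published' and 'digitized' labels, and the function name are all hypothetical; this is one possible arrangement, not a proposal for how DC should encode dates.

```python
# Hypothetical records in which every date is labelled with its kind.
records = [
    {"title": "Railway Signalling in Practice",
     "dates": {"published": 1908, "digitized": 1996}},
    {"title": "A Handbook of Modern Signalling",
     "dates": {"published": 1994, "digitized": 1994}},
    {"title": "Early Signalling Methods",
     "dates": {"published": 1875, "digitized": 1910}},
]


def search_by_date_kind(records, kind, start, end):
    """Return records whose date of the given kind falls in [start, end]."""
    return [r for r in records
            if kind in r["dates"] and start <= r["dates"][kind] <= end]


# "Books about railway signalling published between 1900 and 1920"
hits = search_by_date_kind(records, "published", 1900, 1920)
titles = [r["title"] for r in hits]
print(titles)
```

Only the 1908 book is returned. The third record has a date inside the range, but it is not a publication date, so it is not a hit.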
andrew waugh