Hi Rich,
At 11:50 23-1-01 +0100, Valkenburg, Peter wrote:
>Hi Rich,
>Back at my last job I did a DC metadata implementation on
>an AltaVista index together with Henny Bekker (cc-ed in on
>this message). The search engine is run by the research
>and academic network in the Netherlands, SURFnet, and has
>several million pages indexed, of which (I think) less than
>1% have DC tags.
>Basically the config files were changed to incorporate DC
>tags and some JavaScript was put in the interface to provide
>a smooth front-end.
>See http://search.surfnet.nl; I suppose Henny can answer
>any questions in detail.
Yep... To be exact, we changed a couple of things in
AltaVista v2.3a and (running experimentally) AltaVista v3.02.
We added the Dublin Core mappings to the metatagmap.conf
file, with entries like:
DC.Author dcauthor
DC.Contributor dccontributor
DC.Coverage dccoverage
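A minimal sketch of how such mapping lines could be read by a script, for example to keep a search interface in sync with the config. It assumes the format is simply "META-name field-name" separated by whitespace, with "#" starting a comment line (as in the listing further down in this message); the function name parse_tag_map is made up for illustration and is not part of AltaVista.

```python
def parse_tag_map(text):
    """Parse 'META-name field-name' lines into a dict (META name -> index field)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            # Not a two-column mapping line; ignore it in this sketch.
            continue
        meta_name, field_name = parts
        mapping[meta_name] = field_name
    return mapping


SAMPLE = """\
DC.Author dcauthor
DC.Contributor dccontributor
DC.Coverage dccoverage
"""

tag_map = parse_tag_map(SAMPLE)
print(tag_map["DC.Author"])  # dcauthor
```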
We created a dedicated interface (using the AVSHE interface
language) for searching the index by metadata, with the
'advanced interface' as a template. We also enhanced the
'metadata search interface' with some JavaScript that collects
metadata from the index. In the interface we pre-defined
some searches, e.g. 'Author', which is actually a search
for "author:<input> OR dccreator:<input>".
A search in this interface for "dccreator:<name>" is
restricted to that specific DC header.
Of course it's possible to combine these fields with the
boolean operators supported in the 'advanced interface'.
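The pre-defined 'Author' search can be sketched roughly as follows. This is an illustration in Python, not the JavaScript actually used in the interface; the table LABEL_FIELDS and the function expand_query are hypothetical names, and only the 'Author' row is taken from the description above.

```python
# Hypothetical table: user-facing label -> index fields that are OR-ed together.
LABEL_FIELDS = {
    "Author": ["author", "dccreator"],
}


def expand_query(label, value):
    """Expand a friendly label into a fielded query over all mapped fields."""
    # Drop double quotes so the user's input cannot break the quoting.
    cleaned = value.replace('"', "").strip()
    clauses = ['{}:"{}"'.format(field, cleaned) for field in LABEL_FIELDS[label]]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


print(expand_query("Author", "bekker"))
# (author:"bekker" OR dccreator:"bekker")
```

The parentheses let the result be combined with other terms, e.g. '"caching" AND ' + expand_query("Author", "bekker").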
For my test index with AltaVista v3.02 have a look at URL:
http://alpha.sec.nl:9000/cgi-bin/query?mss=en/meta&pg=q&what=web
E.g. try a search for "Full-text:caching AND DC.Creator:bekker"
(http://alpha.sec.nl:9000/cgi-bin/query?mss=en%2Fmeta&pg=aq&what=web&user=searchintranet&enc=iso88592&site=main&kl=XX&q=%22caching%22+AND+%28author%3A%22bekker%22+OR+dccreator%3A%22bekker%22%29&r=caching&o1=&a1=&q1=caching&o2=AND&a2=author&q2=bekker&o3=AND&a3=url&q3=&o4=AND&a4=&q4=&o5=AND&a5=&q5=&o6=AND&a6=&q6=&search=search&d0=&d1=)
This will bring you to a page on Web caching.. (A technique used
in the old days of the internet when bandwidth was scarce :-)
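To see what the interface actually sends to the index, the q parameter of that URL can be decoded. A small sketch using Python's standard urllib.parse; the query string below is an abbreviated copy of the one above (most of the empty form fields are left out, the q parameter is unchanged), and decode_query is just an illustrative name.

```python
from urllib.parse import parse_qs


def decode_query(query_string):
    """Return the decoded value of the 'q' parameter from a URL query string."""
    return parse_qs(query_string)["q"][0]


# Abbreviated copy of the query string from the URL above.
QUERY_STRING = (
    "mss=en%2Fmeta&pg=aq&what=web"
    "&q=%22caching%22+AND+%28author%3A%22bekker%22+OR+dccreator%3A%22bekker%22%29"
)

print(decode_query(QUERY_STRING))
# "caching" AND (author:"bekker" OR dccreator:"bekker")
```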
>Regards,
>
>--
> Peter Valkenburg
> Senior Consultant
> ePresence Solutions
> Planetenbaan 28, PO Box 1049, NL-3600 BA Maarssen
> Mail: [log in to unmask]
> Web: www.epresence.com
> Voice: +31 (0)346-562505
>
>
> > -----Original Message-----
> > From: Rich Wiggins [mailto:[log in to unmask]]
> > Sent: Monday, January 22, 2001 10:36 PM
> > To: [log in to unmask]
> > Subject: Dublin Core under AltaVista at a large university
> >
> > Please forgive me if any of the following is an FAQ. I followed
> > Dublin Core fairly closely until 1997 or so. I've done some
> > poking around among the list of Projects at Purl.org but
> > may have missed something on point.
> >
> > At Michigan State University we run AltaVista, starting back
> > in 1996. We have recently migrated to the AltaVista 3.0 product,
> > and noticed a feature that lets the site define its own META
> > tags for indexing. See:
> > http://search.msu.edu:9000/adminhelp/help/metatagindexing.html
> >
> > This could be really powerful.
> >
> > I am interested in using this facility to encourage metadata
> > at least on home pages if not entire sites on campus. I seek
> > advice of the Dublin Core-noscenti.
> >
> > First, is anyone doing this?
> >
Some of our customers (http://www.bvenet.nl/) are doing it using a
Web site whose pages are generated from a database. [A problem
with AVS v2.3a is that you can't selectively index URLs containing
a "?" mark. With AVS v3.* it's possible to have multiple crawlers
(also called 'scooters'), each with a separate configuration.]
> > Second, how should DC tags co-exist with common, ad-hoc META tags?
> >
Certainly. But not (yet) as you would like to have it. It has to
be solved in the user interface. We have done it using JavaScript
to manipulate the user's input.
> > We have added the core elements to the AltaVista config file.
> > Now I am wondering how to handle other commonly used, but
> > non-standards-based, META tags. One option is to map
> > common tags to DC ones. From the A/V doc:
> >
> > Note: Multiple META tag names can map to the same field name,
> > for example:
> > author author
> > creator author
> > DC.creator author
> >
At the moment we are using:
# Default metaheaders
Author author
Keywords keywords
Description description
(...)
# Dublin-core META headers
DC.Author dcauthor
DC.Contributor dccontributor
(...)
We are extending our metadata search interface (and the metadata.conf)
to support Qualified Dublin Core.
> > As I read the doc, users would have to search on the "author" field
> > (as opposed to DC.creator), which would not be good for Dublin-Core-
> > savvy users. However I contend the number of those is vanishingly
> > small on our big campus.
> >
> > The advantage would be that for those pages on campus that DO have
> > the DC.creator field, or the author field, or the creator field,
> > a search for author:"Jon Havlicek" would match.
> >
> > I also expect that we might offer a specialized fielded search
> > interface with common terms for labels instead of DC names.
> >
Certainly... See above..
> > We could define separate fields for each, and offer via
> > that fielded search interface a way to search all forms
> > at once. That way the DC tags and the common tags have
> > their own index "buckets" in case that was ever useful.
> > Which way would you go?
> >
Solve it in the user interface and keep the metadata separated in the
index.
> > Third, how do we get page authors to populate pages with DC metadata?
> >
That's indeed a big problem..
> > It has always seemed to me that metadata would go nowhere until major
> > authoring tools at least allow standard metadata tags -- if not
> > demanding it.
> >
> > It doesn't appear to me that the situation has changed any. Is there
> > a SINGLE major Web authoring tool that out of the box encourages
> > users to enter DC metadata?
> >
I do know of a couple. One is a Perl script that creates an HTML section
containing the metadata. See URL: http://www.kb.nl/coop/donor/mg-start-nl.html
(unfortunately in Dutch).
I've also seen a program called "TagGen" (http://www.hisoftware.com/)
with an extension to support Dublin Core.
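For illustration only, the idea behind such a generator (turn a few form values into META tags that an author can paste into the HEAD of a page) can be sketched in a few lines of Python. This is not the KB script or TagGen; the function name dc_meta_block and the sample values are made up.

```python
from html import escape


def dc_meta_block(fields):
    """Build HTML META tags from (element, value) pairs, one tag per line."""
    lines = []
    for element, value in fields:
        lines.append(
            '<meta name="DC.{}" content="{}">'.format(
                escape(element, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


# Example values, chosen only for the demonstration.
print(dc_meta_block([("Creator", "H.J. Bekker"), ("Title", "Web caching")]))
# <meta name="DC.Creator" content="H.J. Bekker">
# <meta name="DC.Title" content="Web caching">
```

Escaping the values keeps quotes and ampersands in a title from breaking the tag.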
> > Absent that, has anyone succeeded in making templates for FrontPage,
> > DreamWeaver, and the like, that could be adopted by users interested
> > seriously in doing metadata well?
> >
No.. Sorry... Didn't look at it..
> > We do have a major humanities research project that is looking at
> > automatically poking DC tags into at least their top-level pages.
> > I am rather pessimistic about getting anyone else to do it, and
> > would love to hear encouraging real-world examples.
> >
You could send an email to "Manita Toetenel" <[log in to unmask]>, who
is the Web manager of BVEnet. That organisation has incorporated DC
metadata in their Web site.
> > Fourth, a question not necessarily related to Dublin Core. We are
> > exploring how to mark pages that are "official" in the search
> > index, allowing filtered searches on official status, and/or
> > heightened ranking and/or a hit list icon. Has anyone tackled
> > that? How?
> >
Sorry, no. Unless you use a 'secret' metaheader which is used
by the interface to filter the result pages...
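One possible reading of that idea, as a sketch: official pages carry an extra META tag that is mapped to its own index field, and the interface quietly adds a clause on that field to the user's query. The field name "official" and the value "yes" are purely hypothetical, as is the function name; nothing like this is in our config.

```python
def restrict_to_official(query, field="official", value="yes"):
    """Wrap a user query so that it only matches pages marked as official."""
    # Parenthesise the original query so its own AND/OR operators are kept intact.
    return '({}) AND {}:"{}"'.format(query, field, value)


print(restrict_to_official("caching OR proxy"))
# (caching OR proxy) AND official:"yes"
```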
Cheers, Henny
---------------------------------------------------------------------
E-Mail: [log in to unmask] Voice: +31 30 2305305 Fax: +31 30 2305329
Web: http://www.surfnet.nl/surfnet/persons/henny/ o
Paper: H.J. Bekker, SURFnet _ /- _
Po Box 19035, 3501 DA Utrecht Nederland (_) > (_)
----------------------------------------------------------------------