At 05:17 PM 7/5/97 +0000, Charles and Misha wrote:
>..
>1. In some cases the metadata relates to the body of the same HTML
>file, and in other cases the metadata is in an HTML file but refers
>to a separate PDF-format file. How should we indicate the difference
>between these to the search engine? The metadata for the PDF file
>would have the URL of the PDF file in the IDENTIFIER element.
>Perhaps the plain HTML files should have the IDENTIFIER element
>omitted to indicate that the metadata refers to the file itself.
>In this case, we do not have to tell the robot the URL, as it is
>simply the URL of the page it is currently processing.
This sounds like a good convention to me. Happy to incorporate it in the
EdNA standard if there is support from other people. (Although for now most
of our content will be the web pages themselves.)
Then again, if there is a web page that points to the PDF or other file,
mightn't it most likely have other useful contextualising information, and
mightn't it be good for people to go there first rather than going straight
from an entry in a metadata repository to the actual PDF file?
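
For what it is worth, the convention proposed in point 1 could be sketched roughly as below. This is only an illustration, not anything the EdNA robot actually does: the function name `resource_url`, the `meta` dictionary, and the `DC.IDENTIFIER` key (by analogy with `DC.TITLE`) are all assumptions.

```python
def resource_url(page_url, meta):
    """Return the URL that a page's metadata is taken to describe.

    `meta` is a hypothetical dict of META name -> content pairs that the
    robot has already extracted from the page at `page_url`.
    """
    # Assumed key name; the thread only speaks of "the IDENTIFIER element".
    identifier = (meta.get("DC.IDENTIFIER") or "").strip()
    # Omitted (or empty) IDENTIFIER: the metadata refers to the page itself.
    return identifier or page_url
```

A page describing a separate PDF would carry the PDF's URL in IDENTIFIER, while an ordinary HTML page would simply leave the element out.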
>2. In a similar vein, do the files all need to say that the FORMAT
>is text/html? The robot would not be reading them unless they were
>text/html anyway.
Will await other comment on this.
>3. And what about TITLE? Should the robot simply extract it from
>the <TITLE> ... </TITLE> part of the HTML? If not, should we insist
>that both titles should be the same?
No - if they are the same, why bother? I presume any search engine /
indexer would use <title> if it didn't find a DC.TITLE.
I think there are a variety of circumstances where more sophisticated users
will want to distinguish different titles for different purposes. What you
want displayed at the top of the page in your web site may be different
from what you want displayed in the (decontextualised) context of a search
engine result. With the draft EdNA metadata standard
http://www.edna.edu.au/edna/owa/info.getpage?sp=&pagecode=5211#TITLE
we have gone one step further and provided a scheme name if people want to
distinguish the title they want displayed in EdNA from what they want
displayed in other DC-using contexts. I believe this does not create
unnecessary complexity in that it is all optional.
The EdNA robot's rule would be:
if there is a
<META NAME=DC.TITLE CONTENT="(SCHEME=EDNATITLE)blah"> use it
if not, if there is a
<META NAME=DC.TITLE CONTENT="blah"> use that
if not, use whatever is in
<title>blah</title>
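
The fallback rule above could be sketched in Python roughly as follows. This is a minimal illustration using the standard library's `html.parser`, not the robot's actual code; the names `TitleCollector` and `pick_title` are hypothetical, and matching the scheme prefix case-insensitively is an assumption.

```python
import re
from html.parser import HTMLParser

# Assumed: the scheme marker is a literal prefix of the CONTENT value.
EDNA_PREFIX = re.compile(r"^\(SCHEME=EDNATITLE\)", re.IGNORECASE)


class TitleCollector(HTMLParser):
    """Collects the three candidate titles from one HTML page."""

    def __init__(self):
        super().__init__()
        self.edna_title = None   # DC.TITLE with the EDNATITLE scheme
        self.dc_title = None     # plain DC.TITLE
        self.html_title = None   # contents of <title>...</title>
        self._in_title = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self._buf = []
        elif tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").upper() != "DC.TITLE":
                return
            content = a.get("content") or ""
            if EDNA_PREFIX.match(content):
                if self.edna_title is None:
                    self.edna_title = EDNA_PREFIX.sub("", content).strip()
            elif self.dc_title is None:
                self.dc_title = content.strip()

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            if self.html_title is None:
                self.html_title = "".join(self._buf).strip()

    def handle_data(self, data):
        if self._in_title:
            self._buf.append(data)


def pick_title(html):
    """Apply the rule: EdNA-scheme DC.TITLE, else DC.TITLE, else <title>."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    for candidate in (parser.edna_title, parser.dc_title, parser.html_title):
        if candidate:
            return candidate
    return None
```

Calling `pick_title(page_source)` on a page with all three present would return the EdNA-specific title; pages with no metadata at all fall through to the ordinary `<title>`.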
Regards
Jack Gilding
end =============================================================
Jack Gilding ph: (03)9628-4652
Project Manager, VET EdNA Project fax: (03)9628-2472
Communications & Multimedia Unit
OTFE, PO Box 266D Melbourne VIC 3001 http://www.edna.edu.au/vetwp/
(level 4 Rialto Sth Tower 525 Collins Street Melbourne Australia)