Hi.
I have a concern about the existing guidelines on the
General.Keyword field.
I am of the opinion that Keyword field entries, in 99.999% of cases,
should be single words, and almost never grouped into phrases unless
they are *totally* ambiguous on their own. That is to say, the field
should contain keywords, not keyphrases.
For example, CanCore 2.0 includes the example Keyword keyphrase
"Microsoft Excel tutorial".
Given several metadata records with the keyphrase "Microsoft Excel
tutorial" and others with "Tutorial on Microsoft Excel", "Excel
Tutorial", or "MS Excel Tutorial", a computer will *not* know these
documents have anything in common!
Those keyphrases are "black boxes" to the computer: if two of them
don't match under a case-insensitive string comparison, they are
unrelated as far as the computer is concerned.
If the first metadata record had the keywords "Microsoft", "Excel", and
"Tutorial", and the second had "Tutorial", "MS", "Excel", then the
computer could see that the two records share "Excel" and "Tutorial",
and so are in fact about related topics.
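To make the comparison concrete, here is a minimal Python sketch. The
record values are the ones from the example above; the helper names
(same_keyphrase, shared_keywords) are made up for illustration and are
not part of LOM or CanCore.

```python
def same_keyphrase(a: str, b: str) -> bool:
    """Case-insensitive comparison of two whole keyphrases."""
    return a.casefold() == b.casefold()


def shared_keywords(a: list[str], b: list[str]) -> set[str]:
    """Keywords (compared case-insensitively) that two records share."""
    return {k.casefold() for k in a} & {k.casefold() for k in b}


# Keyphrase records: opaque strings that do not compare equal.
print(same_keyphrase("Microsoft Excel tutorial", "MS Excel Tutorial"))  # False

# Keyword records: the overlap is directly visible.
record_1 = ["Microsoft", "Excel", "Tutorial"]
record_2 = ["Tutorial", "MS", "Excel"]
print(sorted(shared_keywords(record_1, record_2)))  # ['excel', 'tutorial']
```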
To expand on this, consider two uses for metadata:
a) Systems perform searches on some keyword, which match against various
metadata fields.
b) Metadata is used by a system to construct a semantic model about the
object it describes.
Grouping keywords into keyphrases has little impact on the first use
above, but is quite detrimental in the second.
Let's assume you are a student taking some course through an LMS,
and you click on the "Help" link while viewing some learning object.
The LMS knows what learning object you are viewing, so it uses the LOM
metadata associated with that object to construct a semantic context to
use as a basis for a "help" search.
In the keyphrase case, the LMS will simply run a search for the phrase
"Microsoft Excel tutorial", which will return only records which contain
that *entire* phrase, with those words in that specific order, so
documents titled "MS Excel Tutorial" or "Tutorial on Microsoft Excel"
would *not* be returned. In the case of the keyphrase being broken down
into separate keywords, the LMS will better understand the learner's
context, and run a search for "Microsoft" OR "Excel" OR "Tutorial".
Documents matching any of those terms will then be returned, with
documents matching more (or all) of the terms scoring higher and being
returned first - before those which only match one or two of the
keywords. But importantly, you *will* get results from a document
titled "MS Excel Tutorial", or any other containing those terms in *any*
order, not just the order they had been in the phrase.
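The difference between the two searches can be sketched in a few lines
of Python. This is a toy illustration under my own assumptions, not how
any particular LMS or search engine works: the title list and the
or_search function are hypothetical, matching is on whole
whitespace-separated words, and the score is simply the number of query
terms a title contains.

```python
def or_search(query_terms, documents):
    """Return titles matching at least one term, best matches first."""
    terms = {t.casefold() for t in query_terms}
    scored = []
    for title in documents:
        words = {w.casefold() for w in title.split()}
        score = len(terms & words)  # number of query terms found
        if score > 0:
            scored.append((score, title))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: -pair[0])
    return [title for _, title in scored]


titles = [
    "MS Excel Tutorial",
    "Tutorial on Microsoft Excel",
    "Introduction to Word",
    "Microsoft Excel tutorial",
]

# Keyphrase search: only the exact phrase, in that order, matches.
phrase = "Microsoft Excel tutorial".casefold()
print([t for t in titles if phrase in t.casefold()])
# ['Microsoft Excel tutorial']

# Keyword search: any term matches, more matching terms rank higher.
print(or_search(["Microsoft", "Excel", "Tutorial"], titles))
# ['Tutorial on Microsoft Excel', 'Microsoft Excel tutorial', 'MS Excel Tutorial']
```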
The LOM standard uses the example "Mona Lisa". This isn't a horrible
example, because "Mona" and "Lisa" on their own are just ordinary names
from a semantic perspective, as opposed to being the title of a painting
when combined together in a phrase - and it thus may make sense to
combine them into "Mona Lisa". But even in this case, breaking them
apart into separate keywords and searching for ["Mona" AND "Lisa"] would
still give good results. If there were a document entitled "A biography
of Mona and the entire Lisa family", you would even get much better
results with them separate.
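A rough Python sketch of that AND search, next to a plain phrase
search, shows the difference. The function names and the extra titles
are hypothetical, and matching is again on whole whitespace-separated
words.

```python
def phrase_search(phrase, documents):
    """Titles containing the exact phrase (case-insensitive)."""
    needle = phrase.casefold()
    return [d for d in documents if needle in d.casefold()]


def and_search(query_terms, documents):
    """Titles containing every query term, in any order."""
    terms = {t.casefold() for t in query_terms}
    return [d for d in documents
            if terms <= {w.casefold() for w in d.split()}]


titles = [
    "The Mona Lisa",
    "A biography of Mona and the entire Lisa family",
    "Lisa Simpson",
]

print(phrase_search("Mona Lisa", titles))
# ['The Mona Lisa']

print(and_search(["Mona", "Lisa"], titles))
# ['The Mona Lisa', 'A biography of Mona and the entire Lisa family']
```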
Related to this, CanCore also gives several single-word examples in
pluralized form, e.g., "spreadsheets" and "budgets". For the same
reasons as above, I think it would also be better to use the singular,
rather than the plural form, for all keywords.
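The plural problem is the same string-comparison problem in miniature.
In this small Python sketch (common_keywords is a hypothetical helper,
and the singular record is my own counterpart to the CanCore examples),
a record using the plural forms shares nothing with one using the
singular forms:

```python
def common_keywords(a, b):
    """Keywords two records share under case-insensitive comparison."""
    return {k.casefold() for k in a} & {k.casefold() for k in b}


plural_record = ["spreadsheets", "budgets"]
singular_record = ["spreadsheet", "budget"]

# Mixed forms: no overlap, although the topics are identical.
print(common_keywords(plural_record, singular_record))  # set()

# One consistent form on both sides: the overlap is found.
print(sorted(common_keywords(singular_record, ["Budget", "spreadsheet"])))
# ['budget', 'spreadsheet']
```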
This semantic ambiguity will only lead to further problems as metadata
consumers advance to more sophisticated uses for metadata than just
keyword searching.
--
Chris Hubick
http://adlib.athabascau.ca/~hubick/