JISCMail - CETIS-METADATA Archives

Subject:
Re: Do they mean metadata?!
From:
Steve Richardson <[log in to unmask]>
Reply-To:
Steve Richardson <[log in to unmask]>
Date:
Mon, 9 May 2005 12:49:08 +0100
Thank you George and Scott, it's good to know people are willing and able
to engage in this discussion and yes it IS fun!! :-) deep breath...

I fear you have taken my initial assertion and exaggerated it beyond
hyperbole with preconceived notions of how folksonomies and a dynamic
community driven system would operate and I disagree with many of your
assumptions. I also AGREE with a lot of what you say and think that we
actually converge on a lot of similar conclusions - that static systems
are extremely difficult to develop and maintain is one backed up by the
continuing saga of the JISC IE - has this happened yet? Bear in mind
that the code to make these static systems work from a computer science
perspective is almost trivial (it's already written and available for
download) and I believe that it is the librarians that are holding this
process up by wanting to define the ultimate classification system - and
it is this I am specifically arguing against. (yes - more controversy -
but please read on!!)

I do not for one moment think that the chaotic system you describe would
be useful at all.

I do not believe in a purely folksonomic approach; if you re-read my
original mail I hope you will see that I merely lauded the ideal of
folksonomy and proposed that rather than librarians and information
technologists agonising over which category definitions to use and then
telling the rest of the world what they have decided is best for us,
that they harness the power of the masses and find innovative and
creative ways of distilling the information into a useful taxonomy
determined by consensus. Librarians are also part of our community - they
are not exempt or aloof from it - and as such are perfectly placed to take
key roles in the contribution, distillation and management of this data.
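
To make "determined by consensus" a little more concrete, here is a
minimal Python sketch of one possible distillation step: keep only the
tags that a given share of contributors independently applied to a
resource. The function name consensus_tags, the 50% threshold and the
sample taggings are hypothetical, chosen purely for illustration; this
is one simple way it could be done, not a description of any existing
system.

```python
from collections import Counter

def consensus_tags(taggings, min_share=0.5):
    """Return tags applied by at least min_share of contributors.

    taggings: a list of tag collections, one per contributor
    (hypothetical input format assumed for this sketch).
    """
    n = len(taggings)
    # Count each tag at most once per contributor.
    counts = Counter(tag for tags in taggings for tag in set(tags))
    return sorted(t for t, c in counts.items() if n and c / n >= min_share)

# Three contributors tag the same resource in their own words.
taggings = [
    {"religion", "theodicy"},
    {"religion", "philosophy"},
    {"religion", "theodicy", "evil"},
]
print(consensus_tags(taggings))  # ['religion', 'theodicy']
```

Tags that fall below the threshold are not thrown away in this sketch;
they simply do not make it into the distilled set, which is where human
judgement (a librarian's, say) could still come in.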

I know there are an awful lot of people in the world that would consider
the 'science' behind language theory and how we come to have meaning
something considerably more than 'sentiment', and I would add weight to
the post-structuralist aims and objectives of a folksonomic approach to
taxonomy evolution by recommending a journey of discovery of the well
known critical theorists: Derrida, Barthes (I would particularly
recommend Barthes' S/Z as a good introduction - even if you just read a
synopsis on the web), Cixous, Lacan, Kristeva, Deleuze, Guattari,
Heidegger, Bloom, Foucault, Sartre, Bakhtin, Sontag to name but a few...
there's a lot and they all make compelling reading (writing?!). Why are so
many of them French?? I want to cross words out now! lol!

I also find your response, George, to be reflective of a
fundamental lack of understanding of what computers are really capable
of (sorry - couldn't resist - :-D well, you started it :-P ). Again I can
only recommend another journey of discovery aiming towards understanding
some of the basic data structures (trees in particular), recursion (also
try Gödel, Escher, Bach: An Eternal Golden Braid - for all its failings
it is still an incredible illustration of the possibilities of
mathematics, physics, AI etc... Paradox is a powerful ally - fighting it
is futile!!) and perhaps equally importantly the extremely elegant and
flexible solutions to common and recurring engineering problems
presented by the 'Gang of Four' - Gamma, Helm, Johnson and Vlissides - in
their wonderful book Design Patterns: Elements of Reusable
Object-Oriented Software (1994)... ;-)

Electronically stored information is FUNDAMENTALLY different from physical
information; there is a whole plethora of things you can do with
electronic data that are virtually impossible with physical data - and
herein lies a telling tale of how librarians and computer scientists
have very different world views - I think we have a lot to learn from
each other!!

One very simple example of how computers can address the failings of
human interactions is demonstrated by Google if you mis-spell your
search term - I seriously doubt that the suggestions given in 'did you
mean:' were hardcoded and manually mapped in a task of gargantuan
proportions by a team of people at Google! Neither do I think that even
if they did do this that they would ever achieve, and certainly never be
able to maintain, the degree of precision required to be as useful as
the far from perfect but more than 'good enough' solution they have now!
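
As a rough illustration of how such suggestions can fall out of usage
data rather than hand-built mappings, here is a minimal Python sketch of
a 'did you mean' step driven purely by spelling similarity and
popularity. It is an assumption for illustration only and makes no claim
about how Google actually does it: the function name suggest, the 0.8
similarity cutoff and the 'religion' count are hypothetical, and the
theology/theolojy counts are borrowed from Scott's example quoted
further down this message.

```python
import difflib
from collections import Counter

def suggest(term, tag_counts, cutoff=0.8):
    """Return a similarly spelled but more popular tag, or None."""
    term = term.lower()
    # Standard-library fuzzy matching against every known tag.
    close = difflib.get_close_matches(term, list(tag_counts), n=5, cutoff=cutoff)
    candidates = [c for c in close if c != term]
    if not candidates:
        return None
    # Let popularity decide between the near matches.
    best = max(candidates, key=lambda c: tag_counts[c])
    return best if tag_counts[best] > tag_counts.get(term, 0) else None

tag_counts = Counter({"theology": 1914332, "theolojy": 1, "religion": 50000})
print(suggest("theolojy", tag_counts))  # theology
print(suggest("theology", tag_counts))  # None
```

Nothing here is mapped by hand: add more tagged records and the
suggestions shift with them, which is the 'good enough' behaviour I am
arguing for.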

The point I wish to defend myself against is the claim that my
observations are

  "quite reductionist and reveal[s] a common lack of
  understanding regarding the complexities of subject mappings"

I spent a long time reading all the protocols, understanding the way
they all work together - and designed and wrote code that clearly
demonstrated these ideas in operation - and yes, they are quite complex;
I just came to the conclusion that you will always fail if you try to
'hardcode' the relationships/subject mappings and that I believe in the
possibility of a solution that is flexible and dynamic and that it will
ultimately provide much more relevant and useful results even if they
are not precise - precision in this field is on the whole an
unattainable ideal - I can't help but think of The Panther by Rilke. Let
the panther out! Yes it's dangerous - yes you lose (the restrictive and
limiting form of) control - but then anything becomes possible - welcome
to the real world - what are you afraid of?

As an aside, other ideas and books that have also helped develop my
current understanding of how we come to have meaning, how that meaning
is shared between ourselves and can be harnessed to provide a dynamic
framework within which we can share and most importantly find what we're
looking for include:

A basic understanding of memes and the current trend of tribal metaphors
to develop collaborative systems - the classic basis for this line of
thought is the infamous 'prisoner's dilemma'. Yep, economists and social
scientists are well in on the game too; I heard on the grapevine that
some rather interesting work is being conducted with a tribal version of
BitTorrent to overcome the inherently selfish nature of people and
information sharing. Lewis Carroll - Alice in Wonderland and Through the
Looking Glass. James Joyce - both Ulysses and even more so Finnegans
Wake. The work of Tim Berners-Lee.

To get a little more hip, mobile phone txt msgs, chat rooms, IM - yep,
even the 'youth of today' are contributing to the development of
language and language theory (I can hear the screams of horror now) but
again, lol, c u l8r, smilies and all the rest of it are extremely common
modes of communication. Within a shockingly short space of time
abbreviations like lotr and hhgttg become ubiquitous and are as near
perfectly unambiguous as you are likely to get within a chaotic system
(don't believe me? type them into Google and see what comes up! compare
that with any ONE of the controlled vocabulary terms and I challenge you
to show an equally relevant and accurate response!!) - are you going to
include very real and pervasive classifications like this in your formal
definitions?

I will close by saying that I also agree that controlled vocabularies
will be around for a long time too (my original statement was
deliberately provocative) but they will for the most part remain in the
specialist field. Having said that, there is absolutely no problem
whatsoever including as many systems of categorisation as you like into
the system I envisage; in fact, there's room for everyone: they will all
be able to be cross mapped and be able to happily, even if paradoxically
and chaotically, exist together in perfect harmony.

Wonderful - cheers guys!! Keep it coming!!

Steve






George Macgregor wrote:

>>This is a fun one! A few brief comments below...
>>
>>
>
>No problem.  Sarah Currier informed me that people enjoy controversy on this
>list!  ;-)  Unfortunately, I don't agree with much that you said, but here
>are some brief(ish) comments regarding some of your comments:
>
>
>
>>It depends on your expectation of results, and tolerance for ambiguity.
>>I think most general users are quite happy to live with clashing tags,
>>ambiguous tags, and so on, as long as there are sufficient hits to sift
>>by eyeballing. I think this holds for LOs too.
>>
>>
>
>Why should users be 'quite happy' with this poor precision?  That's quite a
>defeatist attitude.  If I conduct a search, I expect (or at least hope) that
>I will experience decent recall, but ultimately good precision - the aim of
>any good information retrieval system.  Naturally, Web search engines based
>on post-coordinate indexing have been bereft of this latter concept.  If I
>want to discover material (via Google, say) written by Adam Smith, I will
>retrieve, not only information written by Adam Smith, but all information
>about the history of Adam Smith, his role in the Scottish Enlightenment, and
>other information.  Thus, the lack of metadata makes precise searching
>difficult.  Numerous user information seeking behaviour studies have proven
>that current users will not be content with sifting through pages of results
>in the vain hope of finding something relevant to their original search
>query (in fact, they often abandon their search task if their query isn't
>satisfied within the first page of results!).  Given the weighty metadata
>standards used, eyeballing shouldn't be necessary for LOs or even considered
>sufficient.
>
>
>
>>Popularity and the 'wisdom of crowds'. If 1914332  records are tagged
>>with "theology" and 1 with "theolojy", I'm going to assume the latter
>>is a typo. Also, subjects are by nature pretty nebulous and paradoxical
>>concepts (see, for example, the tortuous attempts of Foucault to
>>describe the nature of "subject" in the Archaeology of Knowledge).
>>
>>
>
>Yes, it is tortuous, but that's why controlled vocabularies exist!  A
>specialist team agonise over the best way to describe a single concept so
>that we don't have to agonise over the best way to describe an information
>entity.  'Yes', subjects can be nebulous, but there are rules and
>conventions in controlled subject schemes (that have evolved over hundreds
>of years) to ensure the practitioner accurately characterises the nature of
>an information resource.  If, by your own admission, subjects are nebulous,
>how can the 'wisdom of crowds' approach possibly prevail? Answer: it can't.
>An independent arbiter is required (i.e. the controlled subject scheme).  At
>least there exist elaborate rules and conventions to improve retrieval
>relevance and consistency.
>
>The approach you suggest would lead to scenarios whereby homonymy callously
>rules the information society! Over time users wouldn't be able to find
>anything.  Resources would be tagged with subject headings that are far too
>broad to support precise retrieval (at a time when greater precision is
>sorely required).  The 'taggers' wouldn't know whether their resource
>relates to 'Theology' (which would constitute a very broad heading) or
>'Religion' (which would also constitute a very broad heading about the same
>subject).  In addition, resources wouldn't necessarily be accurately
>characterised.  Is the wisdom of the crowd likely to note that their
>resource meets the following (very simple) citation order?
>
>'Religion > Philosophy & theory of religion > Theodicy'
>
>Without specialist training, probably not.  They would likely just tag it
>with 'Religion' or 'Theology' or 'Spirituality' or 'Theolojy' and on and on,
>which would unquestionably be too broad. They would also not use qualifiers
>to assist others in determining whether a resource tagged with 'Boxers' is
>about the sport or about dogs?  Is this really adequate, or even sufficient?
>
>
>
>>In the uncontrolled environment, social factors come into play. How do
>>we *really* discover academic resources today? By reputation, by
>>recommendation, by references in known works. I think the assumption
>>you're making is that the user is acting in a very isolated way, which
>>I don't think is the case. On the contrary, the folksonomy approach
>>assumes community. This may itself be a restriction upon the domains
>>where it is an effective tool, and I think is one of the research
>>topics that would be of interest.
>>
>>
>
>In many ways, the user has never been as isolated as he/she is today.  The
>pedagogical paradigm shift to greater problem based learning and
>constructivist learning has witnessed a proliferation of students directing
>their very own learning experience, often within elaborate ICT architecture
>or VLEs (as well you know).  Obviously, social factors are important, and
>have always been.  The automatic response of most users experiencing an
>information gap is, after all, to ask a friend.  The folksonomy approach may
>assume community (which is certainly desirable), but it isn't likely to fill
>any information gaps or get information users from A to B (unless one is
>confined to a strict community of practice where the knowledge collection is
>extremely shallow).  This is, as you say, an interesting area of research,
>but I sincerely doubt whether the entire intellectual output and knowledge
>of the entire world could be adequately characterised or harnessed by a
>wisdom of crowds approach.
>
>An important thing to remember is that once upon a time, when all
>information assumed a physical form, practitioners assigned folksonomy-style
>terms to their resources.  However, it soon became clear that the library
>was thrown into disarray and that a suitable methodology had to be developed
>to facilitate resource discovery.  The electronic environment is no
>different. It's all information, just in a different format. Many LIS
>practices have evolved over hundreds of years and were developed for good
>reason.  I often feel that some of us try to 'reinvent the wheel' and ignore
>important lessons from the not-too-distant past, that's all.
>
>Cheers,
>
>George
>
>----------------------------------------------
>George Macgregor,
>Centre for Digital Library Research (CDLR),
>Department of Computer & Information Sciences,
>University of Strathclyde, Livingstone Tower,
>26 Richmond Street, Glasgow, UK, G1 1XH
>tel: +44 (0)141 548 4753
>web: http://cdlr.strath.ac.uk/
>--------------------------------------------
>
>
>>-----Original Message-----
>>From: Scott Wilson [mailto:[log in to unmask]]
>>Sent: 08 May 2005 04:36
>>To: George Macgregor
>>Cc: [log in to unmask]
>>Subject: Re: Do they mean metadata?!
>>
>>This is a fun one! A few brief comments below...
>>
>>On 6 May 2005, at 23:18, George Macgregor wrote:
>>
>>
>>
>>>>One thing is certain - the days of mandated taxonomy, static systems
>>>>and
>>>>controlled vocabularies (in the strictest sense) are numbered!
>>>>
>>>>
>>>This is an extremely curious statement for the CETIS-METADATA list and
>>>one
>>>that I find to be erroneous.
>>>
>>>
>>I think critically evaluating our dearly-held assumptions is part of
>>what we do in the SIGs. I like statements like this (and not just about
>>metadata!!)
>>
>>
>>
>>>True, there are some merits to a folksonomy
>>>(esp. for browsing & serendipity) and more research should certainly be
>>>undertaken to ascertain their *relative* potential. But - and this is
>>>a big
>>>'but' - these benefits tend to reside within small pockets of practice
>>>(i.e.
>>>Del.icio.us and Flickr) and it remains difficult to envisage how such
>>>techniques can be applied out-with these contexts.
>>>
>>>
>>These "small pockets of practice" are larger by several orders of
>>magnitude than the 'small pockets of practice' that exist with, say,
>>LOM. In the big picture of the 'net, folksonomies are the 80,
>>controlled systems are the 20. (OK, it's really more like no taxon 99%,
>>folksonomy 0.9%, controlled, 0.1%)
>>
>>
>>
>>>How, for instance, is such an approach expected to scale, particularly
>>>in
>>>those ubiquitous distributed systems involving users from more than one
>>>cultural context?  (It's worth noting that even within the UK the
>>>problem of
>>>regional cultural contexts is already problematic with respect to
>>>subject
>>>retrieval).  Closer to home, how is such a system to be usefully
>>>applied
>>>with the deposit of learning objects and the distributed searching of
>>>those
>>>learning object repositories?
>>>
>>>
>>It depends on your expectation of results, and tolerance for ambiguity.
>>I think most general users are quite happy to live with clashing tags,
>>ambiguous tags, and so on, as long as there are sufficient hits to sift
>>by eyeballing. I think this holds for LOs too.
>>
>>In practice, there are probably a few compelling controlled
>>vocabularies related to the 'official' areas of LOs, such as
>>relationship to quality standards and curriculum models, but for the
>>rest, folksonomy tagging is probably 'good enough'. Not perfect, not
>>completely accurate. But good enough.
>>
>>
>>
>>>Given the high probability of subject tagging ambiguity, the lack of
>>>synonym
>>>control, variant spellings, variant punctuation, name authority
>>>control, not
>>>
>>>
>>Popularity and the 'wisdom of crowds'. If 1914332  records are tagged
>>with "theology" and 1 with "theolojy", I'm going to assume the latter
>>is a typo. Also, subjects are by nature pretty nebulous and paradoxical
>>concepts (see, for example, the tortuous attempts of Foucault to
>>describe the nature of "subject" in the Archaeology of Knowledge).
>>
>>
>>
>>>to mention the fact the majority of users suffer from Belkin's
>>>infamous
>>>'Anomalous State of Knowledge' (and are therefore often incapable of
>>>formulating search queries, let alone assigning meaningful subject
>>>descriptors), I find it highly questionable that such a "scheme" could
>>>ever
>>>be used effectively to support meaningful resource discovery and
>>>distributed
>>>searching.
>>>
>>>
>>And yet the vast majority of internet users somehow get by. How?
>>Reputation, word of mouth, trial and error, advertising,
>>contextualization, visual sifting, pure serendipity. The point about
>>folksonomy is that it isn't the definitive discovery mechanism; it
>>augments existing non-rigorous discovery approaches that are 'good
>>enough'.
>>
>>Meaning can emerge, as well as be imposed; classification can be (or
>>always is?) a political act, and folksonomy can be evangelised as a
>>democratization of knowledge classification.
>>
>>As my old mate Foucault postulated, "knowledge is power" may be true in
>>the same sense as "might equals right".
>>
>>
>>
>>> In addition, even if all of the above could be reconciled, there
>>>are still issues pertaining to the semantic relationships and the
>>>syntactic
>>>relationships of all the terms / subject captions used by the
>>>folksonomy to
>>>describe information entities. Would these important relationships be
>>>dispensed with?
>>>
>>>
>>The semantic relations of terms exist 'out there' as well as in formal
>>taxonomy, and can be inferred by proximity in discovery, and by
>>association to originators.
>>
>>How we construct meaning is by associating, dividing, and qualifying
>>categories of entities; in general I think human beings are pretty good
>>at this, even if librarians may disagree with their evaluations
>>sometimes :-)
>>
>>
>>
>>>Indeed, it remains to be seen how a system (underpinned by a
>>>folksonomy) could be effectively mined so as to increase IR precision,
>>>even
>>>by intelligent agents. Users would simply experience high recall or no
>>>results at all.  And, of course, it goes without saying that meaningful
>>>resource discovery or distributed searching would be an unviable
>>>proposition
>>>which, to my mind, is an unwelcome scenario when we should be 'thinking
>>>globally before acting locally' in the 21st century.
>>>
>>>
>>Tags are not the only source; there is also the content - tags just
>>provide hints. I don't imagine mining tags will be very productive, but
>>it's amazing what Google manages with much less.
>>
>>
>>
>>>Further, the assumption that users have the necessary skills, the will
>>>or
>>>the infinite time required to engage with an ever expanding world of
>>>knowledge so that subject mappings (a 'pattern of relationships') can
>>>be
>>>created is - to my mind - quite reductionist and reveals a common lack
>>>of
>>>understanding regarding the complexities of subject mappings.
>>>Numerous
>>>research projects (funded by international organisations like OCLC or
>>>even
>>>the JISC) have found that creating exact match mappings between
>>>*controlled*
>>>subject headings is extremely complex, time consuming and resource
>>>intensive.  So, if it remains difficult within a controlled
>>>environment to
>>>create mappings with experienced information professionals, how
>>>feasible
>>>would it be in an uncontrolled environment?
>>>
>>>
>>In the uncontrolled environment, social factors come into play. How do
>>we *really* discover academic resources today? By reputation, by
>>recommendation, by references in known works. I think the assumption
>>you're making is that the user is acting in a very isolated way, which
>>I don't think is the case. On the contrary, the folksonomy approach
>>assumes community. This may itself be a restriction upon the domains
>>where it is an effective tool, and I think is one of the research
>>topics that would be of interest.
>>
>>I think the feedback mechanisms, and support processes that shape
>>folksonomies are also interesting; perhaps in their own way an
>>'internet time' model of the gradual shaping of knowledge
>>categorisation? Could tagging perhaps be an interesting method for the
>>generation of knowledge?
>>
>>
>>
>>>I agree with the sentiment that a taxonomy should not be 'kept'
>>>private and
>>>that 'a taxonomy should come from ourselves and our interactions with
>>>others', but current controlled vocabularies, classification schemes
>>>and
>>>taxonomies ARE largely derived from the people, albeit in a more
>>>elaborate fashion.  Most prominent schemes are regularly revised and
>>>such
>>>revisions entail a detailed analysis of current knowledge and
>>>literature
>>>whereby appropriate terminology and concepts are harvested and
>>>inserted.  If
>>>one is fortunate enough to use a dynamic facility, such as OCLC's
>>>Connexion,
>>>then one can expect revisions every minute of every day.  These
>>>revisions
>>>will be internationally consistent, will use the best vocabulary to
>>>serve
>>>the most people, and will at least support resource discovery for the
>>>21st
>>>century.
>>>
>>>So, if you have been unable to recognise my position (!), folksonomies
>>>are
>>>interesting, but controlled vocabularies (in the strictest sense) will
>>>remain for many, many years!  I'm certainly confident of that, even if
>>>Steve
>>>Richardson isn't! ;-)
>>>
>>>
>>I'm sure they're always going to be around too, the question is how
>>widely they'll be used compared with less rigorous approaches. But I'd
>>still like to thank Steve for making such a provocative statement :-)
>>
>>- Scott
>>
>>
>>
>>>George
>>>----------------------------------------------
>>>George Macgregor,
>>>Centre for Digital Library Research (CDLR),
>>>Department of Computer & Information Sciences,
>>>University of Strathclyde, Livingstone Tower,
>>>26 Richmond Street, Glasgow, UK, G1 1XH
>>>tel: +44 (0)141 548 4753
>>>web: http://cdlr.strath.ac.uk/
>>>
>>>
>>>
>
>
>