JISCMail - CETIS-METADATA Archives

CETIS-METADATA@JISCMAIL.AC.UK

Subject:
Re: Do they mean metadata?!
From:
George Macgregor <[log in to unmask]>
Reply-To:
George Macgregor <[log in to unmask]>
Date:
Mon, 9 May 2005 14:52:25 +0100
Steve,

The man doth protest too much, methinks! Sadly, I remain unconvinced, and I
think some of my points have been missed entirely.  Sorry Steve!  We will
have to agree to disagree, I'm afraid.  :-(  Please note, however, that I am
not entirely uninformed about the bounds of computing or taxonomies, as you
imply.

> the system I envisage, in fact, there's room for everyone, they will all
> be able to be cross mapped and be able to happily, even if paradoxically
> and chaotically, exist together in perfect harmony.

The difficulties of cross-mapping have been well documented at the CDLR
(see the High Level Thesaurus project based at CDLR:
http://hilt.cdlr.strath.ac.uk/), so I look forward to seeing this
dynamic mapping system in operation (as will OCLC).

> what are you afraid of!

An information environment that's full of gobbledygook.

Sarah has a point, though: who ARE these librarians holding up the JISC IE?

Cheers,

George

PS - You forgot to mention Adam Smith in your list of theorists who
contributed to our understanding of taxonomies. ;-)
----------------------------------------------
George Macgregor,
Centre for Digital Library Research (CDLR),
Department of Computer & Information Sciences,
University of Strathclyde, Livingstone Tower,
26 Richmond Street, Glasgow, UK, G1 1XH
tel: +44 (0)141 548 4753
web: http://cdlr.strath.ac.uk/
--------------------------------------------

> -----Original Message-----
> From: The CETIS Metadata Special Interest Group [mailto:CETIS-
> [log in to unmask]] On Behalf Of Steve Richardson
> Sent: 09 May 2005 12:49
> To: [log in to unmask]
> Subject: Re: Do they mean metadata?!
>
> Thank you George and Scott, it's good to know people are willing and able
> to engage in this discussion and yes it IS fun!! :-) deep breath...
>
> I fear you have taken my initial assertion and exaggerated it beyond
> hyperbole with preconceived notions of how folksonomies and a dynamic
> community-driven system would operate, and I disagree with many of your
> assumptions. I also AGREE with a lot of what you say and think that we
> actually converge on a lot of similar conclusions - that static systems
> are extremely difficult to develop and maintain is one backed up by the
> continuing saga of the JISC IE - has this happened yet? Bear in mind
> that the code to make these static systems work from a computer science
> perspective is almost trivial (it's already written and available for
> download) and I believe that it is the librarians that are holding this
> process up by wanting to define the ultimate classification system - and
> it is this I am specifically arguing against. (yes - more controversy -
> but please read on!!)
>
> I do not for one moment think that the chaotic system you describe would
> be useful at all.
>
> I do not believe in a purely folksonomic approach; if you re-read my
> original mail I hope you will see that I merely lauded the ideal of
> folksonomy and proposed that rather than librarians and information
> technologists agonising over which category definitions to use and then
> telling the rest of the world what they have decided is best for us,
> that they harness the power of the masses and find innovative and
> creative ways of distilling the information into a useful taxonomy
> determined by consensus. Librarians are also part of our community - are
> not exempt and aloof from it - and as such are perfectly placed to take
> key roles in the contribution, distillation and management of this data.
>
> I know there are an awful lot of people in the world that would consider
> the 'science' behind language theory and how we come to have meaning
> something considerably more than 'sentiment', and I would add weight to
> the post-structuralist aims and objectives of a folksonomic approach to
> taxonomy evolution by recommending a journey of discovery of the well
> known critical theorists; Derrida, Barthes (I would particularly
> recommend Barthes' S/Z as a good introduction - even if you just read a
> synopsis on the web), Cixous, Lacan, Kristeva, Deleuze, Guattari,
> Heidegger, Bloom, Foucault, Sartre, Bakhtin, Sontag to name but a few...
> there's a lot and they all make compelling reading (writing?!) why are so
> many of them French?? I want to cross words out now! lol!
>
> Second, I find your response, George, to be reflective of a
> fundamental lack of understanding of what computers are really capable
> of (sorry - couldn't resist - :-D well, you started it :-P ) again I can
> only recommend another journey of discovery aiming towards understanding
> some of the basic data structures (trees in particular), recursion (also
> try Gödel, Escher, Bach: An Eternal Golden Braid - for all its failings
> it is still an incredible illustration of the possibilities of
> mathematics, physics, AI etc... Paradox is a powerful ally - fighting it
> is futile!!) and perhaps equally importantly the extremely elegant and
> flexible solutions to common and recurring engineering problems
> presented by the 'Gang of Four' - Gamma, Helm, Johnson and Vlissides - in
> their wonderful book Design Patterns: Elements of Reusable
> Object-Oriented Software (1994)... ;-)
>
> Electronically stored information is FUNDAMENTALLY different from physical
> information; there is a whole plethora of things you can do with
> electronic data that are virtually impossible with physical data - and
> herein lies a telling tale of how librarians and computer scientists
> have very different world views - I think we have a lot to learn from
> each other!!
>
> One very simple example of how computers can address the failings of
> human interactions is demonstrated by Google if you misspell your
> search term - I seriously doubt that the suggestions given in 'did you
> mean:' were hardcoded and manually mapped in a task of gargantuan
> proportions by a team of people at Google! Neither do I think that, even
> if they did do this, they would ever achieve, and certainly never be
> able to maintain, the degree of precision required to be as useful as
> the far from perfect but more than 'good enough' solution they have now!
>
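
As a rough illustration of the 'did you mean:' point above (and emphatically
not a description of how Google does it), a suggestion can be derived from
the terms people already use rather than from a hand-built mapping. The
sketch below is an assumption-laden toy: the term counts and the function
name did_you_mean are hypothetical, and the similarity measure is simply the
one in Python's standard difflib module.

```python
from difflib import get_close_matches

# Hypothetical usage counts for terms already in circulation (illustrative only).
TERM_COUNTS = {
    "theology": 1914332,
    "religion": 88010,
    "geology": 40231,
    "theodicy": 5120,
}


def did_you_mean(query, term_counts=TERM_COUNTS, cutoff=0.75):
    """Suggest a known term for an unknown query, or None if nothing fits."""
    query = query.lower()
    if query in term_counts:
        return None  # the query is already a known term
    # Find known terms that are spelled similarly to the query.
    candidates = get_close_matches(query, list(term_counts), n=3, cutoff=cutoff)
    if not candidates:
        return None
    # Among similar-looking terms, prefer the one used most often.
    return max(candidates, key=term_counts.get)


print(did_you_mean("theolojy"))
```

Nothing here is mapped by hand; the suggestion falls out of spelling
similarity plus popularity, so it is only as good as the terms it has seen.
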
> The point I wish to defend against is the claim that my observations are
>
>     quite reductionist and reveal[s] a common lack of
>     understanding regarding the complexities of subject mappings
>
> I spent a long time reading all the protocols, understanding the way
> they all work together - and designed and wrote code that clearly
> demonstrated these ideas in operation - and yes, they are quite complex;
> I just came to the conclusion that you will always fail if you try to
> 'hardcode' the relationships/subject mappings and that I believe in the
> possibility of a solution that is flexible and dynamic and that it will
> ultimately provide much more relevant and useful results even if they
> are not precise - precision in this field is on the whole an
> unattainable ideal - I can't help but think of The Panther by Rilke. Let
> the panther out! Yes it's dangerous - yes you lose (the restrictive and
> limiting form of) control - but then anything becomes possible - welcome
> to the real world - what are you afraid of!
>
> As an aside, other ideas and books that have also helped develop my
> current understanding of how we come to have meaning, how that meaning
> is shared between ourselves and can be harnessed to provide a dynamic
> framework within which we can share and most importantly find what we're
> looking for include,
>
> A basic understanding of memes and the current trend of tribal metaphors
> to develop collaborative systems - the classic basis for this line of
> thought is the infamous 'prisoner's dilemma', yep economists and social
> scientists are well in on the game too, I heard on the grapevine that
> some rather interesting work is being conducted with a tribal version of
> BitTorrent to overcome the inherently selfish nature of people and
> information sharing; Lewis Carroll - Alice in Wonderland and Through the
> Looking-Glass; James Joyce - both Ulysses and even more so Finnegans
> Wake; the work of Tim Berners-Lee.
>
> To get a little more hip, mobile phone txt msgs, chat rooms, IM - yep,
> even the 'youth of today' are contributing to the development of
> language and language theory (I can hear the screams of horror now) but
> again, lol, c u l8r, smilies and all the rest of it are extremely common
> modes of communication, within a shockingly short space of time
> abbreviations like lotr and hhgttg become ubiquitous and are as near
> perfectly unambiguous as you are likely to get within a chaotic system
> (don't believe me? type them into Google and see what comes up! compare
> that with any ONE of the controlled vocabulary terms and I challenge you
> to show an equally relevant and accurate response!!) - are you going to
> include very real and pervasive classifications like this in your formal
> definitions?
>
> I will close by saying that I also agree that controlled vocabularies
> will be around for a long time too (my original statement was
> deliberately provocative) but they will for the most part remain in the
> specialist field, having said that there is absolutely no problem
> whatsoever including as many systems of categorisation as you like into
> the system I envisage, in fact, there's room for everyone, they will all
> be able to be cross mapped and be able to happily, even if paradoxically
> and chaotically, exist together in perfect harmony.
>
> Wonderful - cheers guys!! Keep it coming!!
>
> Steve
>
>
>
>
>
>
> George Macgregor wrote:
>
> >>This is a fun one! A few brief comments below...
> >>
> >>
> >
> >No problem.  Sarah Currier informed me that people enjoy controversy on
> >this list!  ;-)  Unfortunately, I don't agree with much that you said, but
> >here are some brief(ish) comments regarding some of your comments:
> >
> >
> >
> >>It depends on your expectation of results, and tolerance for ambiguity.
> >>I think most general users are quite happy to live with clashing tags,
> >>ambiguous tags, and so on, as long as there are sufficient hits to sift
> >>by eyeballing. I think this holds for LOs too.
> >>
> >>
> >
> >Why should users be 'quite happy' with this poor precision?  That's quite
> >a defeatist attitude.  If I conduct a search, I expect (or at least hope)
> >that I will experience decent recall, but ultimately good precision - the
> >aim of any good information retrieval system.  Naturally, Web search
> >engines based on post-coordinate indexing have been bereft of this latter
> >concept.  If I want to discover material (via Google, say) written by Adam
> >Smith, I will retrieve, not only information written by Adam Smith, but
> >all information about the history of Adam Smith, his role in the Scottish
> >Enlightenment, and other information.  Thus, the lack of metadata makes
> >precise searching difficult.  Numerous user information seeking behaviour
> >studies have shown that current users will not be content with sifting
> >through pages of results in the vain hope of finding something relevant to
> >their original search query (in fact, they often abandon their search task
> >if their query isn't satisfied within the first page of results!).  Given
> >the weighty metadata standards used, eyeballing shouldn't be necessary for
> >LOs or even considered sufficient.
> >
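
For readers less familiar with the two terms used above: precision is the
share of retrieved items that are relevant, and recall is the share of
relevant items that were retrieved. The sketch below works through the Adam
Smith search with made-up document identifiers; the function name and the
data are assumptions for illustration, not results from any real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given two collections of ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: two works BY Adam Smith are wanted, but the search
# also returns six items that are merely ABOUT him.
wanted = {"wealth_of_nations", "moral_sentiments"}
returned = wanted | {
    "biography", "enlightenment_essay", "glasgow_tour",
    "economics_blog", "quotation_page", "school_report",
}

print(precision_recall(returned, wanted))
```

In this made-up case everything wanted is found (full recall), yet only a
quarter of what comes back is wanted, which is the eyeballing burden being
objected to.
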
> >
> >
> >>Popularity and the 'wisdom of crowds'. If 1914332  records are tagged
> >>with "theology" and 1 with "theolojy", I'm going to assume the latter
> >>is a typo. Also, subjects are by nature pretty nebulous and paradoxical
> >>concepts (see, for example, the tortuous attempts of Foucault to
> >>describe the nature of "subject" in the Archaeology of Knowledge).
> >>
> >>
> >
> >Yes, it is tortuous, but that's why controlled vocabularies exist!  A
> >specialist team agonise over the best way to describe a single concept so
> >that we don't have to agonise over the best way to describe an information
> >entity.  Yes, subjects can be nebulous, but there are rules and
> >conventions in controlled subject schemes (that have evolved over hundreds
> >of years) to ensure the practitioner accurately characterises the nature
> >of an information resource.  If, by your own admission, subjects are
> >nebulous, how can the 'wisdom of crowds' approach possibly prevail?
> >Answer: it can't.  An independent arbiter is required (i.e. the controlled
> >subject scheme).  At least there exist elaborate rules and conventions to
> >improve retrieval relevance and consistency.
> >
> >The approach you suggest would lead to scenarios whereby homonymy
> >callously rules the information society! Over time users wouldn't be able
> >to find anything.  Resources would be tagged with subject headings that
> >are far too broad to support precise retrieval (at a time when greater
> >precision is sorely required).  The 'taggers' wouldn't know whether their
> >resource relates to 'Theology' (which would constitute a very broad
> >heading) or 'Religion' (which would also constitute a very broad heading
> >about the same subject).  In addition, resources wouldn't necessarily be
> >accurately characterised.  Is the wisdom of the crowd likely to note that
> >their resource meets the following (very simple) citation order?
> >
> >'Religion > Philosophy & theory of religion > Theodicy'
> >
> >Without specialist training, probably not.  They would likely just tag it
> >with 'Religion' or 'Theology' or 'Spirituality' or 'Theolojy' and on and
> >on, which would unquestionably be too broad. They would also not use
> >qualifiers to assist others in determining whether a resource tagged with
> >'Boxers' is about the sport or about dogs.  Is this really adequate, or
> >even sufficient?
> >
> >
> >
> >>In the uncontrolled environment, social factors come into play. How do
> >>we *really* discover academic resources today? By reputation, by
> >>recommendation, by references in known works. I think the assumption
> >>you're making is that the user is acting in a very isolated way, which
> >>I don't think is the case. On the contrary, the folksonomy approach
> >>assumes community. This may itself be a restriction upon the domains
> >>where it is an effective tool, and I think is one of the research
> >>topics that would be of interest.
> >>
> >>
> >
> >In many ways, the user has never been as isolated as he/she is today.
> >The pedagogical paradigm shift to greater problem based learning and
> >constructivist learning has witnessed a proliferation of students
> >directing their very own learning experience, often within elaborate ICT
> >architecture or VLEs (as well you know).  Obviously, social factors are
> >important, and always have been.  The automatic response of most users
> >experiencing an information gap is, after all, to ask a friend.  The
> >folksonomy approach may assume community (which is certainly desirable),
> >but it isn't likely to fill any information gaps or get information users
> >from A to B (unless one is confined to a strict community of practice
> >where the knowledge collection is extremely shallow).  This is, as you
> >say, an interesting area of research, but I sincerely doubt whether the
> >entire intellectual output and knowledge of the entire world could be
> >adequately characterised or harnessed by a wisdom of crowds approach.
> >
> >An important thing to remember is that once upon a time, when all
> >information assumed a physical form, practitioners assigned
> >folksonomy-style terms to their resources.  However, it soon became clear
> >that the library was thrown into disarray and that a suitable methodology
> >had to be developed to facilitate resource discovery.  The electronic
> >environment is no different. It's all information, just in a different
> >format. Many LIS practices have evolved over hundreds of years and were
> >developed for good reason.  I often feel that some of us try to 'reinvent
> >the wheel' and ignore important lessons from the not-too-distant past,
> >that's all.
> >
> >Cheers,
> >
> >George
> >
> >----------------------------------------------
> >George Macgregor,
> >Centre for Digital Library Research (CDLR),
> >Department of Computer & Information Sciences,
> >University of Strathclyde, Livingstone Tower,
> >26 Richmond Street, Glasgow, UK, G1 1XH
> >tel: +44 (0)141 548 4753
> >web: http://cdlr.strath.ac.uk/
> >--------------------------------------------
> >
> >
> >>-----Original Message-----
> >>From: Scott Wilson [mailto:[log in to unmask]]
> >>Sent: 08 May 2005 04:36
> >>To: George Macgregor
> >>Cc: [log in to unmask]
> >>Subject: Re: Do they mean metadata?!
> >>
> >>This is a fun one! A few brief comments below...
> >>
> >>On 6 May 2005, at 23:18, George Macgregor wrote:
> >>
> >>
> >>
> >>>>One thing is certain - the days of mandated taxonomy, static systems
> >>>>and controlled vocabularies (in the strictest sense) are numbered!
> >>>>
> >>>>
> >>>This is an extremely curious statement for the CETIS-METADATA list and
> >>>one that I find to be erroneous.
> >>>
> >>>
> >>I think critically evaluating our dearly-held assumptions is part of
> >>what we do in the SIGs. I like statements like this (and not just about
> >>metadata!!)
> >>
> >>
> >>
> >>>True, there are some merits to a folksonomy
> >>>(esp. for browsing & serendipity) and more research should certainly be
> >>>undertaken to ascertain their *relative* potential. But - and this is a
> >>>big 'but' - these benefits tend to reside within small pockets of
> >>>practice (i.e. Del.icio.us and Flickr) and it remains difficult to
> >>>envisage how such techniques can be applied out-with these contexts.
> >>>
> >>>
> >>These "small pockets of practice" are larger by several orders of
> >>magnitude than the 'small pockets of practice' that exist with, say,
> >>LOM. In the big picture of the 'net, folksonomies are the 80,
> >>controlled systems are the 20. (OK, it's really more like no taxon 99%,
> >>folksonomy 0.9%, controlled, 0.1%)
> >>
> >>
> >>
> >>>How, for instance, is such an approach expected to scale, particularly
> >>>in those ubiquitous distributed systems involving users from more than
> >>>one cultural context?  (It's worth noting that even within the UK,
> >>>regional cultural contexts are already problematic with respect to
> >>>subject retrieval).  Closer to home, how is such a system to be usefully
> >>>applied with the deposit of learning objects and the distributed
> >>>searching of those learning object repositories?
> >>>
> >>>
> >>It depends on your expectation of results, and tolerance for ambiguity.
> >>I think most general users are quite happy to live with clashing tags,
> >>ambiguous tags, and so on, as long as there are sufficient hits to sift
> >>by eyeballing. I think this holds for LOs too.
> >>
> >>In practice, there are probably a few compelling controlled
> >>vocabularies related to the 'official' areas of LOs, such as
> >>relationship to quality standards and curriculum models, but for the
> >>rest, folksonomy tagging is probably 'good enough'. Not perfect, not
> >>completely accurate. But good enough.
> >>
> >>
> >>
> >>>Given the high probability of subject tagging ambiguity, the lack of
> >>>synonym control, variant spellings, variant punctuation, name authority
> >>>control, not
> >>>
> >>>
> >>Popularity and the 'wisdom of crowds'. If 1914332  records are tagged
> >>with "theology" and 1 with "theolojy", I'm going to assume the latter
> >>is a typo. Also, subjects are by nature pretty nebulous and paradoxical
> >>concepts (see, for example, the tortuous attempts of Foucault to
> >>describe the nature of "subject" in the Archaeology of Knowledge).
> >>
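
The popularity heuristic described above can be sketched in a few lines. It
is a minimal sketch under stated assumptions: the tag counts are invented,
the function name fold_rare_variants and both thresholds are hypothetical,
and spelling similarity comes from Python's standard difflib module. A rare
tag is folded into another tag only when that other tag is overwhelmingly
more popular and spelled almost the same.

```python
from difflib import SequenceMatcher


def fold_rare_variants(tag_counts, min_similarity=0.85, max_share=0.001):
    """Map each rare tag to a far more popular, similarly spelled tag."""
    mapping = {}
    for tag, count in tag_counts.items():
        best = None
        for other, other_count in tag_counts.items():
            # Skip unless 'other' is overwhelmingly more popular than 'tag'.
            if other == tag or count > other_count * max_share:
                continue
            similarity = SequenceMatcher(None, tag, other).ratio()
            if similarity < min_similarity:
                continue
            # Keep the most popular of the acceptable targets.
            if best is None or other_count > tag_counts[best]:
                best = other
        if best is not None:
            mapping[tag] = best
    return mapping


# Invented counts, echoing the example in the message above.
counts = {"theology": 1914332, "theolojy": 1, "religion": 250000, "theodicy": 4000}
print(fold_rare_variants(counts))
```

A heuristic like this only catches spelling variants. It does nothing for
homonyms (the 'Boxers' case raised elsewhere in this thread) or for choosing
between broad and narrow headings.
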
> >>
> >>
> >>>to mention the fact the majority of users suffer from Belkin's
> >>>infamous 'Anomalous State of Knowledge' (and are therefore often
> >>>incapable of formulating search queries, let alone assigning meaningful
> >>>subject descriptors), I find it highly questionable that such a
> >>>"scheme" could ever be used effectively to support meaningful resource
> >>>discovery and distributed searching.
> >>>
> >>>
> >>And yet the vast majority of internet users somehow get by. How?
> >>Reputation, word of mouth, trial and error, advertising,
> >>contextualization, visual sifting, pure serendipity. The point about
> >>folksonomy is that it isn't the definitive discovery mechanism; it
> >>augments existing non-rigorous discovery approaches that are 'good
> >>enough'.
> >>
> >>Meaning can emerge, as well as be imposed; classification can be (or
> >>always is?) a political act, and folksonomy can be evangelised as a
> >>democratization of knowledge classification.
> >>
> >>As my old mate Foucault postulated, "knowledge is power" may be true in
> >>the same sense as "might equals right".
> >>
> >>
> >>
> >>>In addition, even if all of the above could be reconciled, there
> >>>are still issues pertaining to the semantic relationships and the
> >>>syntactic relationships of all the terms / subject captions used by the
> >>>folksonomy to describe information entities. Would these important
> >>>relationships be dispensed with?
> >>>
> >>>
> >>The semantic relations of terms exist 'out there' as well as in formal
> >>taxonomy, and can be inferred by proximity in discovery, and by
> >>association to originators.
> >>
> >>How we construct meaning is by associating, dividing, and qualifying
> >>categories of entities; in general I think human beings are pretty good
> >>at this, even if librarians may disagree with their evaluations
> >>sometimes :-)
> >>
> >>
> >>
> >>>Indeed, it remains to be seen how a system (underpinned by a
> >>>folksonomy) could be effectively mined so as to increase IR precision,
> >>>even by intelligent agents. Users would simply experience high recall or
> >>>no results at all.  And, of course, it goes without saying that
> >>>meaningful resource discovery or distributed searching would be an
> >>>unviable proposition which, to my mind, is an unwelcome scenario when we
> >>>should be 'thinking globally before acting locally' in the 21st century.
> >>>
> >>>
> >>Tags are not the only source; there is also the content - tags just
> >>provide hints. I don't imagine mining tags will be very productive, but
> >>it's amazing what Google manages with much less.
> >>
> >>
> >>
> >>>Further, the assumption that users have the necessary skills, the will
> >>>or the infinite time required to engage with an ever expanding world of
> >>>knowledge so that subject mappings (a 'pattern of relationships') can be
> >>>created is - to my mind - quite reductionist and reveals a common lack
> >>>of understanding regarding the complexities of subject mappings.
> >>>Numerous research projects (funded by international organisations like
> >>>OCLC or even the JISC) have found that creating exact match mappings
> >>>between *controlled* subject headings is extremely complex, time
> >>>consuming and resource intensive.  So, if it remains difficult within a
> >>>controlled environment to create mappings with experienced information
> >>>professionals, how feasible would it be in an uncontrolled environment?
> >>>
> >>>
> >>In the uncontrolled environment, social factors come into play. How do
> >>we *really* discover academic resources today? By reputation, by
> >>recommendation, by references in known works. I think the assumption
> >>you're making is that the user is acting in a very isolated way, which
> >>I don't think is the case. On the contrary, the folksonomy approach
> >>assumes community. This may itself be a restriction upon the domains
> >>where it is an effective tool, and I think is one of the research
> >>topics that would be of interest.
> >>
> >>I think the feedback mechanisms, and support processes that shape
> >>folksonomies are also interesting; perhaps in their own way an
> >>'internet time' model of the gradual shaping of knowledge
> >>categorisation? Could tagging perhaps be an interesting method for the
> >>generation of knowledge?
> >>
> >>
> >>
> >>>I agree with the sentiment that a taxonomy should not be 'kept' private
> >>>and that 'a taxonomy should come from ourselves and our interactions
> >>>with others', but current controlled vocabularies, classification
> >>>schemes and taxonomies ARE largely derived from the people, albeit in a
> >>>more elaborate fashion.  Most prominent schemes are regularly revised
> >>>and such revisions entail a detailed analysis of current knowledge and
> >>>literature whereby appropriate terminology and concepts are harvested
> >>>and inserted.  If one is fortunate enough to use a dynamic facility,
> >>>such as OCLC's Connexion, then one can expect revisions every minute of
> >>>every day.  These revisions will be internationally consistent, will use
> >>>the best vocabulary to serve the most people, and will at least support
> >>>resource discovery for the 21st century.
> >>>
> >>>So, if you have been unable to recognise my position (!), folksonomies
> >>>are interesting, but controlled vocabularies (in the strictest sense)
> >>>will remain for many, many years!  I'm certainly confident of that, even
> >>>if Steve Richardson isn't! ;-)
> >>>
> >>>
> >>I'm sure they're always going to be around too, the question is how
> >>widely they'll be used compared with less rigorous approaches. But I'd
> >>still like to thank Steve for making such a provocative statement :-)
> >>
> >>- Scott
> >>
> >>
> >>
> >>>George
> >>>----------------------------------------------
> >>>George Macgregor,
> >>>Centre for Digital Library Research (CDLR),
> >>>Department of Computer & Information Sciences,
> >>>University of Strathclyde, Livingstone Tower,
> >>>26 Richmond Street, Glasgow, UK, G1 1XH
> >>>tel: +44 (0)141 548 4753
> >>>web: http://cdlr.strath.ac.uk/
> >>>
> >>>
> >>>
> >
> >
> >