Hi everyone,

I've just returned from a few days walking and am surprised (in a pleasant way) by the amount of discussion that has been going on over the four submissions. Rather than point-by-point responses, I thought it might be useful to build on my original document by fleshing out some of the topics - hopefully this will address some of the comments along the way.

It always seems to me (as someone who has invested most of the last decade slaving in the mines on this subject) that vocabulary management is a bit of a Cinderella topic. My experience is that organisations would like to

o	have ready access to reliable, high quality, topical vocabularies, 
o	have simple mechanisms to apply categorisation to content with these vocabularies, and, most important, 
o	build better information systems that are brim-full of ways to use categorisation in helping with targeted information discovery.

The question is, why are such enhanced information systems not the norm? Is it the case, as one writer suggested, that dozens of easy-to-use and capable technical solutions are at our fingertips but we are defeated by issues of system-wide management and sustainability? 

I don't think so, personally. 

My experience is that organisations get it - "it" being the idea of improving information discovery through tagging, that is. Search engines really don't work. Search for anything using Google or other search engines and you will get the results you are looking for - in amongst a haystack of stuff that you are almost certainly not looking for. Why? Because search engines, even super-successful search engines like Google's, are fundamentally dumb. You asked for SEAL, you get SEAL: choose from any of the 242,000,000 results on Google today:

Seal: the musician
SEAL: Social and Emotional Aspects of Learning
SEAL: Society for Effective Affective Learning
SEAL: the PR company in Shrewsbury
Seal: the adorable blubbery sea-mammal
Seal: the device for authenticating documents
SEAL: the US elite naval warriors

...and so on.

Of course this is an extreme example, but it illustrates the fundamental point. Search engines are dumb. They don't consider, don't care, what you mean by what you say you are looking for. You ask for a word, and the search engine will find documents featuring that word. Aside from a few ancillary frills, every search engine does this.

I don't want to belabour the point, because I think it goes without saying that you, like the organisations I talk to, "get it". So why don't organisations go for controlled vocabularies in their information services?

In my view, the answer lies in the lack of a truly easy-to-use, effortlessly available and holistic (and unashamedly technical) solution.

Let's look at the components of such a solution.

Component 1. A freely-available, easy-to-use repository of tagging vocabularies
To be honest, I don't know how many freely-available repositories of vocabularies there are, and that's part of the problem. There has been no concerted, widely-supported effort to bring shared vocabularies to the attention of the organisations that could benefit from them. That is unfortunate. I have to say that I believe the government has been very slow to understand the potential impact and to provide appropriate support for the few repositories that have been trialled, such as the now-defunct Becta Vocabulary Bank (the work started in that repository continues in the vocman Bank).

My proposal is to put effort into developing the Open Vocabularies Service, because it is simple (simple is always good in my view), free and, with a clear user interface, clarifies the structure and purpose of vocabularies.

Component 2. A simple way to create and manage such vocabularies
There are a variety of vocabulary management products on the market. I've used quite a few of them, and I like some of them. However, I think there is room for an easier-to-use tool, more attainable for "the rest of us". Key to this, in my view, is visually clear tools, and I firmly believe that the web is where tools like this should reside. The Visual Vocabulary Tools were conceived with this in mind. In any future scenario of use of the VVT, I envisage that the user will simply register in order to be able to create and edit publicly-accessible vocabularies freely.

With the basic technical issue out of the way, the organisation can concentrate on either assessment of the existing vocabularies for relevance to their content, or on the design and creation of the specific vocabularies needed for their purposes.

My proposal is to improve the existing, fairly rudimentary VVT to provide a more robust product.

Component 3. A way to apply vocabularies to content
OK - you've discovered vocabularies, you're sold on the idea that your content should be tagged, and you've found this brilliant open vocabulary service (forgive my chutzpah). What next? How do you actually get your content tagged against these vocabularies?

My experience so far (limited to Drupal, though WordPress is also feasible) suggests that web-based tagging tools could work really well. The appropriate plug-in is installed in an otherwise vanilla implementation of the content management system, and the person tagging the content installs a Google Chrome or Firefox extension. The extension detects when that person is on a content editing page and provides simple visual tools (in the work we have done so far, drop-downs with auto-completion search lists) to find and select suitable tags. Those tags are then written back to the content (using the CMS built-in taxonomy).
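
To make the auto-completion idea a little more concrete, here is a minimal sketch in Python of the lookup that sits behind such a drop-down. It is not the code of the actual extension or plug-in: the Term class, the suggest and tag_content functions and the example.org URIs are all made up for illustration, and a real tool would fetch the terms from the vocabulary service rather than hold them in a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    uri: str    # placeholder identifier for the term (hypothetical)
    label: str  # human-readable label shown in the drop-down


# Hypothetical vocabulary; a real tool would load this from the repository.
VOCABULARY = [
    Term("http://example.org/vocab/subject/chemistry", "Chemistry"),
    Term("http://example.org/vocab/subject/chemical-engineering", "Chemical engineering"),
    Term("http://example.org/vocab/subject/physics", "Physics"),
    Term("http://example.org/vocab/audience/subject-leader", "Subject leader"),
]


def suggest(prefix, vocabulary=VOCABULARY, limit=10):
    """Return terms whose label starts with the typed text, ignoring case."""
    needle = prefix.strip().casefold()
    if not needle:
        return []
    matches = [t for t in vocabulary if t.label.casefold().startswith(needle)]
    return sorted(matches, key=lambda t: t.label.casefold())[:limit]


def tag_content(existing_tags, chosen):
    """Add the chosen term's URI to the content's tags, avoiding duplicates."""
    if chosen.uri in existing_tags:
        return list(existing_tags)
    return list(existing_tags) + [chosen.uri]


if __name__ == "__main__":
    options = suggest("chem")
    print([t.label for t in options])
    print(tag_content([], options[0]))
```

Storing the term URI rather than the label is a deliberate choice in this sketch: labels can be reworded in the vocabulary without breaking the link between content and term.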

Component 4. A way to discover content based on categorisation
This is the area in which my proposal is currently fairly sketchy. My experience with Drupal uses internal Drupal data structures, with some custom modules, for creating navigation schemes that take advantage of the tagging of content. It also makes use of Solr indexing to provide some faceted searching and browsing.

I think a more generally-applicable solution lies elsewhere. I don't want to go into detail here, because a detailed solution is not part of my proposal. But my proposal would involve investigation in a variety of areas leading to a demonstrator, and I'll mention a couple of these.

4.1. Micro-formatted or RDFa-tagged content. In the CMS products that I have worked with, the tagging of content is deliberately abstract - the content is stored separately from the tags. What if we didn't do it that way? What if the tagging tools allowed us to store content tagged with vocabulary terms using a microformat scheme, or RDFa? The tags would be stored in the page content itself, so that appropriately-aware browsers and search tools could find them. Suddenly, our search engines are doing something intelligent! They can group the search results according to the in-page tagging provided by micro-formatting. Google and Yahoo already have some developments in their capabilities in this area.
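
As a rough illustration of what in-page tagging could look like, the sketch below (Python again) writes out a page fragment in which each tag is an RDFa-style link to its vocabulary term. The assumptions are mine and not part of the proposal: I have used RDFa 1.1-style attributes (prefix, about, property) with the Dublin Core terms "title" and "subject", the function name render_tagged_page is invented, and the example.org URIs are placeholders. The output would be worth checking with an RDFa validator before relying on it.

```python
from html import escape

# Dublin Core terms namespace, used here for "title" and "subject".
DCTERMS = "http://purl.org/dc/terms/"


def render_tagged_page(page_uri, title, body_html, tags):
    """Build an HTML fragment carrying its own tagging.

    tags is a list of (term_uri, label) pairs. body_html is assumed to be
    trusted markup from the CMS and is inserted unchanged.
    """
    tag_items = "\n".join(
        '    <li><a property="dcterms:subject" href="{}">{}</a></li>'.format(
            escape(uri, quote=True), escape(label)
        )
        for uri, label in tags
    )
    return (
        '<article prefix="dcterms: {}" about="{}">\n'.format(
            DCTERMS, escape(page_uri, quote=True)
        )
        + '  <h1 property="dcterms:title">{}</h1>\n'.format(escape(title))
        + "  {}\n".format(body_html)
        + "  <ul>\n{}\n  </ul>\n".format(tag_items)
        + "</article>"
    )


if __name__ == "__main__":
    print(
        render_tagged_page(
            "http://example.org/node/1",
            "Acids & bases",
            "<p>Lesson plan text goes here.</p>",
            [("http://example.org/vocab/subject/chemistry", "Chemistry")],
        )
    )
```

The point of the sketch is only that the term URI travels with the page, so anything reading the page can see which vocabulary term it was tagged against.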

4.2. A RESTful web service could provide a simple information discovery mechanism allowing remote services to query an information service to retrieve content by tagging ("give me any content you have on the subject of Chemistry and aimed at subject leaders"). I have done some work on this with Drupal and believe that it could be an effective general mechanism.
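
The "Chemistry for subject leaders" query above might be handled along the following lines. This is a sketch only, not the Drupal work I mentioned: the /content path, the subject and audience parameter names, the find_content function and the sample records are all invented for illustration, and a real service would query the CMS rather than an in-memory list.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical tagged content; in practice this would come from the CMS.
CONTENT = [
    {"title": "Planning a chemistry curriculum",
     "url": "http://example.org/node/1",
     "tags": {"subject": ["chemistry"], "audience": ["subject-leader"]}},
    {"title": "Chemistry revision quiz",
     "url": "http://example.org/node/2",
     "tags": {"subject": ["chemistry"], "audience": ["learner"]}},
    {"title": "Physics department handbook",
     "url": "http://example.org/node/3",
     "tags": {"subject": ["physics"], "audience": ["subject-leader"]}},
]


def find_content(request_url, content=CONTENT):
    """Answer a GET such as /content?subject=chemistry&audience=subject-leader.

    Returns a JSON list of the items tagged with every requested term.
    A request with no query parameters returns everything.
    """
    wanted = parse_qs(urlsplit(request_url).query)
    matches = [
        {"title": item["title"], "url": item["url"]}
        for item in content
        if all(
            term in item["tags"].get(facet, [])
            for facet, terms in wanted.items()
            for term in terms
        )
    ]
    return json.dumps(matches)


if __name__ == "__main__":
    print(find_content("/content?subject=chemistry&audience=subject-leader"))
```

Keeping the query in the URL means a remote service needs nothing more than an HTTP GET and a JSON parser to make use of the tagging.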

By whatever mechanism content is tagged, an enhanced Open Vocabulary Service could be aware of the information services that have tagged against its vocabularies (by leveraging the URI mechanism used to identify individual terms) and thus provide an enhanced information discovery service. I believe that this would be a significant step along the way to creating an infrastructure for distributed information discovery: common vocabularies, simple mechanisms for enhancing content description using such vocabularies and mechanisms for finding that enhanced content wherever it is.
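
One very simple way to picture that awareness is an index keyed on term URI, listing the information services known to have tagged content against each term. The sketch below is only that picture in Python: the TermUsageRegistry class, its method names and the URLs are hypothetical, and how the vocabulary service would actually learn about taggings is one of the things the project would need to investigate.

```python
from collections import defaultdict


class TermUsageRegistry:
    """Remembers which information services have tagged against each term URI."""

    def __init__(self):
        self._services_by_term = defaultdict(set)

    def record(self, term_uri, service_url):
        """Note that a service has content tagged with the given term."""
        self._services_by_term[term_uri].add(service_url)

    def services_for(self, term_uri):
        """Return the services known to use the term, in a stable order."""
        return sorted(self._services_by_term.get(term_uri, set()))


if __name__ == "__main__":
    registry = TermUsageRegistry()
    chemistry = "http://example.org/vocab/subject/chemistry"
    registry.record(chemistry, "http://service-b.example.org/")
    registry.record(chemistry, "http://service-a.example.org/")
    print(registry.services_for(chemistry))
```

Given a term URI, a discovery service could then pass the same query on to each listed service and merge the answers.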

It's worth saying, in closing, that none of what I have written above is specific to the OER audience here. Far from it - I think it is high time that wider information services had access to mechanisms to improve information discovery beyond what we put up with from search engines! However, it seems to me that our educational information services are in a key position to be able to champion the development of improved information discovery across distributed information systems.

I think I have probably written enough for now, but I'd be happy to answer any questions.



Ian.
-- 
Ian Piper
Tellura Information Services - the web, document and information people
Registered in England and Wales: 5076715, VAT Number: 874 2060 29
http://www.tellura.co.uk/
Author of "Learn Xcode Tools for Mac OS X and iPhone Development", Apress, December 2009
01926 811574 | 07973 156616
-- 







On 12 Apr 2011, at 16:53, Lorna M Campbell wrote:

> Dear all, 
> 
> Rather belated thanks to everyone who submitted proposals on Friday in response to the JISC CETIS OER Technical Mini-Projects call.    We would very much like to encourage open discussion of these bids so if you have any thoughts or opinions on the proposals please forward them to this list.  
> 
> For those who have submitted proposals, you are welcome to comment on the other submissions but please declare that you have an interest in another bid. 
> 
> A panel of JISC and CETIS representatives will meet on Tuesday 19th April to decide the outcome of the call so you have until the end of the day on Monday the 18th to discuss the proposals.  
> 
> All constructive criticism will be very welcome indeed!  
> 
> Thanks 
> Lorna
> 
> --
> Lorna M. Campbell
> JISC CETIS Assistant Director
> University of Strathclyde
> Glasgow
> Email: [log in to unmask]
> Phone: +44141 548 3072
> Skype: lorna120768
> 
> The University of Strathclyde is a charitable body, registered in Scotland, number SC015263.