Good morning,

I would like to comment from a completely different perspective: that of a large repository which has developed several online catalogues and embraced new technologies.

The National Archives stopped adding subject index terms in 2007. There were multiple reasons behind the decision. We had compelling data on the number of authority subject searches carried out by users and staff versus the number of basic free text and advanced searches. Resources for the maintenance of subject terms and our thesaurus were hard to justify. On top of this, our subjects were mostly applied at series/collection level, which didn't address user and business needs. However, we continued to develop our rich authority files for government corporate bodies, creators, and personal and place names.

Between 2010 and 2011 we tried a different approach, developing subject categorisation and filtering functionality for the new Discovery catalogue, creating subject tags deep across 12 million document level catalogue descriptions.


We use an automatic categorising tool to tag individual documents using our own list of subject categories. This has changed how we do some of our cataloguing work. After researching subjects, an archivist builds sophisticated queries on a tool linked to the search engine to identify which descriptions should be tagged under each subject. The method is semi-automated, because it is the archivist's queries that drive the categorisation engine. New data that matches the taxonomy queries is also automatically tagged with the relevant subject. We ran a business-as-usual programme to improve accuracy and coverage.
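
To make the idea concrete, here is a minimal sketch in Python of query-driven tagging. It is not the tool used for Discovery: the SubjectQuery class, the tag_description function, the subjects, the phrases and the document references are all invented for illustration, and it assumes a query can be reduced to "any of these phrases, none of those".

```python
import re
from dataclasses import dataclass, field


@dataclass
class SubjectQuery:
    """Hypothetical taxonomy query: any_of must hit, none_of must not."""
    subject: str
    any_of: list[str]
    none_of: list[str] = field(default_factory=list)

    def matches(self, text: str) -> bool:
        def found(phrase: str) -> bool:
            # Whole-word, case-insensitive phrase match.
            pattern = rf"\b{re.escape(phrase)}\b"
            return re.search(pattern, text, re.IGNORECASE) is not None

        return (any(found(p) for p in self.any_of)
                and not any(found(p) for p in self.none_of))


def tag_description(text: str, taxonomy: list[SubjectQuery]) -> list[str]:
    """Return every subject whose query matches the description text."""
    return [q.subject for q in taxonomy if q.matches(text)]


# Illustrative queries an archivist might maintain (invented for this sketch).
TAXONOMY = [
    SubjectQuery("Railways", any_of=["railway", "locomotive"],
                 none_of=["model railway"]),
    SubjectQuery("Navy", any_of=["ship", "naval"]),
]

# Illustrative document-level descriptions (also invented).
descriptions = {
    "doc-1": "Plans for a steam locomotive, Great Western Railway",
    "doc-2": "Log of HMS Victory, a ship of the line",
    "doc-3": "Catalogue of a model railway exhibition",
}

# Existing and newly added descriptions go through the same function.
tags = {ref: tag_description(text, TAXONOMY)
        for ref, text in descriptions.items()}
```

In a sketch like this, improving accuracy and coverage means editing the queries rather than re-indexing records by hand: the exclusion phrase keeps doc-3 out of "Railways", and any new description is tagged by calling tag_description on it.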

I don't mean to suggest that this is the way forward for everybody.

Best regards

Jone


Jone Garmendia
Head of Cataloguing
The National Archives
Tel  +44 (0)20 8392 5330 Ext. 2415
www.nationalarchives.gov.uk



From: Archivists, conservators and records managers. [mailto:[log in to unmask]] On Behalf Of Doherty Teresa
Sent: 17 July 2013 17:44
To: [log in to unmask]
Subject: Re: UKAT or something else?

Hi

I would agree that using existing standard thesauri/controlled vocabularies is really useful. Part of the trick is choosing the correct thesaurus for the correct job.

There are controlled vocabularies for concepts (subjects) as well as object names, geographic names, art and architecture, medical subject terms etc. The main reason to use standard terms is that they create pathways into our collections from the outside world. By showing clusters around particular terms we get more people spotting that we exist and that we have stuff they are interested in. This is often at quite a high/general level and is often used for browsing or for initial searching. Ask around similar organisations and you'll get pros and cons for different thesauri. (I've never actually used UKAT, though I've used MeSH and UNESCO in conjunction with Getty controlled vocabularies.)

Remember we usually have enough free text space in our catalogue records to go into very specific detail about what we have if necessary.

Once you've chosen the appropriate controlled vocabulary it's worth recognising that they develop over time.

There are periods where the vocabulary is static - often because it's in step with the cultural and user norms of the time. Then something happens and there is enough reason to change. Often the reason something is static (like UKAT) is that it is waiting for people to put their hands up to do a revision/expansion/development. So as Paul says - if you think it needs revising ask around and you'll probably find a working group!

Teresa

-----Original Message-----
From: Archivists, conservators and records managers. [mailto:[log in to unmask]] On Behalf Of Jane Stevenson
Sent: 17 July 2013 17:00
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: UKAT or something else?

Hi there,

I would like to endorse the comments in support of the use of controlled vocabularies. The idea that because Google takes what is seen as a keyword approach then we don't need structured text is simply wrong in my opinion. Search engines like Google like structured text, and it seems to me things are increasingly going in that direction (e.g. schema.org).

Our Linked Data work has benefitted enormously from Archives Hub descriptions having index terms in them, because it gives us structured data to work with. For example, we've been looking at connecting our LCSH terms to the LCSH linked data. I think the potential of data is always something to bear in mind: the more structured the data is, the more it can be part of the Linked Data/Semantic approach that is becoming increasingly influential.

In terms of choice of thesaurus, we allow use of any recognised thesaurus as long as it is specified. We find some contributors do prefer LCSH simply because it's so much more specific. You can have something as specific as "Steam-engines--England--History--18th century" as a subject term. However, UKAT or Unesco work well for many archives. The UKAT term "Steam engines" actually comes from the Hub (UKAT included all Hub index terms, so it has some that are LCSH where they have been used in the Hub). The Unesco thesaurus doesn't seem to have anything similar at all. I guess the choice of thesaurus is down to what you want to achieve, and particularly the level of specificity you want. LCSH can get quite complicated.

cheers,
Jane


Jane Stevenson
The Archives Hub
Mimas, The University of Manchester
Devonshire House, Oxford Road
Manchester M13 9QH

email:[log in to unmask]
tel: 0161 275 6055
website: archiveshub.ac.uk
blog: archiveshub.ac.uk/blog
twitter: twitter.com/archiveshub

On 17 Jul 2013, at 16:47, Paul Sillitoe <[log in to unmask]<mailto:[log in to unmask]>>
wrote:

> This raises the issue of what we mean by subject terms. It is an area
> that still causes confusion, especially when trying to work across
> archive and museum, information and object domains.
>
> For example, a "steam engine" is an object; an appropriate related
> subject term would be "mechanical engineering | steam" if not more
> simply "steam engineering".
>
> Whatever sets of authority terms are used, I would urge anyone to use
> those that are already established, rather than use their own. If the
> established standards need revision, let's do it! We shall not achieve
> effective resource discovery until useful data standards are universally adopted.
>
> Paul Sillitoe
>
> -----Original Message-----
> From: Archivists, conservators and records managers.
> [mailto:[log in to unmask]] On Behalf Of Dave Caroline
> Sent: 17 July 2013 15:02
> To: [log in to unmask]<mailto:[log in to unmask]>
> Subject: Re: UKAT or something else?
>
> On Wed, Jul 17, 2013 at 2:18 PM, Claire Collins
> <[log in to unmask]<mailto:[log in to unmask]>> wrote:
>> Hello all,
>>
>> Here at Gloucestershire Archives we are revising our subject indexing
>> and I was wondering whether any of you use UKAT - and if you don't what
>> you use instead.
>>
>> Although we contributed terms to UKAT we do still have some difficulty
>> with it, and are considering using our original subject terms.
>>
>> The last discussion on the list relating to this that I can find dates
>> from 2008 - I thought some more recent data might be useful.
>
> I have not used it, though it sounded interesting, so I thought I had
> better look at what it is/does.
>
> I searched for the term "engine"
>
> after wasting time paging through many obviously incorrect terms
> returned I came across the entry for plural "engines"
> http://www.ukat.org.uk/thesaurus/term.php?i=15798
>
> It seriously misses many forms of engine; a few examples: petrol engine,
> search engine, steam engine, heat engine, traction engine.
>
> It brings up some rhetorical questions:
> Is a controlled vocabulary right at all, especially if that vocabulary
> ignores terms?
> Are you making your information unsearchable by sticking to a subset
> of reality?
> Are you then merging result sets together that should not be (returning
> too many)?
> In short, what does a controlled vocabulary give you?
>
> I also tried a search for "steam engine"
>
> I cannot see why it returns
>   Aerospace engineering
>   Biological engineering USE Biotechnology
>   Broadcasting engineering USE Broadcasting technology
> in the result set, picking on three subjects that do not fit steam
> engine at all well.
>
> I chose technical terms deliberately as that relates to my content here.
> http://www.collection.archivist.info/searchv13.php?searchstr=engine
>
> I had thought of adding some form of thesaurus to my search but am not
> convinced of the validity yet.
>
>
> Dave Caroline
>
> Contact the list owner for assistance at
> [log in to unmask]<mailto:[log in to unmask]>
>
> For information about joining, leaving and suspending mail (eg during
> a holiday) see the list website at
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=archives-nra



