Dear colleagues,

I know this will only apply to a subset of my fellow librarians on this 
list, but I'm hoping someone out there may be an expert in the use of 
stopwords in implementing search systems.

I'm reviewing the stopwords we use for searching our digital repository 
application, intraLibrary (stopwords being words that are ignored even 
if they appear in a search query, e.g. "and").  In intraLibrary our 
stopwords are currently applied only to searches of metadata fields.  Soon we will be 
using them for natural language text searching as well (like a search 
engine).  It seems to me that stopwords for the two purposes may be 
different.  A lot of the stuff I've found online gives *very* long lists 
of stopwords for search engines searching websites and entire 
documents.  However, I wouldn't want to exclude terms like "between" 
from a metadata-only title search.  Our current list, the standard 
stopword list that comes with MySQL, seems a bit too extensive to me, 
since it includes words like "still" and "between".  I think it may be 
the wrong list for our purposes.
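
To make the distinction concrete, here is a minimal sketch in Python of 
the same query being filtered against two different stopword lists.  The 
two word lists and the function name strip_stopwords are made up for 
illustration; they are not the actual intraLibrary or MySQL lists, and 
this is not how either system implements filtering.

```python
# Hypothetical stopword sets, for illustration only.
# A short list that might suit metadata-only (e.g. title) searches.
METADATA_STOPWORDS = {"a", "an", "and", "of", "the"}

# A longer list that might suit full-text searching; it adds words
# such as "between" and "still" on top of the short list.
FULLTEXT_STOPWORDS = METADATA_STOPWORDS | {"between", "still"}


def strip_stopwords(query, stopwords):
    """Return the lowercased query terms that are not in the stopword set."""
    return [term for term in query.lower().split() if term not in stopwords]


title_query = "Between the Lines"

# With the short list, "between" survives and can be matched in a title.
print(strip_stopwords(title_query, METADATA_STOPWORDS))  # ['between', 'lines']

# With the longer list, "between" is dropped before the search runs.
print(strip_stopwords(title_query, FULLTEXT_STOPWORDS))  # ['lines']
```

With the longer list a title search for "Between the Lines" is reduced 
to the single term "lines", which is the kind of result I would like to 
avoid for metadata searches.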

Any thoughts, advice, stopword lists or shared experience welcome.  This 
is a new area to me.

Best
Sarah

-- 
Sarah Currier
Product Manager, Intrallect Ltd.
http://www.intrallect.com

2nd Floor, Regent House
Blackness Road
Linlithgow
EH49 7HU
United Kingdom

Tel: +44 870 234 3933    Mob: +44 (0)7980855801
E-mail: [log in to unmask] 
--