Dear colleagues,
I know this will only apply to a subset of my fellow librarians on this
list, but I'm hoping someone out there may be an expert in the use of
stopwords in implementing search systems.
I'm reviewing the stopwords (i.e. words that are not searched even if
they appear in a search query, e.g. "and") that we use for searching our
digital repository application, intraLibrary. In intraLibrary our stopwords are
currently applied only to searches of metadata fields. Soon we will be
using them for natural language text searching as well (like a search
engine). It seems to me that stopwords for the two purposes may be
different. A lot of the stuff I've found online gives *very* long lists
of stopwords for search engines searching websites and entire
documents. However, I wouldn't want to exclude terms like "between"
from a metadata-only title search. Our current list, the standard one
that comes with MySQL, seems a bit too extensive to me: it includes
words like "still" and "between". I think maybe it's the
wrong list for our purposes.
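
To make the distinction concrete, here is a minimal sketch in Python of what
I mean by using a different stopword list for each kind of search. The list
contents and the names (METADATA_STOPWORDS, FULLTEXT_STOPWORDS, filter_query)
are purely illustrative assumptions, not what intraLibrary or MySQL actually
do.

```python
# Hypothetical stopword lists; the contents are illustrative only.
METADATA_STOPWORDS = {"a", "an", "and", "of", "the"}

# Assumed longer list for full-text searching: the short list plus extra words.
FULLTEXT_STOPWORDS = METADATA_STOPWORDS | {"between", "still"}


def filter_query(query, stopwords):
    """Drop stopwords from a query, keeping the remaining terms in order."""
    return [term for term in query.lower().split() if term not in stopwords]


# The same query, filtered for a title search and for a full-text search.
title_terms = filter_query("Between the Acts", METADATA_STOPWORDS)
body_terms = filter_query("Between the Acts", FULLTEXT_STOPWORDS)

print(title_terms)  # ['between', 'acts']
print(body_terms)   # ['acts']
```

With the short list, a title search for "Between the Acts" still matches on
"between"; with the longer list that word is silently dropped, which is the
behaviour I'd like to avoid for metadata fields.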
Any thoughts, advice, stopword lists or shared experience welcome. This
is a new area to me.
Best
Sarah
--
Sarah Currier
Product Manager, Intrallect Ltd.
http://www.intrallect.com
2nd Floor, Regent House
Blackness Road
Linlithgow
EH49 7HU
United Kingdom
Tel: +44 870 234 3933 Mob: +44 (0)7980855801