Hi James,
My knowledge of this field is fairly basic, but would it not be easier
to search for all object names that use special characters and replace
them with Anglicised terms? Perhaps the 'correct' term could be
stored in a 'related terms' field? This is similar to recording equivalent
names for objects in different languages, e.g. the Gaelic name for an object.
Am I correct in assuming that, from a usability point of view, it's highly
unlikely that people would search using special characters? Knowing how to
input special characters when typing is a challenge in itself!
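To make the idea concrete, here is a rough Python sketch (not Solr configuration, and not a description of how IWM's search works). It matches on an accent-stripped form of each word, so cafe and café find the same records, and lists the records that match the query exactly as typed first. The function names and sample titles are made up purely for illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lower-case the text and strip combining accents (café -> cafe)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def exact_form(text: str) -> str:
    """Lower-case the text but keep its accents, in a consistent Unicode form."""
    return unicodedata.normalize("NFC", text).casefold()


def search(records, query):
    """Match on the folded form; list accent-exact matches first."""
    folded_query = fold_accents(query)
    exact_query = exact_form(query)
    hits = []
    for title in records:
        words = title.split()
        if any(fold_accents(w) == folded_query for w in words):
            is_exact = any(exact_form(w) == exact_query for w in words)
            # Rank 0 for an exact match, 1 for a folded-only match.
            hits.append((0 if is_exact else 1, title))
    hits.sort(key=lambda hit: hit[0])  # stable sort keeps original order within a rank
    return [title for _, title in hits]


# Hypothetical sample data, not real collection records.
records = [
    "Café in Paris",
    "Cafe interior",
    "Aéroplanes en vol",
    "Aeroplanes over London",
]

print(search(records, "café"))
print(search(records, "cafe"))
```

Stripping accents this way only covers letters that decompose into a base letter plus an accent; characters such as ø or ł would need an explicit mapping, which is where a stored 'related terms' field could still help.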
Robin
--
Robin Patel
Ergadia Museums & Heritage
t: 01786 860 691
m: 07815 312 562
[log in to unmask]
https://ergadiaheritage.com/
On 26 July 2016 at 10:07, James Morley <[log in to unmask]> wrote:
> Hi all
>
> We were pondering an issue last night with accented and special characters
> in collections search, and wondered if anyone had examples of best practice?
>
> Currently at IWM we treat them uniquely, so a search for cafe gives you
> 361 results, and a search for café 200 results. There's only an overlap of
> about ten results which have both variants, so about 550 combined. Even
> more pronounced is aéroplanes (1 result) and aeroplanes (4900 results).
>
> We're thinking of indexing against both accented and non-accented forms,
> to ensure something with café also gets indexed for cafe - in other words
> merging the results. My one concern then is that the user loses granularity
> and there could be specific examples where quite a precise term gets lost
> in something more generic (though I can't think of a specific example right
> now). From a technology point of view it's all based on Solr, so a thought
> was to somehow push up relevancy ranking for the accented/special character
> matches.
>
> It's interesting to look at search stats and see that people are quite
> extensively using accents and special characters, especially for people and
> place names (and a few for aeroplanes, who must have been quite
> disappointed!). Also, because of the different collections areas and
> historic cataloguing, we seem to have a mix of accurate and 'Anglicised'
> names in our collections data!
>
> Cheers
>
> James
>
>
> James Morley
> Data Developer
>
> Imperial War Museums
> Lambeth Road
> London SE1 6HZ
>
> [log in to unmask]
> 07713 360563
> iwm.org.uk
> @jamesinealing
>
>
> ****************************************************************
> website: http://museumscomputergroup.org.uk/
> Twitter: http://www.twitter.com/ukmcg
> Facebook: http://www.facebook.com/museumscomputergroup
> [un]subscribe: http://museumscomputergroup.org.uk/email-list/
> ****************************************************************
>