This would be a good forum to work out how to let end-users search for
words or phrases either with diacritics or with them stripped out.
What do you do? Encode dual views of everything?
E.g.
<choice>
<orig>something with diacritics</orig>
<reg>same thing with diacritics stripped out</reg>
</choice>
Maybe use a program to generate a stripped version of a text encoded
with diacritics? (Not too hard to program.) Or encode without diacritics
and use a program to put them in? (I don't know of any such program. It
would be useful. I imagine it would be hard to do.)
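For the easier direction (generating a stripped version from a text
encoded with diacritics), here is a minimal sketch in Python. It assumes
the text is Unicode and that "diacritics" means combining marks that
appear once the text is decomposed. The function name strip_diacritics
is made up for illustration. Letters with no canonical decomposition,
such as "ø", would pass through unchanged, so a real project would
probably need an extra mapping table for those.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits each precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every character whose general category is a mark (Mn, Mc, Me).
    kept = "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )
    # Recompose whatever is left so the output is in NFC.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    for word in ["λόγος", "ἄνθρωπος", "café"]:
        print(word, "->", strip_diacritics(word))
```

Running the same function over a search query as well as over the text
would let a search match whichever diacritics the user happened to type.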
One has to remember that scribes and rock-pounders don't necessarily get
the diacritics "right". Also, end-users are likely to get diacritics
wrong.
Best
Tim Finney
PR said:
Anyway, MM's comment isn't just a hunch, it's a solid insight: search
by character+diacritical has only limited value, and many of us tend
to forget the accidence of certain words (well, I do). I merely
wanted to dissent from Ross's suggestion that this was a fault of the
Unicode character model. (I dissent from Ross pretty often - it's a
mark of respect for his considerable talent and contributions to this
subfield, as I see them.)