Some answers to Ross's useful list:
#1 isn't quite accurate. It is true that alpha+acute and alpha+grave
are separate Unicode characters. But the different alphas are, or can
be, bundled together for search purposes. Thus in the Macintosh
character palette, an alpha with any accent will bring up all the
other alphas. And Java lets you search on a case- and
diacritic-insensitive basis.
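
To make the Java point in #1 concrete, here is a minimal sketch, not a
definitive implementation. It assumes that "bundling" means comparing
strings after canonical decomposition (NFD) with the combining marks
removed, and that final sigma should match medial sigma. The class name
GreekFold and the method fold are hypothetical; java.text.Normalizer is
part of the standard library.

```java
import java.text.Normalizer;
import java.util.Locale;

class GreekFold {
    // Reduce a string to a search key: decompose (NFD), drop all combining
    // marks (accents, breathings, iota subscript), lowercase, and treat
    // final sigma as medial sigma. Assumption: this is the equivalence
    // a search should use; other projects may want a stricter one.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT).replace('\u03C2', '\u03C3');
    }

    public static void main(String[] args) {
        String alphaAcute = "\u03AC"; // alpha with tonos/acute
        String alphaGrave = "\u1F70"; // alpha with varia/grave
        // Different code points, same search key.
        System.out.println(alphaAcute.equals(alphaGrave));
        System.out.println(fold(alphaAcute).equals(fold(alphaGrave)));
    }
}
```

The stored text keeps its accents; only the keys used for matching are
folded.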
#2 is true. What would it take to rewrite Morpheus to accept Unicode,
or to write a preprocessing routine that converts Unicode to Beta Code
whenever you want to feed Morpheus?
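
As a rough idea of what the preprocessing routine in #2 might look like,
here is a sketch under stated assumptions rather than a tested
converter. It assumes lowercase Latin letters in the output (whether
Morpheus wants them that way is an assumption on my part), covers only
the 24 letters plus the common breathings, accents, diaeresis and iota
subscript, and passes everything else through untouched. The class name
UnicodeToBeta and the method toBeta are hypothetical.

```java
import java.text.Normalizer;

class UnicodeToBeta {
    // Beta Code letters for U+03B1 (alpha) through U+03C9 (omega), in code
    // point order; both final and medial sigma map to "s".
    private static final String LETTERS = "abgdezhqiklmncoprsstufxyw";

    // Beta Code sign for a combining mark, or 0 if the mark is not handled.
    private static char mark(char c) {
        switch (c) {
            case '\u0313': return ')';  // smooth breathing
            case '\u0314': return '(';  // rough breathing
            case '\u0301': return '/';  // acute
            case '\u0300': return '\\'; // grave
            case '\u0342': return '=';  // circumflex (perispomeni)
            case '\u0308': return '+';  // diaeresis
            case '\u0345': return '|';  // iota subscript
            default: return 0;
        }
    }

    static String toBeta(String text) {
        // NFD splits each precomposed character into base letter + marks.
        String nfd = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < nfd.length()) {
            char c = nfd.charAt(i++);
            char lower = Character.toLowerCase(c);
            if (lower < '\u03B1' || lower > '\u03C9') {
                out.append(c); // not a Greek letter: pass through unchanged
                continue;
            }
            // Collect the marks that follow this base letter.
            StringBuilder marks = new StringBuilder();
            while (i < nfd.length() && mark(nfd.charAt(i)) != 0) {
                marks.append(mark(nfd.charAt(i++)));
            }
            char letter = LETTERS.charAt(lower - '\u03B1');
            if (c != lower) {
                // Capitals: asterisk, then the marks, then the letter.
                out.append('*').append(marks).append(letter);
            } else {
                out.append(letter).append(marks);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // mu, eta+circumflex, nu, iota, nu
        System.out.println(toBeta("\u03BC\u1FC6\u03BD\u03B9\u03BD")); // mh=nin
    }
}
```

Mark order here simply follows the order NFD produces, which may not
match Beta Code conventions for every combination (diaeresis plus accent,
for one), so a real routine would need checking against the TLG
documentation.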
#3 is also the case. But it is theoretically and practically possible
to generate appropriate Unicode sequences.
On Aug 27, 2005, at 10:22, Ross Scaife wrote:
> Let me pass on practical considerations one hears, for possible
> inclusion in the DC FAQ on this topic.
>
>
> A. Arguments one hears for coding polytonic classical Greek with TLG
> Beta Code even today in new e-pubs:
>
> 1. Unicode conflates the idea of "character" and "glyph", treating an
> alpha+acute as a different letter from an alpha+grave, and a terminal
> sigma as different from a medial sigma.
>
> 2. Morpheus (Perseus morphological parser, aka cruncher) needs Beta
> Code input.
>
> 3. There are symbols defined in Beta Code but not yet defined in
> Unicode, and symbols defined in both, but with no font support in
> Unicode (but this is a problem either way).
>
>
> B. Arguments one hears for coding polytonic classical Greek with
> Unicode in new e-pubs:
>
> 1. Unicode _is_ an international standard.
>
> 2. It sucks to have to implement a transcoder vel sim. in an already
> hairy process of setting up tomcat/cocoon or other on-the-fly
> publication framework.
>
> 3. If you offer your XML source files for download, and the Greek is
> TLG B C, people can't read them easily, without conversion.
>
> 4. By virtue of the transcoder and other conversion methods out there,
> we can always go _back_ to Beta Code, on the fly, when it is
> necessary.
>
> 5. Beta code, by using punctuation marks in non-standard ways,
> requires a rewrite of any tokenizer (e.g. you can't count on ")" to
> follow the end of a word); this requires some extra programming in
> some instances.
>
>
> On 8/27/05, Gabriel BODARD <[log in to unmask]> wrote:
>
>> That's a good question. Just three or four years ago, I came to the
>> conclusion for a project I was working on that we should use Beta
>> Code for data input. This was always conceived as an interim measure,
>> and we were able to bulk-convert this to Unicode later without any
>> trouble. Nevertheless, I would not make the same decision again now
>> (with better support for Unicode input in Mac OSX, for example). The
>> issues are worth exploring, though.
>>
>> (Pure TLG Beta Code, it should be noted, does more than Unicode: it
>> is both encoding system and markup scheme. If you have a Beta Code
>> text, you probably need to convert the encoding to Unicode and the
>> markup to XML simultaneously. No?)
>>
>> G
>>
>> Quoting Ross Scaife <[log in to unmask]>:
>>
>>
>>> I'd like to suggest a sub-topic question for the FAQ:
>>>
>>> Should I use TLG betacode or Unicode for polytonic classical Greek
>>> in my electronic publications?
>>>
>>> There are those who continue to maintain that betacode is the right
>>> choice, and on the other hand certain respected recent publications
>>> that have used Unicode.
>