Thanks for your clarification gisle,

i looked at the xml 1.0 spec myself ... and it would appear that the only
way to specify an encoding inside the document itself is the encoding
declaration in the xml declaration at the very start :

<?xml version="1.0" encoding="big5" ?>

The best solution for your example would be transcoding ... the robot
would collect the data, and one of the items collected would be the
encoding of the document being analysed ...

you then use that encoding information to convert the data into another
encoding ... assuming the original document has its encoding correctly
identified ...
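
To make the transcoding idea concrete, here is a rough Python sketch, not a tested robot component. The function names (declared_encoding, transcode_to_utf8) are made up for illustration, the target encoding utf-8 is my assumption, and it only handles ASCII-compatible encodings such as big5 (a utf-16 document would need BOM detection first). It reads the encoding from the xml declaration, decodes with it, and re-encodes:

```python
import re

# Matches an encoding declaration only at the very start of the entity.
_DECL_BYTES = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)
_DECL_TEXT = re.compile(
    r'^(<\?xml[^>]*?encoding\s*=\s*["\'])[^"\']+(["\'])'
)


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the xml declaration, or the default."""
    match = _DECL_BYTES.match(raw)
    return match.group(1).decode("ascii") if match else default


def transcode_to_utf8(raw: bytes) -> bytes:
    """Decode with the declared encoding and re-encode as utf-8.

    Assumes the declaration is correct; a wrong declaration raises
    UnicodeDecodeError or silently produces garbage.
    """
    text = raw.decode(declared_encoding(raw))
    # Keep the declaration truthful after the conversion.
    text = _DECL_TEXT.sub(r"\g<1>utf-8\g<2>", text, count=1)
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="big5" ?><t>\u4e2d\u6587</t>'.encode("big5")
    print(declared_encoding(sample))
    print(transcode_to_utf8(sample).decode("utf-8"))
```

The robot would store the original encoding name alongside the converted data, so the conversion can be audited later if a site turns out to have mislabelled its documents.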

-----Original Message-----
From: Gisle Hannemyr <[log in to unmask]>
To: [log in to unmask] <[log in to unmask]>
Date: Wednesday, 12 May 1999 23:11
Subject: Re: Re-starting discussion


>Andrew Cunningham wrote:
>> Ok .. i understand your point ... i know in html you are only allowed to
>> have one encoding in the document .. does the same hold for xml?
>
>Hmmm... this I must admit that I have not considered.
>Thanks for raising it.
>
>Re-reading the W3C Recommendation of 10 February 1998 for the
>Extensible Markup Language (XML) 1.0, I find that
>sec. 4.3.3 says: "Each external parsed entity in an XML
>document may use a different encoding for its characters";
>and "it is an error [...] for an encoding declaration to
>occur other than at the beginning of an external entity".
>
>The way I read this is that there may be one, and only
>one, encoding in use for each external parsed entity.
>Which means that doing as I have done so far (i.e.: putting
>encoding declarations into plain parsed entities) is in fact
>_not_ valid XML. Bummer.
>
>I would welcome further clarification from anyone
>who can provide an authoritative answer on this issue.
>
>I can't honestly see why there should be such a constraint.
>But I guess it is not very productive to disagree with W3C
>recommendations when creating Internet applications :-( .
>
>> personally .. if i was mixing scripts like that .. i'd use a
>> single encoding like the default encodings utf-8 or utf-16
>
>Of course.  But my question here was meant to learn whether
>the other way of doing it was legal XML, not whether it was
>possible to do it differently.
>
>> does anyone know of a practical example of a parser or xml agent that can
>> handle multiple encodings within the same document?
>
>Well, currently mine can :-) (but that doesn't make it XML,
>so I'll take that out again).
>
>> or are you thinking of a
>> case where only the appropriate language information is extracted (ie
>> language negotiation) and the appropriate encoding used?
>>
>> i'm just trying to clarify for myself whether your example was meant for
>> lang based extraction or whether all languages would be displayed ? since
>> each scenario would place very different constraints on the parser .. or
>> xml agent
>
>My application is a web robot that cruises the web looking for XML
>metadata to enter into the database that lives in the heart of our
>Internet search engine.  I was just thinking about how a multilingual
>web site may want to present itself to such a robot, and constructed the
>example from that scenario.
>
>--
>- gisle hannemyr  ( [log in to unmask] - http://home.sol.no/home/gisle/ )
>------------------------------------------------------------------------
>  "Use the Source, Luke. Use the Source." -- apologies to Obi-Wan Kenobi
>------------------------------------------------------------------------
>


