DC-INTERNATIONAL Archives

DC-INTERNATIONAL@JISCMAIL.AC.UK


Subject: Re: Re-starting discussion
From: "Andrew Cunningham" <[log in to unmask]>
Reply-To: Andrew Cunningham
Date: Thu, 13 May 1999 01:47:55 +1000
Content-Type: text/plain


Thanks for your clarification, Gisle,

i looked at the XML 1.0 spec myself ... and it would appear that the only
way to specify an encoding would be in the XML declaration:

<?xml version="1.0" encoding="big5" ?>
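A rough sketch of how a robot might read that declaration from the raw bytes, before it knows how to decode them. The function name `declared_encoding` and the UTF-8 fallback are assumptions made for illustration; the pattern only handles ASCII-compatible encodings and ignores byte order marks, so it is not a full implementation of the spec's autodetection rules.

```python
import re

# Matches the encoding pseudo-attribute at the very start of the byte stream.
# Assumption: the encoding is ASCII-compatible, so the declaration itself
# can be read as ASCII (this does not cover UTF-16 or EBCDIC input).
_DECL = re.compile(
    rb"""^<\?xml\s+(?:version\s*=\s*["'][^"']*["']\s*)?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding name in lowercase, or `default` if none is declared."""
    match = _DECL.match(data)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


if __name__ == "__main__":
    print(declared_encoding(b'<?xml version="1.0" encoding="big5" ?><doc/>'))
```

The declaration has to be matched against bytes rather than text, since the whole point is that the encoding is not yet known.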

The best solution for your example would be transcoding ... the robot
would collect the data, and one of the data items would be the encoding of the
document being analysed ...

you use that encoding data to convert the information into another encoding
... assuming the original document has the encoding correctly identified
...
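A minimal sketch of that transcoding step, assuming the robot already holds the raw bytes and the encoding name it collected. The function name `transcode` and the choice of UTF-8 as the target are illustrative assumptions, not something taken from the robot being discussed.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Re-encode `data` from `source_encoding` into `target_encoding`.

    Raises UnicodeDecodeError if the bytes are not valid in the source
    encoding, which is what tends to happen when the document has its
    encoding wrongly identified.
    """
    text = data.decode(source_encoding)   # bytes -> Unicode text
    return text.encode(target_encoding)   # Unicode text -> bytes in the new encoding


if __name__ == "__main__":
    big5_bytes = "中文".encode("big5")
    utf8_bytes = transcode(big5_bytes, "big5")
    print(utf8_bytes.decode("utf-8"))
```

If the transcoded bytes are stored as XML again, any encoding declaration they carry would need to be rewritten to name the new encoding.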

-----Original Message-----
From: Gisle Hannemyr <[log in to unmask]>
To: [log in to unmask] <[log in to unmask]>
Date: Wednesday, 12 May 1999 23:11
Subject: Re: Re-starting discussion


>Andrew Cunningham wrote:
>> Ok .. i understand your point ... i know in html you are only allowed to
>> have one encoding in the document .. does the same hold for xml?
>
>Hmmm... this I must admit that I have not considered.
>Thanks for raising it.
>
>Re-reading the W3C Recommendation of 10 February 1998 for the
>Extensible Markup Language (XML) 1.0, I find that
>sec. 4.3.3 says: "Each external parsed entity in an XML
>document may use a different encoding for its characters";
>and "it is an error [...] for an encoding declaration to
>occur other than at the beginning of an external entity".
>
>The way I read this is that there may be one, and only
>one, encoding in use for each external parsed entity.
>Which means that doing as I have done so far (i.e.: putting
>encoding declarations into plain parsed entities) is in fact
>_not_ valid XML. Bummer.
>
>I would welcome further clarification from anyone
>who can provide an authoritative answer on this issue.
>
>I can't honestly see why there should be such a constraint.
>But I guess it is not very productive to disagree with W3C
>recommendations when creating Internet applications :-( .
>
>> personally .. if i was mixing scripts like that .. i'd use a
>> single encoding like the default encodings utf-8 or utf-16
>
>Of course.  But my question here was whether
>the other way of doing it was legal XML, not whether it was
>possible to do it differently.
>
>> does anyone know of a practical example of a parser or xml agent that can
>> handle multiple encodings within the same document?
>
>Well, currently mine can :-) (but that doesn't make it XML,
>so I'll take it out again).
>
>> or are you thinking of a
>> case where only the appropriate language information is extracted (ie
>> language negotiation) and the appropriate encoding used?
>>
>> i'm just trying to clarify for myself whether your example was meant for
>> lang-based extraction or whether all languages would be displayed? since
>> each scenario would place very different constraints on the parser .. or
>> xml agent
>
>My application is a web robot that cruises the web looking for XML
>metadata to enter into the database that lives in the heart of our
>Internet search engine.  I was just thinking about how a multilingual
>web site may want to present itself to such a robot, and constructed the
>example from that scenario.
>
>--
>- gisle hannemyr  ( [log in to unmask] - http://home.sol.no/home/gisle/ )
>------------------------------------------------------------------------
>  "Use the Source, Luke. Use the Source." -- apologies to Obi-Wan Kenobi
>------------------------------------------------------------------------
>


