David and Mark,
On Mon, 26 Jan 2004, David Berry wrote:
> Thanks. My worry is if the original VOTable people could come up with
> something which is now being deemed to be "not proper XML" - then we may
> well fall into a similar trap.
I'm not really clear why VOTable is `not proper XML'. If I understand
Mark's summary,
> Briefly, Tony Linde is unhappy about blanket use of the VOTable format
> because it's not XMLish enough. As far as I can see, the main
> thing he doesn't like is that if you've got a document like:
>
> <TABLE>
> <FIELD name="Object name" ID="NAME" datatype="char" arraysize="*"/>
> <FIELD name="V Magnitude" ID="VMAG" datatype="double"/>
> <DATA>
> <TABLEDATA>
> <TR><TD>M31</TD><TD>3.4</TD></TR>
> <TR><TD>Fomalhaut</TD><TD>1.23</TD></TR>
> </TABLEDATA>
> </DATA>
> </TABLE>
>
> then you can't write a very good schema for it, since there's nothing
> much you can say about what's in a TD element - some TDs will contain
> numbers, some will contain strings. This means you can't use XML binding
> to generate automatic code for parsing such documents, instead someone
> would have to (horror!) fire up an editor and write some actual
> source code. I get the feeling there are other issues about storing
> such documents in XML databases[...]
...it's only <TD> that's the problem.
The only thing that makes this odd XML is that it's not trivial to
write an XPath expression to find individual table elements, since
there isn't an element named `VMAG'. However /TABLE/DATA/TABLEDATA/TR/TD[2]
(I think; I could check if you want) gives all the elements in column 2
(the VMAG column) once you know that's the column you're after.
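For what it's worth, here is a minimal sketch of that lookup in Python, using the standard library's ElementTree (which supports only a subset of XPath, so the path is written relative to the root TABLE element). The VOTABLE string is just Mark's example pasted in; the variable names are mine.

```python
import xml.etree.ElementTree as ET

# Mark's example table, verbatim.
VOTABLE = """<TABLE>
<FIELD name="Object name" ID="NAME" datatype="char" arraysize="*"/>
<FIELD name="V Magnitude" ID="VMAG" datatype="double"/>
<DATA>
<TABLEDATA>
<TR><TD>M31</TD><TD>3.4</TD></TR>
<TR><TD>Fomalhaut</TD><TD>1.23</TD></TR>
</TABLEDATA>
</DATA>
</TABLE>"""

root = ET.fromstring(VOTABLE)  # root is the TABLE element

# TD[2] selects the second TD child of each TR, i.e. column 2.
column2 = [td.text for td in root.findall("DATA/TABLEDATA/TR/TD[2]")]
print(column2)
```

Note that the values come back as strings; nothing in the document structure itself says they are numbers.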
If you believe in XSchema, then you believe that element contents are
typed (column 1 here is a string, column 2 a numeric, for example),
and you will be upset that these don't have specified types. If you
don't believe in XSchema (and you possibly shouldn't, unless you are
a database person who wants to suck all XML into SQL databases), then
you don't care that there isn't a predefined type for each element --
that's the application's (the thing reading the XML) problem.
DTDs declare that an element may have no content, character-data content,
or one of a couple of more exotic content types, and that's that. The main thing
you get from XSchemas is the type system, and that's where _all_ the
complication lies.
VOTable defines tables (ahem!), so you don't know a priori what types
of elements are going in which columns. So what? You've still got to
process the table with an application which is smart enough to parse
the <FIELD> elements and thus work out what to do with the columns.
This means that you might have difficulty processing a VOTable with
standard XML tools; but (very probably) the only thing you lose
there is the warm glow that comes from using standard tools.
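To illustrate how little `smart enough' needs to mean, here is a rough sketch in Python of an application that reads the FIELD elements first and then uses them to interpret the TD contents. The CONVERTERS mapping and the read_table function are my own hypothetical names, and the mapping covers only the two datatypes in Mark's example, not the full VOTable list.

```python
import xml.etree.ElementTree as ET

VOTABLE = """<TABLE>
<FIELD name="Object name" ID="NAME" datatype="char" arraysize="*"/>
<FIELD name="V Magnitude" ID="VMAG" datatype="double"/>
<DATA>
<TABLEDATA>
<TR><TD>M31</TD><TD>3.4</TD></TR>
<TR><TD>Fomalhaut</TD><TD>1.23</TD></TR>
</TABLEDATA>
</DATA>
</TABLE>"""

# Assumption: only the datatypes that appear in the example are handled.
CONVERTERS = {"char": str, "double": float}


def read_table(xml_text):
    """Return the table rows as dicts keyed by FIELD ID, with typed values."""
    root = ET.fromstring(xml_text)
    fields = root.findall("FIELD")
    ids = [f.get("ID") for f in fields]
    converters = [CONVERTERS[f.get("datatype")] for f in fields]
    rows = []
    for tr in root.findall("DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        # Pair each cell with its column's ID and converter, in order.
        rows.append({i: conv(text) for i, conv, text in zip(ids, converters, cells)})
    return rows


rows = read_table(VOTABLE)
print(rows)
```

The typing happens in the application, driven by the FIELD declarations, rather than in a schema.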
The database issue is separate. I don't know much about XML
databases, but the little I know is that they're just databases that
you query with XPath or XQuery expressions, and get XML out. I
imagine that when you're setting up your shiny new XML database
there's a box where you have to type in the name of your XSchema file.
That's maybe the `problem with XML databases'.
If you're storing your XML in a SQL database, then you have to care
about types. But you know the type of column 2 in the example above,
it's the value of /TABLE/FIELD[2]/@datatype. So you have to do a
little work to do some transformation before ingestion. My heart bleeds.
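As a sketch of what that `little work' might look like, here is some Python that reads the datatype attribute of each FIELD, maps it to a SQL column type, and loads the rows into an in-memory SQLite database. The SQL_TYPES mapping and the table name `votable' are assumptions of mine for illustration, and the mapping covers only the two datatypes in the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

VOTABLE = """<TABLE>
<FIELD name="Object name" ID="NAME" datatype="char" arraysize="*"/>
<FIELD name="V Magnitude" ID="VMAG" datatype="double"/>
<DATA>
<TABLEDATA>
<TR><TD>M31</TD><TD>3.4</TD></TR>
<TR><TD>Fomalhaut</TD><TD>1.23</TD></TR>
</TABLEDATA>
</DATA>
</TABLE>"""

# Assumption: a hypothetical mapping from VOTable datatypes to SQLite types.
SQL_TYPES = {"char": "TEXT", "double": "REAL"}

root = ET.fromstring(VOTABLE)
fields = root.findall("FIELD")

# The type of column 2 is the datatype attribute of the second FIELD.
column2_type = root.find("FIELD[2]").get("datatype")

# Build the column definitions from the FIELD elements.
# (IDs are interpolated directly; fine for a sketch, not for untrusted input.)
column_defs = []
for f in fields:
    column_defs.append('"%s" %s' % (f.get("ID"), SQL_TYPES[f.get("datatype")]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votable (%s)" % ", ".join(column_defs))

# One parameter placeholder per column; the cell text is passed as-is and
# SQLite's column affinity converts numeric-looking text in REAL columns.
placeholders = ", ".join("?" for _ in fields)
data = [[td.text for td in tr.findall("TD")]
        for tr in root.findall("DATA/TABLEDATA/TR")]
conn.executemany("INSERT INTO votable VALUES (%s)" % placeholders, data)

bright = conn.execute('SELECT "NAME" FROM votable WHERE "VMAG" < 2.0').fetchall()
print(column2_type, bright)
```

Once the rows are in, the database knows the column types and ordinary numeric queries work.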
I'm coming round to the position that VOTable is well-defined,
less-is-more XML.
How's that!
Norman
--
---------------------------------------------------------------------------
Norman Gray http://www.astro.gla.ac.uk/users/norman/
Physics and Astronomy, University of Glasgow, UK