JISCMail - STARDEV Archives

David and others, hello.

[this is a long un', but it's worth it -- promise]

On 2006 Aug 7, at 10.29, David Berry wrote:

> On Wed, 2 Aug 2006, Norman Gray wrote:
>
>> Mark,
>>
>> On 2006 Aug 2, at 12.32, Norman Gray wrote:
>>
>>>>>> think about whether you
>>>>>> need to perform unit conversions for the quantity that you've
>>>>>> identified to mean what you think it means...
>
> Sounds to me like some standard library for handling all this system
> conversion, units conversion, searching, etc., stuff is needed :-)

Ah, but _that_ we've already got.  What we don't have is something  
_generic_ for what I shall suddenly decide to call semantic conversion.

Until now!  Herewith the demo premiere (I plan to talk about this at  
the Strasbourg VOTech meeting, and I hope at the IVOA, but I'll run  
it past youse first).

>> I meant to add that unit conversions wouldn't be addressed by any
>> sort of solution I'm talking about, but they're rather separate
>> anyway, since unit specifications address how the value is
>> represented -- and thus are to some extent syntactic -- rather than
>> what it is.  No?
>
> In that sense a velocity (say) is a velocity is a velocity, and *all*
> metadata describing it is syntactic, not just the units.
>
> To say "velocity A and B are the same, but just measured in different
> units" seems to me to be no different to saying "velocity A and B are
> the same but just measured in different rest frames". In both cases, A
> and B are representations of the same physical phenomenon. So I can't
> immediately see any reason for treating units differently to any other
> item of metadata. They are all needed if you want to be able to compare
> two values.

You're really pining for the good old Quantity discussion, aren't you?

I think that fundamentally, in the abstract, you're right, and that  
units are as much a part of the meaning of a velocity (say) as  
anything else.  However I think they are practically distinct, and I  
have just now come across what I believe to be a good illustration of  
why.

But first the demo (I'm on the edge of my seat -- I don't know about  
you).



I'm working on the utype-to-utype-to-ucd mappings I was talking about  
a week or so ago, and I'm using the USNO-B catalogue at ROE as a test  
case, simply because it was handy.  That resource has an IVO-ID of  
<ivo://roe.ac.uk/DSA_USNOB/TDB>, and has a set of column descriptions  
which includes

<column>
<name>ra</name>
<description>J2000 Celestial Right Ascension</description>
<datatype>datatype='float'</datatype>
<ucd>POS_EQ_RA_MAIN</ucd>
<unit>deg</unit>
</column>

(this is a type defined by <http://www.ivoa.net/xml/VODataService/v0.5>,
and yes, that <datatype> does look a bit odd...).

So, there's implicitly a UTYPE <ivo://roe.ac.uk/DSA_USNOB/TDB#ra>,
which is a subclass of <http://cdsweb.u-strasbg.fr/UCD/old#POS_EQ_RA_MAIN>.
That is, I can convert the VODataService information to RDF.
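A minimal sketch of that conversion in Python, assuming the <column> element exactly as quoted above (a real VODataService document would put it in an XML namespace, which the findtext calls would then have to include). The function name column_to_n3 is hypothetical, and the UCD1 namespace is the one that appears in the query output further down.

```python
import xml.etree.ElementTree as ET

# The <column> element quoted above, verbatim (no XML namespace assumed)
COLUMN_XML = """<column>
<name>ra</name>
<description>J2000 Celestial Right Ascension</description>
<datatype>datatype='float'</datatype>
<ucd>POS_EQ_RA_MAIN</ucd>
<unit>deg</unit>
</column>"""

RESOURCE_ID = "ivo://roe.ac.uk/DSA_USNOB/TDB"
# Namespace for old-style (UCD1) words, as seen in the reasoner's output
UCD1_NS = "http://cdsweb.u-strasbg.fr/UCD/old#"


def column_to_n3(resource_id, column_xml):
    """Return N3 declaring the column's implicit UTYPE to be a
    subclass of its old-style UCD. The <unit> element is not used."""
    col = ET.fromstring(column_xml)
    name = col.findtext("name").strip()
    ucd = col.findtext("ucd").strip()
    utype = f"{resource_id}#{name}"  # e.g. ivo://...TDB#ra
    return (
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n"
        f"<{utype}> rdfs:subClassOf <{UCD1_NS}{ucd}>."
    )


print(column_to_n3(RESOURCE_ID, COLUMN_XML))
```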

From the UCD1 to UCD1+ mappings, I can get that POS_EQ_RA_MAIN is a
subclass of (well, was mapped to) pos.eq.ra;meta.main.  I can
generate RDF from that, too.
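The mapping step can be sketched the same way. The one-entry table below is illustrative only, not the real UCD1 to UCD1+ mapping list, and the two namespace URIs are simply the ones that appear in the query output further down.

```python
# Illustrative excerpt of a UCD1 -> UCD1+ mapping table (one entry only)
UCD1_TO_UCD1PLUS = {
    "POS_EQ_RA_MAIN": "pos.eq.ra;meta.main",
}

# Namespaces as they appear in the reasoner's output
UCD1_NS = "http://cdsweb.u-strasbg.fr/UCD/old#"
UCD1PLUS_NS = "http://cdsweb.u-strasbg.fr/UCD/words#"


def mapping_to_n3(mapping):
    """Emit one rdfs:subClassOf statement per UCD1 -> UCD1+ mapping.
    (';' is legal inside an <...> IRI, so no escaping is needed.)"""
    lines = ["@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>."]
    for old, new in sorted(mapping.items()):
        lines.append(f"<{UCD1_NS}{old}> rdfs:subClassOf <{UCD1PLUS_NS}{new}>.")
    return "\n".join(lines)


print(mapping_to_n3(UCD1_TO_UCD1PLUS))
```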

We might also decide that there is a set of types which is of  
interest to us, or a community we're part of, and that:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix x: <http://example.edu/utypes#>.
x:ra a rdfs:Class.
<ivo://roe.ac.uk/DSA_USNOB/TDB#ra> rdfs:subClassOf x:ra.

(that's RDF, in the form of `Notation3', and says that <...#ra> is a  
subclass of the concept <http://example.edu/utypes#ra>, so that the  
USNO-B RA is a more specific type of RA than the one we've defined  
and documented at that URL).

So we load those different bits of information into the reasoner, and  
then query it:

% cat query.rq
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?t
where {
     <ivo://roe.ac.uk/DSA_USNOB/TDB#ra> rdfs:subClassOf ?t
}
%

(that's SPARQL, a broadly SQL-like query language for RDF).
So we POST the query to the reasoning service:

% curl --data-binary @query.rq \
     --header 'Accept: text/csv' \
     --header 'Content-Type: application/sparql-query' \
     http://localhost:8080/quaestor/kb/ucd
t
http://example.edu/utypes#ra
http://cdsweb.u-strasbg.fr/UCD/old#POS_EQ_RA_MAIN
ivo://roe.ac.uk/DSA_USNOB/TDB#ra
http://www.w3.org/2000/01/rdf-schema#Resource
http://cdsweb.u-strasbg.fr/UCD/words#pos.eq.ra;meta.main
%

(obviously, you could dereference that URL from any code, and if  
you're prepared to URL-encode the query, you can GET it as well).   
So, that gives you a list of all the things that the USNO-B 'ra'  
column is a subclass of.  Our software has presumably been written so  
that it already knows what a <http://example.edu/utypes#ra> is  
(that's why we added the extra mapping information); but if not,  
it'll know what the pos.eq.ra;meta.main UCD is.
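On the client side, the GET variant and the "use whatever I understand" logic might look like this minimal Python sketch. The `query` parameter name follows the SPARQL protocol and is an assumption about this particular service; KNOWN_TYPES, get_url and choose_type are hypothetical names.

```python
import csv
import io
from urllib.parse import urlencode

# Endpoint and query as in the curl example above
ENDPOINT = "http://localhost:8080/quaestor/kb/ucd"
QUERY = """prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?t
where {
     <ivo://roe.ac.uk/DSA_USNOB/TDB#ra> rdfs:subClassOf ?t
}
"""

# Hypothetical client preferences: types this software understands,
# most preferred first.
KNOWN_TYPES = (
    "http://example.edu/utypes#ra",
    "http://cdsweb.u-strasbg.fr/UCD/words#pos.eq.ra;meta.main",
)


def get_url(endpoint, query):
    """URL-encode the query for a GET request.
    Assumption: the service reads it from a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def choose_type(csv_text, known=KNOWN_TYPES):
    """Return the first entry of `known` present in a text/csv result
    whose single column is 't', or None if none of them is present."""
    found = {row["t"] for row in csv.DictReader(io.StringIO(csv_text))}
    for t in known:
        if t in found:
            return t
    return None


# Fetching over HTTP (not run here):
#   from urllib.request import Request, urlopen
#   req = Request(get_url(ENDPOINT, QUERY), headers={"Accept": "text/csv"})
#   csv_text = urlopen(req).read().decode("utf-8")
```

Given the response shown above, choose_type returns the example.edu type; if that line were absent, it would fall back to the pos.eq.ra;meta.main UCD.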

Thus, we've gathered together information from a variety of loosely  
cooperating sources:

* the ROE folk declared that the USNO-B 'ra' column
   was a particular old-style UCD, but they haven't updated it to UCD1+;
* there's a fixed mapping of old-style to new-style UCDs;
* you added the mapping to <http://example.edu/utypes#ra> yourself,
   for your own purposes.  Perhaps you had to work it
   out from hard-to-find documentation, or perhaps
   the example.edu namespace is a discipline-specific
   standard, or an IVOA one.
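Combining those sources amounts to pooling their statements, after which the subclass query is answered by following rdfs:subClassOf links. The sketch below is a deliberately tiny stand-in for the reasoner, assuming only three RDFS rules (subClassOf is transitive and reflexive, and every class is a subclass of rdfs:Resource: rules rdfs11, rdfs10 and rdfs8 in the RDF Semantics specification); a real reasoner does a great deal more. The names STATEMENTS and superclasses are hypothetical.

```python
from collections import defaultdict

RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"

# One rdfs:subClassOf statement from each source listed above,
# as (subclass, superclass) pairs.
STATEMENTS = [
    # ROE's column description
    ("ivo://roe.ac.uk/DSA_USNOB/TDB#ra",
     "http://cdsweb.u-strasbg.fr/UCD/old#POS_EQ_RA_MAIN"),
    # the fixed UCD1 -> UCD1+ mapping
    ("http://cdsweb.u-strasbg.fr/UCD/old#POS_EQ_RA_MAIN",
     "http://cdsweb.u-strasbg.fr/UCD/words#pos.eq.ra;meta.main"),
    # your own mapping
    ("ivo://roe.ac.uk/DSA_USNOB/TDB#ra",
     "http://example.edu/utypes#ra"),
]


def superclasses(cls, statements):
    """All classes that cls is a subclass of: the reflexive, transitive
    closure of subClassOf, plus rdfs:Resource."""
    parents = defaultdict(set)
    for sub, sup in statements:
        parents[sub].add(sup)
    seen = {cls}          # reflexive: cls is a subclass of itself
    stack = [cls]
    while stack:          # depth-first walk up the subClassOf links
        for sup in parents[stack.pop()]:
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    seen.add(RDFS_RESOURCE)
    return seen
```

Applied to the USNO-B column, superclasses returns the same five URIs as the service's response above, and nothing in it depends on which source a statement came from.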

Then we queried it with a very simple expression, getting output from  
which it's easy to extract the information we want.  It means all the  
various actors here can remain fairly loosely coupled, and the  
software reading this can operate at whatever level of generality it  
needs to.



The link to units (getting back to that, David) is that when  
assembling and using this information, I really couldn't see a place  
for the units information which is in the VODataService element  
above.  The statement "USNO-B's 'ra' column is a type of pos.eq.ra"  
is true independently of units.  Once I've established just what this  
USNO-B column is supposed to be (aha, an RA!), then I'm going to have  
to discover what units the data there has, in order to actually read  
it.  So yes, a complete description of the numbers in that column  
requires unit information, but that description can be usefully  
decomposed/factored into orthogonal components, namely the semantic  
information (which I'm taking to mean "column 'ra' is a pos.eq.ra")  
and the unit information.  That's not a principled factorisation, but  
a practical one.
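That practical factorisation can be pictured as two independent fields on a column description. In this sketch (ColumnInfo, TO_RADIANS and ra_in_radians are all hypothetical names, and the unit table is illustrative), the semantic check never looks at the unit, and the unit is consulted only afterwards, to read the number.

```python
import math
from dataclasses import dataclass

RA_TYPE = "http://cdsweb.u-strasbg.fr/UCD/words#pos.eq.ra;meta.main"

# Illustrative conversion factors from a few angle units to radians
TO_RADIANS = {"deg": math.pi / 180.0, "rad": 1.0}


@dataclass(frozen=True)
class ColumnInfo:
    semantic_types: frozenset  # what the column *is* (e.g. an RA)
    unit: str                  # how its values are represented


def ra_in_radians(info, value):
    """Interpret value as an RA in radians; raise if it isn't an RA."""
    # Step 1: semantic check, needing no unit information at all
    if RA_TYPE not in info.semantic_types:
        raise ValueError("column is not an RA")
    # Step 2: only now consult the unit, in order to read the number
    return value * TO_RADIANS[info.unit]
```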

So, what does all that sound like?

See you,

Norman


-- 
----------------------------------------------------------------------------
Norman Gray  /  http://nxg.me.uk
eurovotech.org  /  University of Leicester, UK