STARDEV Archives



STARDEV@JISCMAIL.AC.UK


Subject: UTYPEs (was: TOPCAT .dmg file)
From: Norman Gray
Reply-To: Starlink development
Date: Wed, 2 Aug 2006 12:32:17 +0100


Mark,

On 2006 Aug 1, at 16.50, Mark Taylor wrote:

>> If you add UTYPEs to your published data, then you have at least
>> documented what you intend that data to mean, via a dereferenceable
>> URL, whether or not any of the stuff I talk about below ever actually
>> happens.
>
> I think that's true, and makes utypes worth using, but we're still
> talking about semantic value which is only accessible to humans.

At this stage, yes.  Making that semantic information practically  
useful to machines is of course the trick.

>>>   You've got to decide whether you're looking
>>> at UCD1 or UCD1+, attempt to make sense of what a load of words
>>> separated by semicolons mean, decide whether, say, phot.mag.reddFree
>>> is an acceptable stand-in for phot.mag, think about whether you
>>> need to perform unit conversions for the quantity that you've
>>> identified to mean what you think it means...
>
> and I forgot to add: what do you do if there are multiple columns which
> have the UCD you're looking for?

If UCDs are all you have, then you will be stuck a lot of the time,
because UCDs are, by design, not necessarily specific enough to drive
processing; from this point of view they can only be a fall-back.
What should be possible with the combination of UCDs and UTYPEs is to
start with the possibly multiple UTYPEs annotating a column, and:

1. Do you recognise any of these, as strings?  If so, you're done.

2. Get a list of things (UTYPEs and UCDs) that are equivalent to, or  
more general than, that list of UTYPEs, in increasing order of  
generality: do you recognise any of these, as strings (I think  
parsing UCDs is probably unlikely to help much)?  If so, you're done.

3. Oh well.  Start grubbing around in the column names, and applying  
all the heuristics you currently apply.
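
The three steps above can be sketched roughly as follows.  This is only
an illustration of the lookup order, not a worked-out design: the
BROADER table, the `ex:` identifier and the function names are all
invented for the example, and only phot.mag and phot.mag.reddFree come
from the discussion itself.

```python
from collections import deque

# Hypothetical "is equivalent to, or narrower than" relations.  The edges
# and the "ex:" UTYPE are made up; a real network would have to be
# declared by whoever publishes the UTYPEs.
BROADER = {
    "ex:dereddenedVMag": ["phot.mag.reddFree"],
    "phot.mag.reddFree": ["phot.mag"],
}


def generalisations(labels, broader=BROADER):
    """Yield the labels, then everything reachable through `broader`,
    breadth-first, so that less general labels come out first."""
    queue = deque(dict.fromkeys(labels))  # drop duplicates, keep order
    seen = set(queue)
    while queue:
        label = queue.popleft()
        yield label
        for parent in broader.get(label, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)


def identify(utypes, known, broader=BROADER):
    """Return (label, step) for the first recognised label, or None."""
    # Step 1: do we recognise any of the UTYPEs, as plain strings?
    for utype in utypes:
        if utype in known:
            return utype, 1
    # Step 2: walk outwards through the relations, nearest first.
    for label in generalisations(utypes, broader):
        if label in known:
            return label, 2
    # Step 3: nothing matched; the caller falls back on its heuristics.
    return None
```

An application that only knows about phot.mag would get
("phot.mag", 2) for a column annotated with ex:dereddenedVMag, without
parsing anything; one that knows nothing relevant gets None and is no
worse off than before.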

Step 2 is supposed to make things easier.  If folk do start  
annotating with UTYPEs, and an adequate network of relations can be  
built up, then that step will disambiguate columns with identical  
UCDs; it'll tell you, without you having to parse anything, that  
phot.mag.reddFree has a more specific meaning than phot.mag; and  
given that there are relations between UCD1 and the mutating  
vocabulary list of UCD1+, you don't have to worry about the  
difference there either.  So it does at least address the UCD  
problems you noted. [`adequate', here, means `enough to make this  
work', and I don't have reliable intuition about how much that  
actually is.]

Now, that isn't supposed to be magic, and there's a fair amount of
labour involved there in declaring the relations, but a scenario like
that is, I think, realistic.

And if it doesn't work, you're not any worse off than you were before.

> My feeling is that most of
> the questions to which UCDs/utypes appear to provide an answer
> are ones which actually require a human in the loop.  For example,
> there may well be no correct answer to "are any of these utypes
> like phot.mag?", even given a well-defined state of a particular
> data processing system, because it depends on the kind of analysis
> that the scientist using the software has got in mind at the time.

Your second remark is very true.  Context sometimes matters, and
while it should be possible to work that into the reasoning I'm
talking about, it'll be harder at the least.  Your first remark can
only really be answered by trying it, though I think I'd be more
optimistic about it than you would be.

Also, following Malcolm's remark:

> Is full automation critical?  I can envisage where you want to know
> whether a registered data source has relevant fields, but at some point
> you will need to consult detailed information on the semantics of
> columns to discover if a particular catalogue is pertinent to the
> specific research project.  Is it a big deal that some natural
> intelligence is brought to bear?  The astronomer can judge whether the
> match is adequate.  The UCDs can help identify potential matches. [It
> would be handy to be able to see the matches in order of likelihood
> (expert system even) and then to be able to click on each to read
> descriptions of the column, and then choose the appropriate matches.]

Yes: there will probably be plenty of cases where you only need to
know with some degree of approximation what a column is.  There will
be other cases where you need to know exactly, and there the
application will be written or configured so that, if it doesn't
recognise the UTYPE, it doesn't try reasoning about it; or it might
let the user veto the deduced match.
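
As a rough sketch of that sort of configuration (the function, the
`strict` flag and the `confirm` callback are all hypothetical names,
not anything an existing application provides):

```python
def accept_match(column_utype, deduced, known, strict=False, confirm=None):
    """Decide which label, if any, to use for a column.

    column_utype: the UTYPE annotating the column
    deduced:      a label found by reasoning over relations, or None
    known:        the set of labels the application recognises
    strict:       if True, only an exact string match is acceptable
    confirm:      optional callable (utype, deduced) -> bool, giving
                  the user a chance to veto a deduced match
    """
    if column_utype in known:
        return column_utype   # exact match: nothing to deduce
    if strict or deduced is None:
        return None           # exact-only mode, or nothing was deduced
    if confirm is not None and not confirm(column_utype, deduced):
        return None           # the user vetoed the deduced match
    return deduced
```

With strict=True the application never acts on a deduced match; with a
confirm callback it proposes the match and leaves the decision to the
astronomer.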

But as I say, I should go ahead and try it.  Can anyone point me  
towards some list of documented column names?

And I should write shorter messages.  I'm only thinking aloud, I  
suppose.

See you,

Norman


-- 
----------------------------------------------------------------------------
Norman Gray  /  http://nxg.me.uk
eurovotech.org  /  University of Leicester, UK
