DC-SCIENCE Archives

DC-SCIENCE@JISCMAIL.AC.UK


DC-SCIENCE  February 2009

Subject: Re: [Fwd: Metadata for datasets]
From: Joe Hourcle
Reply-To: DCMI Science and Metadata Community
Date: Tue, 24 Feb 2009 17:24:49 -0500
Content-Type: TEXT/PLAIN

On Tue, 24 Feb 2009, Stu Weibel wrote:

> Hi, Joe,
>
>> I'm probably too close to this subject, as I've been working with the
>> group developing SPASE, a metadata effort for describing the holdings of
>> space physics archives, and there are some issues with what metadata is
>> useful in different sub-communities, as well as some issues with
>> terminology and how the different communities group their data into
>> collections.


> Are your metadata standards available to the public?

Yes.  The current SPASE draft is at:

 	http://www.spase-group.org/data/doc/spase-1_3_4-draft.pdf

We hope to have it stable for version 1.4 in the next couple of months.



For solar physics, we use a much simpler model to describe the data:

 	http://vso1.nascom.nasa.gov/docs/wiki/DataModel18

(The thing is -- it's what the scientists wrote, and there have been a few 
proposed version 2.0s, but none of those stuck, and we've gone and 
implemented a few extra fields in our search API, and a few more that are 
returned ... see http://sdac.virtualsolar.org/API/VSO_API.html ... and 
there are more that we've been asked to add)


Most of the data is in "self-documenting" formats, such as FITS, HDF, CDF 
or NetCDF, but you often have to check the archive's documentation for use 
caveats to know what assumptions were made in processing the data.




> One of the things I'd like to see emerge from these discussions would be a
> lowest common denominator for data set metadata intended for discovery, and
> then specializations for particular domains.

Unfortunately, you use the term "data set".

The problem that we run into with Solar Physics is that you have to define 
a "data set" to then be able to describe it.  Most other space physics 
disciplines organize their data catalogs around the concept of "data 
series" (sometimes "data products") which have been processed in the same 
way, and then individual "data granules" that are individually resolvable.

We instead have catalogs of each of the individual images taken, and for 
each record we track what its operating mode and processing are. 
Sometimes, there might be two similar modes that are cross-calibrated and 
merged to get higher cadence.  If you look closely at:

 	http://stereo-ssc.nascom.nasa.gov/cgi-bin/images/?Display=Slideshow;Resolution=512;Start=20070101;Finish=20070102;Detectors=ahead_cor2;Session=1;

you'll see a slight flicker to the top right of the occulting disk.  It's 
actually a processing artifact in every other image, because the 
instrument is flipping modes every other image.  The problem is much more 
obvious when there isn't a feature obscuring it:

 	http://stereo-ssc.nascom.nasa.gov/cgi-bin/images/?Display=Slideshow;Resolution=512;Start=20090101;Finish=20090102;Detectors=ahead_cor2;Session=1;

Sorry, again, I'm too close to the problem ... I've been venting for a 
while because many of the standards assume that there's a one-to-many 
relationship between "data sets" and "data granules", and for our images, 
it's a many-to-many relationship, as the "data set" is just how you've 
chosen to sample the larger collection.
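
To make the many-to-many point concrete, here's a minimal sketch in Python. Everything in it is made up for illustration (the `Granule` and `DataSet` classes, the field names, the mode names, the IDs); it is not how the VSO or any archive actually stores its catalog. The idea is only that a "data set" is a selection rule over the one big catalog, so the same image can turn up in several of them:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Granule:
    """One individually resolvable image record (hypothetical fields)."""
    granule_id: str
    observed_at: datetime
    mode: str  # operating mode the instrument was in for this image


@dataclass
class DataSet:
    """A 'data set' expressed as a selection rule over the whole catalog."""
    name: str
    selects: Callable[[Granule], bool]

    def members(self, catalog):
        return [g for g in catalog if self.selects(g)]


# Hypothetical catalog: the instrument flips modes every other image.
catalog = [
    Granule("img-001", datetime(2009, 1, 1, 0, 0), "mode_a"),
    Granule("img-002", datetime(2009, 1, 1, 0, 15), "mode_b"),
    Granule("img-003", datetime(2009, 1, 1, 0, 30), "mode_a"),
    Granule("img-004", datetime(2009, 1, 1, 0, 45), "mode_b"),
]

# Three different ways of sampling the same collection.
datasets = [
    DataSet("mode_a only", lambda g: g.mode == "mode_a"),
    DataSet("merged, full cadence", lambda g: True),
    DataSet("first half hour",
            lambda g: g.observed_at < datetime(2009, 1, 1, 0, 30)),
]


def datasets_containing(granule, datasets, catalog):
    """Names of every data set whose selection includes this granule."""
    return [d.name for d in datasets if granule in d.members(catalog)]
```

Asking `datasets_containing(catalog[0], datasets, catalog)` returns all three names for the one image, which is exactly the case a one-to-many "data set has granules" schema can't represent.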



> The structural metadata is a separate issue.  I wonder if there is likely to
> be a coherent TYPE vocabulary for such data, and whether it can be
> enumerated.  Are there tens/hundreds/thousands of different structure types
> for such data?  If there are tens or hundreds we have a chance of TYPEing
> them and providing managed schemas for them.


You might need to qualify 'structural'... I mean, there's the 'how it's 
written to the file' type structure, there's the 'what each field means' 
(more semantic) structure, and then we get into the issue of relationships 
between objects.  The first one we can get around by using one of the 
"self-documenting" formats.  The last one, well, that's a big can of worms 
... see some of my past presentations, where I discuss the problem, and 
propose a reference model for scientific catalogs based on FRBR:

 	http://vso1.nascom.nasa.gov/vso/misc/_README.txt
 	http://vso1.nascom.nasa.gov/vso/misc/AGU2008_DataRelationships.ppt


... and then we get to the 'semantic' issue.  For the data that I manage, 
you can typically describe the dimensions of the data.  For the Virtual 
Solar Observatory, the scientists defined eight layouts, but we currently 
only have data representing five of them:

 	http://sdac.virtualsolar.org/cgi/show_details.pl?keyword=DATA_LAYOUT

We can then combine that with the 'physical observable', and can come up 
with a good idea about what the data is:

 	http://sdac.virtualsolar.org/cgi/show_details.pl?keyword=PHYSOBS


The problem really comes with time series data, where it's just time 
plotted against _anything_.  There are a few controlled vocabularies, such 
as IVOA's UCD (Unified Content Descriptors):

 	http://cdsweb.u-strasbg.fr/UCD/ucd1p-words.txt  (the list)
 	http://www.ivoa.net/Documents/latest/UCD.html   (documentation)


In SPASE, the list (in my opinion) is much more complicated, as it's 
broken down into categories, and you have to know where to look all over 
the document:

 	Measured Parameters
 		Photons
 		Fields
 		Particles
 		Mixed
 	Support Parameters
 		Positional
 		Temporal
 		Other

There are other efforts, such as SESDI, that are trying to model the 
different parameters using ontologies so they can do reasoning -- for 
example, you're looking for (A), which we don't have, but it can be 
computed from (B) and (C).

 	http://sesdi.hao.ucar.edu/intro.php
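
As a rough illustration of the "(A) from (B) and (C)" idea, here's a sketch in Python. It is a plain lookup table with a recursive search, standing in for real ontology reasoning; it is not how SESDI does it, and the `rules` table, the `resolve` function and the parameter names are all hypothetical:

```python
def resolve(target, held, rules, _pending=frozenset()):
    """Return an ordered list of steps that yields `target`, or None.

    held  -- set of parameter names the archive actually stores
    rules -- dict mapping a derived parameter to the inputs it needs
    """
    if target in held:
        return [f"fetch {target}"]
    # No rule for it, or we're already trying to derive it (a cycle).
    if target not in rules or target in _pending:
        return None
    steps = []
    for needed in rules[target]:
        sub = resolve(needed, held, rules, _pending | {target})
        if sub is None:
            return None
        steps.extend(sub)
    steps.append(f"compute {target} from {', '.join(rules[target])}")
    return steps


# Hypothetical rule: A can be computed from B and C.
rules = {"A": ("B", "C")}
plan = resolve("A", {"B", "C"}, rules)
```

With both inputs held, `plan` lists the two fetches followed by the compute step; if the archive only holds (B), `resolve` returns `None` and the search comes up empty, same as today.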


... anyway, just looking at UCD+, they have almost 500 descriptors, but 
they're built up from a much smaller list.  SPASE uses a similar concept 
with 'Qualifiers' to try to keep the lists to a more manageable size.  I 
haven't looked at what they're doing with geography and oceanography, but 
I know they've got a few efforts.  (MMI for marine data, but I'm drawing a 
blank on what the geo group is called):

 	http://marinemetadata.org/
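
Going back to the "built up from a much smaller list" point about UCD: as I understand the UCD1+ syntax, a descriptor is one or more words joined with ';', and each word is a chain of atoms joined with '.'. A small Python sketch of pulling one apart (the function names are mine, the sample strings are only meant as examples, and this does no validation against the official word list):

```python
def split_ucd(ucd):
    """Split a UCD string into words, and each word into its atoms."""
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    return [tuple(w.split(".")) for w in words]


def vocabulary(ucds):
    """Collect the distinct atoms used across a set of descriptors."""
    return {atom for u in ucds for word in split_ucd(u) for atom in word}


# Two example descriptors that share most of their building blocks.
samples = ["phot.mag;em.opt.V", "phot.flux;em.opt.V"]
parts = split_ucd(samples[0])
atoms = vocabulary(samples)
```

Here `parts` is `[("phot", "mag"), ("em", "opt", "V")]`, and the two descriptors between them only need six distinct atoms, which is the same trick SPASE is playing with its qualifiers.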


(okay, and now to go and read up on the standards that other people have 
mentioned since I started writing this)

-Joe
