I agree, and the metadata provided can be incredibly minimal. Some repositories require little more than a title and MIME type for a file. Others collect a few more fields, such as an abstract and perhaps some contact information (e.g., FigShare). This approach lets repositories build a broad collection, but at the expense of being very shallow in terms of standardized use and understanding of the metadata. They are popular with researchers because they are quick and easy to use, and it's fast to contribute 'open data'.

It takes at least an hour or two, and likely the better part of a day or more, to develop a good metadata record for a data set that follows community standards (e.g., EML, FGDC, ISO 19115). Dublin Core and DataCite Kernel metadata are light and fast to produce, but provide little ability to understand the data. It's a tradeoff all repositories have to make: if the process is too burdensome, it's harder to convince researchers to engage at all. But at some point there is a limit to what one can do without detailed structural and methodological metadata about the data contents. I think the approach of better linking data, analytical code, and publications in a fine-grained way will enrich our ability to understand and interpret the science that people are archiving. So kudos to the team on their new award.

Matt



On Fri, May 2, 2014 at 11:44 AM, Daniel C. Tsang <[log in to unmask]> wrote:
Good questions, Libbie. It seems that with Open Access for research data, managing data is becoming rather minimalist: basically nothing beyond bit-level processes (as you write) and adding some metadata.

dan

On 5/2/2014 12:11 PM, Libbie Stephenson wrote:
This sounds very interesting. Could anyone define what is meant by "preservation"? I ask because the term can be intended in so many different ways, and I see it used in many different contexts. The OAIS definition discusses preservation in terms of the data remaining independently understandable over time for informed reuse. If this is the meaning of preservation being used for this project, it will be interesting to see how the project addresses the kinds of activities, the expertise, and the resources required to achieve it. That kind of effort goes well beyond what many repository systems can handle, and it involves far more than bit-level processes such as ensuring fixity, authentication, and running checksums.
In any case, it's good to see another project on linking data and publications, to compare with similar efforts such as the Thomson Reuters Data Citation Index.



On Fri, May 2, 2014 at 10:12 AM, Katherine McNeill <[log in to unmask]> wrote:
This news should be of interest to colleagues.

-----Original Message-----
From: LibLicense-L Discussion Forum [mailto:[log in to unmask]] On Behalf Of LIBLICENSE
Sent: Thursday, May 01, 2014 7:33 PM
To: [log in to unmask]
Subject: Data Conservancy, IEEE, and Portico news

From: Marita LaMonica <[log in to unmask]>
Date: Thu, 1 May 2014 10:33:53 -0400

The Data Conservancy, IEEE, and Portico receive Alfred P. Sloan Foundation grant to connect publications and their linked data

New York, NY, April 30, 2014: The Data Conservancy, IEEE, and Portico announced today their partnership to design and prototype a data curation infrastructure that connects published research and associated data sets for the long-term benefit of researchers worldwide. This two-year project, which is supported by a $602,000 grant from the Alfred P. Sloan Foundation, will result in the development of a service that will build, store, update, and retrieve the connections among publications and data, and preserve those connections over the long term.

Scholarly digital publications increasingly consist of distinct building blocks, including text, graphics, and data, which often reside in different repositories, are maintained by different institutions, and employ different technologies. These components have many relationships, which continue to evolve and must be preserved over time.

This project will make it possible to preserve not just these publications and their underlying data, but the complex relationships among them, thereby supporting the continual development of scholarly communication and digital publishing. For example, a publisher who wants to know whether a publication has reference links to data submits the article metadata and identifiers to the service, which returns any relationships it finds. This makes it possible to track and preserve these connections throughout the scholarly communications cycle.

Sayeed Choudhury, associate dean for research data management and Hodson director of the Digital Research and Curation Center at the Sheridan Libraries of Johns Hopkins University, noted, “We believe that the models developed as a result of this project will enable new forms of scholarly communication, and thus help to set the stage for the future of research and digital publishing. Our partnership represents broad perspectives and multifaceted experience, which we believe will result in more meaningful solutions that can be generalized for the entire community.”

“The research community has an immediate and pressing need to make the most effective use of the relationship between publications and their corresponding data,” commented Kate Wittenberg, managing director of Portico. “As scholars continue to explore the possibilities presented by these relationships, it is incumbent on us, their colleagues, to develop a creative vision and infrastructure to support their work.”

The Data Conservancy, a data curation organization; IEEE, the world’s largest technical professional organization and publisher of nearly a third of the world’s technical literature in electrical engineering, computer science, and electronics; and Portico, a digital preservation service, bring together years of experience in digital scholarship, publishing, and preservation. The research work will build on the existing infrastructure, expertise, and relationships they have developed over time. In the initial phase, project leaders will gather requirements from members of the publishing and scholarly communities engaged in research across the physical sciences, social sciences, and humanities.

“Our aim is not only to preserve publications and data, either separately or together, but to preserve the relationships among them,” commented Gerry Grenier, senior director of publishing technologies at IEEE. “This project represents a big step forward in greater discovery, access, and preservation.”

*******



--
Libbie Stephenson
Distinguished Librarian
Director, UCLA Social Science Data Archive
University of California, Los Angeles
Box 951484
Los Angeles, CA. 90095-1484
310-825-0716
Skype:  libbie.stephenson
[log in to unmask]
Web: http://www.sscnet.ucla.edu/issr/da
Mobile: http://dataarchives.ss.ucla.edu/mobile/

"Be single-minded. Love what you are doing and make it the centerpiece of your life." Donald Cram

-- 
Daniel C. Tsang, Distinguished Librarian
Data Librarian and Bibliographer for Asian American Studies, 
 Economics, Political Science, Orange County documents (interim), 
 & French & Italian (interim)
468 Langson Library, University of California, Irvine
PO Box 19557, Irvine CA 92623-9557, USA
1 949 824 4978 (Tel); 1 949 824 0605 (Fax), [log in to unmask] (E-mail)
Office hours: 4-4:30 p.m. Fridays when on campus, or by appointment
My Subject Guides: http://libguides.lib.uci.edu/profile.php?uid=2616