Thanks Rutger!

To expand a little on the problem we’re facing regarding naming conventions - we have some very prolific pieces of equipment in the Institute that can generate tons of data very rapidly, and our researchers are finding themselves dealing with a huge number of experimental datasets that are almost identical. In order to differentiate each dataset, the ‘descriptive’ component of the naming convention might have to be 30/40/50+ characters long, and that’s proving to be unwieldy and impractical. 

I didn’t specifically talk to the PI in question about README files (they’re described in our ‘Bite-Sized Data Management’ resource, distributed to all Gurdon Researchers), but, in a nutshell, the basis of this idea is to try to use the ELN as a ‘complete’ README file...

Al

On 13 Aug 2018, at 11:31, Jong, R.M. de <[log in to unmask]> wrote:

Hi Alastair,
 
I think an ELN would be a possibility. But a readme with additional details or more metadata in the storage system might also help. If you use a consistent naming structure that is described well, ‘cryptic’ naming might also be less of a problem. See for example http://digitalscholarshipleiden.nl/articles/best-practices-file-names-and-folder-structures
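
To illustrate how a documented naming structure makes terse names machine-readable, here is a minimal sketch. The convention itself (date, instrument, sample, run number) and the names FILENAME_PATTERN and parse_name are hypothetical, invented for this example; they are not taken from the linked article or from any lab's actual scheme.

```python
import re
from typing import Optional

# Hypothetical convention (an assumption for illustration only):
#   <YYYYMMDD>_<instrument>_<sample>_r<3-digit run>.<extension>
FILENAME_PATTERN = re.compile(
    r"(?P<date>\d{8})_"
    r"(?P<instrument>[A-Za-z0-9]+)_"
    r"(?P<sample>[A-Za-z0-9]+)_"
    r"r(?P<run>\d{3})\."
    r"(?P<ext>\w+)"
)


def parse_name(filename: str) -> Optional[dict]:
    """Split a file name into its documented fields, or return None
    if the name does not follow the convention."""
    match = FILENAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None
```

With a pattern like this written down in a readme, a short name such as 20180813_lsm880_s042_r003.tif can be expanded back into its fields by anyone (or any script) that has the description.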
 
Btw, if you want a searchable system, you can’t do without something similar to the cryptic naming: an ontology to ensure the data is described in a standard way.
 
Another thing I thought of (and made a small proof of principle for) was asking for additional metadata when a new file is generated in an observed directory. This metadata can then be stored next to the file and is easily indexed by the program that lets you add the metadata. This system can be quite sophisticated if you want it to be, and can help prepare data for the next step: a repository.
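
A rough sketch of how such a scheme could look is given here. It is not the proof of principle mentioned above: the sidecar suffix, the function names and the polling approach (scanning the directory rather than reacting to file-system events) are all assumptions made for illustration, using only the Python standard library.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical convention: metadata for "x.tif" lives in "x.tif.meta.json".
SIDECAR_SUFFIX = ".meta.json"


def sidecar_path(data_file: Path) -> Path:
    """Return the path of the metadata file stored next to data_file."""
    return data_file.with_name(data_file.name + SIDECAR_SUFFIX)


def files_missing_metadata(watched_dir: Path) -> list[Path]:
    """List data files in watched_dir that have no sidecar yet."""
    return sorted(
        p for p in watched_dir.iterdir()
        if p.is_file()
        and not p.name.endswith(SIDECAR_SUFFIX)
        and not sidecar_path(p).exists()
    )


def write_metadata(data_file: Path, metadata: dict) -> Path:
    """Store the supplied metadata as JSON next to the data file."""
    target = sidecar_path(data_file)
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


def build_index(watched_dir: Path) -> dict[str, dict]:
    """Collect all sidecars into one dictionary keyed by data file name."""
    index = {}
    for meta_file in sorted(watched_dir.glob("*" + SIDECAR_SUFFIX)):
        data_name = meta_file.name[: -len(SIDECAR_SUFFIX)]
        index[data_name] = json.loads(meta_file.read_text(encoding="utf-8"))
    return index


def demo() -> tuple[list[str], dict[str, dict]]:
    """Create two empty files, describe one, and report what remains."""
    with tempfile.TemporaryDirectory() as tmp:
        watched = Path(tmp)
        (watched / "scan_001.tif").write_bytes(b"")
        (watched / "scan_002.tif").write_bytes(b"")
        pending = files_missing_metadata(watched)
        # In a real tool the researcher would be prompted for these values.
        write_metadata(pending[0], {"sample": "hypothetical sample A"})
        remaining = [p.name for p in files_missing_metadata(watched)]
        return remaining, build_index(watched)
```

In this sketch a scheduled job would call files_missing_metadata and prompt the researcher for each file it returns; the resulting index could later be searched or exported when the data moves on to a repository.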
 
Best regards,

Rutger
 
 
 
From: Research Data Management discussion list [mailto:[log in to unmask]] On Behalf Of Alastair Downie
Sent: Monday 13 August 2018 12:17
To: [log in to unmask]
Subject: Content Management System for research data?
 
Hi all.
 
I had a conversation with one of our PIs recently - she’s becoming anxious about not being able to find data after researchers have left the lab. I talked her through the usual data management advice, about using a good naming convention and a well-organised directory structure etc, but she showed me her lab’s filestore folder and it’s clear that she’s actually doing all of the above pretty well. The problem is the sheer volume of data and the rate at which it can be generated, which together require an almost cryptic naming convention to differentiate datasets. It’s starting to look like a traditional files & folders operating system will not cope well with *very* large research directory trees.
 
In most modern website content management systems, users deposit data (images etc) without having to see what disk they go onto, and without worrying about a directory structure. I’m wondering if the same might be possible for an entire institution’s research data, using an ELN (or other documentation system) as the management interface - so *all* discovery and access would be provided by verbose, plain English links in the description of the research, and users would not be permitted to see/operate a directory structure at all. I understand this is the basis of most data repositories - my idea is to extend this method into the labs, for management of ALL data, rather than just the tip of the iceberg which is the published work. 
 
This would be a huge cultural change of course, and it’d be a challenge to convince researchers that they no longer need a file browser. A leap too far for many, I expect. It’s also not completely clear that this approach would be any more scalable than a traditional file browser approach. So I’d be very grateful to hear from anyone who might have considered this idea or experimented with it already - if the idea has even tiny wee buds where legs might one day grow, it’d be good to discuss further.
 
Thanks,
 
Al
 
 

=====================================================
Alastair Downie (Head of IT)
The Gurdon Institute, University of Cambridge,
Tennis Court Road, Cambridge CB2 1QN, United Kingdom
Office: +44(0)1223 762556
Mobile: +44(0)7989 393304
=====================================================
 
 

To unsubscribe from the RESEARCH-DATAMAN list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=RESEARCH-DATAMAN&A=1


