CCP4BB Archives

CCP4BB@JISCMAIL.AC.UK


Subject: Re: To archive or not to archive, that's the question!
From: Gerard Bricogne <[log in to unmask]>
Reply-To: Gerard Bricogne <[log in to unmask]>
Date: Mon, 31 Oct 2011 18:09:56 +0000
Content-Type: text/plain

Dear Martin,

     First of all I would like to say that I regret having made my "remark
500" and apologise if you read it as a personal one - I just saw it as an
example of a dataset it might have been useful to revisit if data had been
available in any form. I am sure that there are many skeletons in many
cupboards, including my own :-) .

     Otherwise, as the discussion does seem to refocus on the very initial
proposal in gestation within the IUCr's DDDWG, i.e. voluntary involvement of
depositors and of synchrotrons, so that questions of logistics and cost
could be answered in the light of empirical evidence, your "Why" question is
the only one unanswered by this proposal, it seems.

     In this respect I wonder how you view the two examples I gave in my
reply to your previous message, namely the "corner effects" problem and the
re-development of methods for collating data from numerous small, poorly
diffracting crystals as was done in the recent solution of GPCR structures.
There remains the example I cited from the beginning, namely the integration
of images displaying several overlapping lattices. 


     With best wishes,
     
          Gerard.

--
On Mon, Oct 31, 2011 at 05:01:38PM +0100, Martin Kollmar wrote:
> The point is that science is not collecting stamps. Therefore the first 
> question should always be "Why". If you start with "What" the discussion 
> immediately switches to technical issues like how many TB, PB etc. $/EUR, 
> manpower. And all the intense discussion will blow out by one single "Why". 
> Nothing is for free. But if it would help science and mankind, nobody would 
> hesitate to spend millions of $/EUR.
>
> Supporting software development / software developers is a different 
> question. If this had been the first question anyone asked, 
> the answer would never have been "archiving all datasets worldwide / 
> deposited structures", but rather how we, the community, could build up a 
> resource covering different kinds of problems (e.g. space groups, twinning, 
> overlapping lattices, etc.).
>
> I still haven't got an answer to "Why".
>
> Best regards,
> Martin
>
>
>
> On 31.10.2011 at 16:18, Oganesyan, Vaheh wrote:
>> I was hesitant to add my opinion so far because I'm more used to listening 
>> to this forum than to telling others what I think.
>> "Why" and "what" to deposit are absolutely interconnected. Once you decide 
>> why you want to do it, then you will probably know what will be the best 
>> format and /vice versa/.
>> Whether this deposition of raw images will or will not help us understand 
>> the biology better in future, I'm not sure.
>> But storing those difficult datasets to help future software 
>> development sounds really far-fetched. This assumes that in the future 
>> crystallographers will never grow crystals that deliver difficult 
>> datasets. If that is the case, and in 10, 20 or 30 years the next generation 
>> is growing much better crystals, then they won't need such software 
>> development.
>> If that is not the case, and once in a while (or more often) they get 
>> something out of the ordinary, then software developers will take those 
>> datasets and develop whatever they need to handle such cases.
>> Am I missing the point of the discussion here?
>> Regards,
>>      Vaheh
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of 
>> Robert Esnouf
>> Sent: Monday, October 31, 2011 10:31 AM
>> To: [log in to unmask]
>> Subject: Re: [ccp4bb] To archive or not to archive, that's the question!
>> Dear All,
>> As someone who recently left crystallography for sequencing, I
>> should modify Tassos's point...
>> "A full data-set is a few terabytes, but post-processing
>> reduces it to sub-Gb size."
>> My experience from HiSeqs is that this "full" here means the
>> base calls - equivalent to the unmerged HKLs - hardly raw
>> data. NGS (short-read) sequencing is an imaging technique and
>> the images are more like >100TB for a 15-day run on a single
>> flow cell. The raw base calls are about 5TB. The compressed,
>> mapped data (BAM file, for a human genome, 30x coverage) is
>> about 120GB. It is only a variant call file (VCF, difference
>> from a stated human reference genome) that is sub-Gb and these
>> files are - unsurprisingly - unsuited to detailed statistical
>> analysis. Also $1k is not yet an economic cost...
>> The DNA information capacity in a single human body dwarfs the
>> entire world disk capacity, so storing DNA is a no brainer
>> here. Sequencing groups are making very hard-nosed economic
>> decisions about what to store - indeed it is a source of
>> research in itself - but the scale of the problem is very much
>> bigger.
>> My tuppence ha'penny is that depositing "raw" images along
>> with everything else in the PDB is a nice idea but would have
>> little impact on science (human/animal/plant health or
>> understanding of biology).
>> 1) If confined to structures in the PDB, the images would just
>> be the ones giving the final best data - hence the ones least
>> likely to have been problematic. I'd be more interested in
>> SFs/maps for looking at ligand-binding etc...
>> 2) Unless this were done before paper acceptance they would be
>> of little use to referees seeking to review important
>> structural papers. I'd like to see PDB validation reports
>> (which could include automated data processing, perhaps culled
>> from synchrotron sites, SFs and/or maps) made available to
>> referees in advance of publication. This would be enabled by
>> deposition, but could be achieved in other ways.
>> 3) The datasets of interest to methods developers are unlikely
>> to be the ones deposited. They should be in contact with
>> synchrotron archives directly. Processing multiple lattices is
>> a case in point here.
>> 4) Remember the "average consumer" of a PDB file is not a
>> crystallographer. More likely to be a graduate student in a
>> clinical lab. For him/her things like occupancies and B-
>> factors are far more serious concerns... I'm not trivializing
>> the issue, but importance is always relative. Are there
>> "outsiders" on the panel to keep perspective?
>> Robert
>> --
>> Dr. Robert Esnouf,
>> University Research Lecturer, ex-crystallographer
>> and Head of Research Computing,
>> Wellcome Trust Centre for Human Genetics,
>> Roosevelt Drive, Oxford OX3 7BN, UK
>> Emails: [log in to unmask]   Tel: (+44) - 1865 - 287783
>>     and [log in to unmask]        Fax: (+44) - 1865 - 287547
>> ---- Original message ----
>> >Date: Mon, 31 Oct 2011 11:37:47 +0100
>> >From: CCP4 bulletin board <[log in to unmask]> (on behalf
>> of Anastassis Perrakis <[log in to unmask]>)
>> >Subject: Re: [ccp4bb] To archive or not to archive, that's
>> the question!
>> >To: [log in to unmask]
>> >
>> >   Dear all,
>> >   The discussion about keeping primary data, and what
>> >   level of data can be considered 'primary', has -
>> >   rather unsurprisingly - come up also in areas other
>> >   than structural biology.
>> >   An example is next generation sequencing. A
>> >   full dataset is a few terabytes, but
>> >   post-processing reduces it to sub-Gb size. However,
>> >   the post-processed data, as in our case,
>> >   have suffered the inadequacy of computational
>> >   "reduction" ... At least our institute has decided
>> >   to create double back-up of the primary data in
>> >   triplicate. For that reason our facility bought
>> >   three -80 freezers, one on site in the basement, one
>> >   on the top floor, and one off-site, and they keep
>> >   the DNA to be sequenced. A sequencing run is already
>> >   sub-1k$ and it will not become
>> >   more expensive. So, if it's important, do it again.
>> >   It's cheaper and it's better.
>> >   At first sight, that does not apply to MX. Or does
>> >   it?
>> >   So, maybe the question is not "To archive or not to
>> >   archive" but "What to archive".
>> >   (similarly, it never crossed my mind if I should "be
>> >   or not be" - I always wondered "what to be")
>> >   A.
>> >   On Oct 30, 2011, at 11:59, Kay Diederichs wrote:
>> >
>> >     At 20:59, Jrh wrote:
>> >     ...
>> >
>> >       So:-  Universities are now establishing their
>> >       own institutional
>> >       repositories, driven largely by Open Access
>> >       demands of funders. For
>> >       these to host raw datasets that underpin
>> >       publications is a reasonable
>> >       role in my view and indeed they already have
>> >       this category in the
>> >       University of Manchester eScholar system, for
>> >       example.  I am set to
>> >       explore locally here whether they would
>> >       accommodate all our Lab's raw
>> >       Xray images datasets per annum that underpin our
>> >       published crystal
>> >       structures.
>> >
>> >       It would be helpful if readers of this CCP4bb
>> >       could kindly also
>> >       explore with their own universities if they have
>> >       such an
>> >       institutional repository and if raw data sets
>> >       could be accommodated.
>> >
>> >       Please do email me off list with this
>> >       information if you prefer but
>> >       within the CCP4bb is also good.
>> >
>> >     Dear John,
>> >
>> >     I'm pretty sure that there exists no consistent
>> >     policy to provide an "institutional repository"
>> >     for deposition of scientific data at German
>> >     universities or Max-Planck institutes or Helmholtz
>> >     institutions, at least I never heard of something
>> >     like this. More specifically, our University of
>> >     Konstanz certainly does not have the
>> >     infrastructure to provide this.
>> >
>> >     I don't think that Germany is the only country
>> >     which is the exception to any rule of availability
>> >     of "institutional repository" . Rather, I'm almost
>> >     amazed that British and American institutions seem
>> >     to support this.
>> >
>> >     Thus I suggest not focusing exclusively on
>> >     official institutional repositories, but
>> >     exploring alternatives: distributed filestores like
>> >     Google's BigTable, BitTorrent or others might be
>> >     just as suitable - check out
>> > http://en.wikipedia.org/wiki/Distributed_data_store.
>> >     I guess that any crystallographic lab could easily
>> >     sacrifice/donate a TB of storage for the purposes
>> >     of this project in 2011 (and maybe 2 TB in 2012, 3
>> >     in 2013, ...), but clearly the level of work to
>> >     set this up should be kept as low as possible (a
>> >     bittorrent daemon seems simple enough).
>> >
>> >     Just my 2 cents,
>> >
>> >     Kay
>> >
>> >   Please don't print this e-mail unless you really
>> >   need to
>> >   Anastassis (Tassos) Perrakis, Principal Investigator
>> >   / Staff Member
>> >   Department of Biochemistry (B8)
>> >   Netherlands Cancer Institute,
>> >   Dept. B8, 1066 CX Amsterdam, The Netherlands
>> >   Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile /
>> >   SMS: +31 6 28 597791
>> To the extent this electronic communication or any of its attachments 
>> contain information that is not in the public domain, such information is 
>> considered by MedImmune to be confidential and proprietary. This 
>> communication is expected to be read and/or used only by the individual(s) 
>> for whom it is intended. If you have received this electronic 
>> communication in error, please reply to the sender advising of the error 
>> in transmission and delete the original message and any accompanying 
>> documents from your system immediately, without copying, reviewing or 
>> otherwise using them for any purpose. Thank you for your cooperation.
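
For readers who want to compare the sequencing data volumes quoted in Robert Esnouf's message, here is a minimal Python sketch that works out the reduction factor at each processing stage. The sizes are taken from that message; two of them are assumptions for the sake of the arithmetic: the images are given as ">100TB", so 100 TB is used as a lower bound, and the variant call file is only described as "sub-Gb", so 1 GB is used as an upper bound. The names `STAGES_GB` and `reduction_factors` are purely illustrative.

```python
# Data volumes quoted in Robert Esnouf's message, in gigabytes (decimal units).
# Two values are assumptions: ">100TB" is taken as 100 TB (a lower bound) and
# "sub-Gb" is taken as 1 GB (an upper bound).
STAGES_GB = [
    ("raw images (15-day run, one flow cell)", 100_000),
    ("raw base calls", 5_000),
    ("compressed, mapped BAM (human, 30x)", 120),
    ("variant call file (VCF)", 1),
]


def reduction_factors(stages):
    """Return (name, size_gb, factor_vs_previous_stage, factor_vs_raw) per stage."""
    raw = stages[0][1]
    prev = raw
    rows = []
    for name, size in stages:
        rows.append((name, size, prev / size, raw / size))
        prev = size
    return rows


if __name__ == "__main__":
    for name, size, step, total in reduction_factors(STAGES_GB):
        print(f"{name:42s} {size:>8,d} GB  x{step:8.1f} step  x{total:10.1f} total")
```

Under these assumptions the step from images to base calls is a factor of 20, and from base calls to the BAM file a factor of roughly 42, which is the sense in which the base calls are closer to unmerged HKLs than to raw data.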
