Dear all,
I apologise for returning to this thread after it may have gone
cold: I was on a prolonged visit abroad and took the Subject line
literally when deciding what e-mails to open, thus not realising that
the topic had drifted from a technical point to a much more general
and substantive one, of which I have only just become aware.
Among the cases that Peter was referring to were truly landmark
datasets from the golden early days of the MAD method, collected on
image plates with no effort spared to interleave wavelengths and to use
inverse-beam protocols to maximise the S/N ratio of dispersive and
anomalous differences, in spite of the considerable physical labour
this involved at the time. I was curious to see whether improved
processing software and new treatments of datasets collected in this
way could lead to improved phasing power. After much searching, old
tapes from MicroVAX days were located but turned out to be unusable.
These are the kinds of old datasets that we (collectively) could
have learned a lot from reanalysing.
I keep being surprised at the degree of reticence that is
repeatedly expressed towards the idea that reprocessing raw images can
yield significantly better data. We have all witnessed the steady
improvement in refinement results as old PDB models get re-refined
against the deposited data using ever-better refinement
programs. Why is there such a negative prejudice towards the prospect
of similar improvements in the quality of the final data when raw
images are reprocessed with improved integration and scaling programs?
Somehow we are too easily convinced that the latter already do an
essentially perfect job, but anyone close enough to the coalface in
this field will know how very far this still is from being the case.
Unfortunately, doubt about whether something is worth the effort often
leads to the decision not to make that effort at all; no evidence is
then gathered, and the initial prejudice remains unchallenged - rather
like having a prior without a likelihood :-).
The recent developments in the processing of XFEL data have shown
what gains can be made by going back to the fundamentals of reflection
profiles, estimation of partiality, etc. A recent conference on
"Challenges in Crystallography" had a session dedicated to these very
developments under the title "The Need to Reinvent the Wheel" - see
https://www.janelia.org/you-janelia/conferences/challenges-crystallography
The point is that the proverbial Wheel is still unfinished business
even in our own field of conventional crystallography on conventional
samples. Any belief that it is otherwise is just delaying progress,
the significance and benefits of which will only be realised after it
has taken place. I therefore have a different "prior prejudice" from
that expressed by Tim (and many others). Time will provide us with a
likelihood function to discriminate between them if, and only if,
scientifically motivated improvements to methods and software are
allowed to take place and are tested for their ability to make a
significant difference to some aspects of the downstream results.
There is of course a risk of wasted effort, but this is where I would
express my own very strong belief that there is no prospect of such
efforts being wasted. My view has always been that doubts that are
taken as justifications for *not* doing something should themselves be
subjected to a *double* dose of systematic doubt :-) .
With best wishes,
Gerard.
--
On Fri, Oct 23, 2015 at 02:06:43PM +0100, John R Helliwell wrote:
> Dear Tim,
> We have download information published for MyTardis in Australia, i.e.
> see their article:-
> Acta Cryst. (2014). D70, 2510-2519
> doi:10.1107/S1399004714016174
> from which I quote:-
> "Current performance:- Since turning on the service in August 2013 and
> the first public announcement in November 2013, over 350 experiments
> were collected. Data pages were viewed over 1600 times and more than
> 500 downloads were requested."
>
> Updates on those download stats, and other raw diffraction data
> storage/archiving initiatives ongoing/starting in the USA and in
> Europe, would also be good to have, I agree.
>
> Greetings,
> John
>
> Emeritus Prof John R Helliwell DSc
>
>
> On 23 Oct 2015, at 12:40, Tim Gruene wrote:
>
> > Dear John,
> >
> > thank you for the link. I am indeed aware this has been discussed before, and
> > I took a look at some of the presentations where the title seemed promising.
> >
> > I am still just curious whether anyone has a more quantitative view than
> > 'there have been cases'.
> >
> > Best regards,
> > Tim
> >
> > On Friday, October 23, 2015 12:19:03 PM John R Helliwell wrote:
> >> Dear Tim,
> >> We have extensively debated this within CCP4bb in earlier threads, as you
> >> know of course.
> >> Suffice it to say that we are steadily learning about the benefits and the
> >> issues of raw data archiving, including the option of being selective.
> >> The recent IUCr Workshop held at the ECM29 in Rovinj on these matters,
> >> focusing on metadata for raw data, but also including an update on the
> >> wider aspects, is fully documented at :-
> >> http://www.iucr.org/resources/data/dddwg/rovinj-workshop
> >> i.e. including video and slides for each talk, which we hope you will find
> >> informative.
> >> Greetings,
> >> John
> >> Chair of the IUCr Diffraction Data Deposition Working Group.
> >>
> >> On Fri, Oct 23, 2015 at 10:16 AM, Tim Gruene wrote:
> >>> Dear all,
> >>>
> >>> I have wondered if it is really worth the effort (and disk space) for
> >>> central long-term storage of diffraction images. What fraction of such
> >>> data will ever be looked at in the future after the respective project
> >>> has been published?
> >>> Even if some revolutionary new technology were developed, I guess it
> >>> would mostly be applied to current rather than old projects.
> >>> Given the substantial energy consumption of long-term storage (including
> >>> DVDs and tape, as these have to be produced), the overall benefit might
> >>> be greater if old data were deleted at some point, saving energy and
> >>> effort for more current things.
> >>>
> >>> I have been through a few disk crashes. Often I was annoyed because I
> >>> had to set up a new computer, and sometimes I could not recover some
> >>> data which I would have liked to keep. But in fact it often cleaned up
> >>> my computer, and life went on even without access to whatever got lost.
> >>>
> >>> So what is the scientific argument behind long-term storage of diffraction
> >>> images other than academic interest in re-processing the data? As
> >>> mentioned above, I guess that the benefit of re-processing the data may
> >>> only be minor, and effort might be better spent on concurrent projects.
> >>>
> >>> Best wishes,
> >>> Tim
> >>>
> >>> On Wednesday, October 21, 2015 06:03:21 PM Allister Crow wrote:
> >>>> On the last point about storing diffraction images, I wonder what the
> >>>> community's opinion is of uploading images to the Zenodo archive for
> >>>> safe-keeping and sharing?
> >>>>
> >>>> The Zenodo project is being run by the folks at CERN, and is EU-funded
> >>>> to support scientific data sharing (Zenodo.org).
> >>>>
> >>>> Until the PDB does this, perhaps this is one of the better ways through
> >>>> which we can ensure preservation (or at least another backup) of our
> >>>> most important diffraction images?
> >>>>
> >>>> - Ally
> >>>>
> >>>> PS I should also say that I originally learned of Zenodo from Graeme
> >>>> Winter at Diamond.
> >>>>
> >>>> -----------------
> >>>> Allister Crow
> >>>> Department of Pathology
> >>>> University of Cambridge
> >>>> Google Scholar Profile <http://bit.ly/11ga7Sq>
> >>>> Research Gate Profile <http://bit.ly/137Ytt4>
> >>>> Departmental Page <http://www.path.cam.ac.uk/directory/allister-crow>
> >>>>
> >>>>> On 21 Oct 2015, at 17:03, William G. Scott wrote:
> >>>>>
> >>>>> Dear CCP4 Citizenry:
> >>>>>
> >>>>> I’m worried about medium- to long-term data storage and integrity. At
> >>>>> the moment, our lab uses mostly HFS+ formatted filesystems on our
> >>>>> disks, which is the OS X default. HFS+ always struck me as somewhat
> >>>>> fragile, and resource forks are at best a (seemingly needless) headache,
> >>>>> at least as far as crystallography datasets go. (True, you can use HFS+
> >>>>> compression to losslessly shrink your images by a factor of 2, or
> >>>>> shrink your CCP4 installation, but these are fairly minor advantages.)
> >>>>>
> >>>>> I read the CCP4 wiki page
> >>>>> http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Filesystems
> >>>>> that summarizes some of the other options. From what I have read there
> >>>>> and elsewhere, it seems that ZFS and Btrfs might be significantly better
> >>>>> alternatives to HFS+, but I would really like to get a sense of what
> >>>>> others have experienced with these, or other, equally or more robust
> >>>>> options. I don’t feel I know enough to critically evaluate the
> >>>>> information.
> >>>>>
> >>>>> Anyone know what the NSA uses?
> >>>>>
> >>>>> I recently created a de novo backup of some personal data on an
> >>>>> external HFS+ drive (photos, movies, music, etc.). I was very
> >>>>> unpleasantly surprised to find that several files had been silently
> >>>>> corrupted. (In the case of a movie file, for example, the file would
> >>>>> play but could not be copied. In another case, a music file would not
> >>>>> copy, yet it had identical MD5 and SHA-1 checksums when compared with
> >>>>> an uncorrupted redundant backup I had. I’m still puzzled by this, but
> >>>>> it suggests the resource fork might be the source of the corruption
> >>>>> and, more worrisome still, that conventional checksums aren’t detecting
> >>>>> some silently corrupted data, so I am not even sure if ZFS self-healing
> >>>>> would be the answer.)
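A minimal sketch of the kind of check described above, comparing checksums of every file in a primary data tree against a redundant backup, might look like this in Python. The function names `file_digest` and `compare_trees` are hypothetical, and SHA-256 from the standard `hashlib` module is used instead of MD5/SHA-1. One assumption worth stating loudly: an ordinary file read returns only the main data stream (the data fork on HFS+), so resource forks and extended attributes are not covered, which may be one way two files can share a checksum yet behave differently when copied.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file's main data stream, read in chunks so large images stay cheap."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until fh.read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(primary: Path, backup: Path, algorithm: str = "sha256") -> dict:
    """Compare every regular file under `primary` with its counterpart under `backup`.

    Returns {"missing": [...], "mismatched": [...]} as relative path strings.
    Note: only data streams are hashed; resource forks and xattrs are ignored.
    """
    missing, mismatched = [], []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = backup / rel
        if not dst.is_file():
            missing.append(str(rel))
        elif file_digest(src, algorithm) != file_digest(dst, algorithm):
            mismatched.append(str(rel))
    return {"missing": missing, "mismatched": mismatched}
```

For example, `compare_trees(Path("/data/images"), Path("/backup/images"))` (placeholder paths) would list files absent from, or differing in, the backup. A check like this only detects corruption after the fact; ZFS and Btrfs instead checksum blocks and verify them on read, which is a complementary safeguard rather than a replacement.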
> >>>>>
> >>>>> Since we as a community are now encouraging primary X-ray diffraction
> >>>>> images to be stored, I can only imagine the problem could be
> >>>>> ubiquitous, and a discussion might be worth having. (I apologize if
> >>>>> this has been addressed previously; I did search the archive.)
> >>>>>
> >>>>> All the best,
> >>>>>
> >>>>> Bill
> >>>>>
> >>>>>
> >>>>>
> >>>>> William G. Scott
> >>>>> Director, Program in Biochemistry and Molecular Biology
> >>>>> Professor, Department of Chemistry and Biochemistry
> >>>>> and The Center for the Molecular Biology of RNA
> >>>>> University of California at Santa Cruz
> >>>>> Santa Cruz, California 95064
> >>>>> USA
> >>>
> >>> --
> >>> --
> >>> Paul Scherrer Institut
> >>> Dr. Tim Gruene
> >>> - persoenlich -
> >>> OFLC/102
> >>> CH-5232 Villigen PSI
> >>> phone: +41 (0)56 310 5297
> >>>
> >>> GPG Key ID = A46BEE1A
> >>
> >> --
> >> Professor John R Helliwell DSc
> >
--
===============================================================
Gerard Bricogne
Global Phasing Ltd.
Sheraton House, Castle Park         Tel: +44-(0)1223-353033
Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889
===============================================================