Dear Randy,
TL;DR: yep, agree with Jacob; however, this turns out to be more complicated.
Long version:
On behalf of the DIALS people - yes, this is one of the things we are aiming at, and to date we have made good progress with it. You are completely correct that a reasonable number of "strong" reflections are needed to get the indexing and refinement to work; so far we have been demonstrating the efficacy of the method in DIALS more towards extending the resolution with high multiplicity weak data (i.e. there are some strong-ish spots at low resolution to use for indexing etc.).
There are, however, some additional problems with extremely weak high multiplicity data, which I have been meaning to respond to this thread about, so here we go. What follows are opinions I look forward to debating!
Jacob's assertion (that the readout noise is 0, so spreading the data out across more images has no penalty) is broadly correct iff our primary source of ignorance* is the detector readout noise. However, when this noise is 0, all that happens is that the next greatest source of ignorance steps up to the plate to cause us problems. For example: what exactly should the background be if the pixel values are all 0, with a couple of 1's? What is the variance on a pixel with 0 counts? And how should you get good scale factors out when the data are very weak, given that the scale factors are themselves derived from the data?

These challenges are not insurmountable, and they are well within scope for DIALS as well as for improvements in e.g. XDS, but they require a good understanding of the statistics of all the processes involved, and they may break assumptions (e.g. that sig(I) is Normal) about the error models for the data which were made when existing scaling programs were written. One element of this is choosing, in order to correctly interpret high multiplicity weak data, the correct scaling / weighting model to use for merging and sig(I) estimation - which is currently a rigidly defined area of doubt and uncertainty, as it should incorporate all sources of ignorance including but not limited to shot noise, scaling models, radiation damage, ...
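To put a number on the background question, here is a minimal sketch (plain numpy, nothing DIALS-specific; the rate and pixel count are made up): for Poisson counts the maximum-likelihood background rate is just the mean count, and its uncertainty follows from the model itself, whereas the naive sample statistics of a region that is almost all zeros tell you very little.

    # Toy numbers: a 500-pixel background region at 0.02 photons/pixel,
    # i.e. mostly 0s with a couple of 1s, as in the example above.
    import numpy as np

    rng = np.random.default_rng(42)
    pixels = rng.poisson(0.02, 500)

    # Poisson model: MLE of the rate is the mean count; var(mean) = rate / n.
    rate = pixels.mean()
    rate_sigma = np.sqrt(rate / pixels.size)

    # Naive Gaussian treatment of the same pixels.
    naive_sigma = pixels.std(ddof=1)

    print("counts:", np.bincount(pixels))        # e.g. [491   9]
    print(f"Poisson rate = {rate:.4f} +/- {rate_sigma:.4f}")
    print(f"naive per-pixel sigma = {naive_sigma:.4f}")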
One thought experiment I have been considering lately is the design of data collection strategies which deliberately use the sample to minimise our ignorance of the experiment, i.e. to measure the data not for good I, sig(I) estimates but instead for good absorption, intensity, detector sensitivity etc. scaling models, using the symmetry in the intensity pattern to define constraints on these models and then minimising the errors in these models. The side effect of this would (hopefully) be a well measured set of I, sig(I) values *but also* a good set of corrections to apply to them, so that the data going downstream are better - a toy version of the idea is sketched below. This kind of experiment could be achieved easily with a big flat beam, a PAD, a multi-axis goniometer and so on, and is in fact routinely performed by small molecule crystallographers. I suspect that this holistic view of the experiment would improve the average quality of data measured at facilities, as well as making good use of all the new experimental hardware which is now available. Clearly, though, in the *vast* majority of cases the way we measure data currently is good enough for what people need and gets science done!
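The promised toy version, under the simplest possible model (one scale factor per image, recovered by alternating least-squares from the redundancy alone; all names and numbers here are mine, not any program's actual algorithm):

    # Simulate observations obs = scale[image] * I[reflection] + shot noise,
    # then recover the per-image scales using only the redundancy in the data.
    import numpy as np

    rng = np.random.default_rng(0)
    n_img, n_refl, n_obs = 5, 40, 400
    true_scale = rng.uniform(0.5, 1.5, n_img)
    true_I = rng.gamma(2.0, 50.0, n_refl)
    img = rng.integers(0, n_img, n_obs)
    refl = rng.integers(0, n_refl, n_obs)
    obs = rng.poisson(true_scale[img] * true_I[refl]).astype(float)

    scale = np.ones(n_img)
    for _ in range(20):
        # Merge: least-squares estimate of each unique intensity, given scales.
        num = np.bincount(refl, obs * scale[img], n_refl)
        den = np.bincount(refl, scale[img] ** 2, n_refl)
        I_est = num / np.maximum(den, 1e-9)
        # Refit: least-squares scale for each image, given merged intensities.
        num = np.bincount(img, obs * I_est[refl], n_img)
        den = np.bincount(img, I_est[refl] ** 2, n_img)
        scale = num / np.maximum(den, 1e-9)
        scale /= scale.mean()                  # fix the overall scale

    print(np.round(scale, 3))                  # recovered scales
    print(np.round(true_scale / true_scale.mean(), 3))

With more observations per unique reflection, the scales are pinned down more tightly - which is the sense in which high multiplicity buys you a better *model*, not just better merged intensities.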
Looking forward to discussing this, cheerio, Graeme
---------------------------------
*I use the term ignorance to mean "that which we do not know", which is hopefully reduced by performing an experiment and carefully processing the data, but never eliminated.
From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of Randy Read
Sent: 04 November 2015 11:39
To: ccp4bb
Subject: Re: [ccp4bb] New Rule for PADs
Just to fill a little bit of the silence...
I agree that, in principle, with a detector with zero readout noise, there should be no penalty for spreading the photons over more frames. My understanding is that the DIALS people are working actively towards achieving this theoretical objective, which would have the kinds of benefits you describe (e.g. you can detect outliers more readily and you don't need to anticipate how long your crystal will last). However, in practice, you can run into problems. For instance, indexing and post-refinement require detecting spots on the detector. If the algorithm for detecting spots only looks at single images, and if you reduce the photons per image, you eventually reach a point where no spots are detected. So spot detection algorithms somehow have to be updated. Similar issues apply to determining accurate reflection profiles.
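As a toy illustration of that point (numbers invented; no real spot finder works exactly like this): a weak spot smeared over many finely-sliced images can be statistically invisible on any single image, yet obvious once counts are accumulated across images before thresholding.

    # 20 weak frames; a 3x3 spot adds ~0.1 photons/pixel/frame on top of a
    # 0.05 photons/pixel background - too little to see on any one frame.
    import numpy as np

    rng = np.random.default_rng(1)
    n_frames, bg = 20, 0.05
    frames = rng.poisson(bg, (n_frames, 64, 64)).astype(float)
    frames[:, 30:33, 40:43] += rng.poisson(0.10, (n_frames, 3, 3))

    def region_z(image, n_images=1):
        """Significance of the candidate 3x3 region above Poisson background."""
        counts = image[30:33, 40:43].sum()
        expected = bg * 9 * n_images
        return (counts - expected) / np.sqrt(expected)

    print("single-frame z:", [round(float(region_z(f)), 1) for f in frames[:5]])
    print("summed-stack z:", round(float(region_z(frames.sum(axis=0), n_frames)), 1))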
Anyway, while there's a gap between theory and practice, it would be best not to reduce the photons per image too far. Perhaps someone from the DIALS team could comment on how far one might go with current software?
Best wishes,
Randy Read
On 3 Nov 2015, at 10:18, Jonathan Davies <[log in to unmask]> wrote:
Thanks Jacob
-------- Forwarded Message --------
Subject: RE: [ccp4bb] New Rule for PADs
Date: Tue, 3 Nov 2015 01:05:19 +0000
From: Keller, Jacob <[log in to unmask]>
To: Jonathan Davies <[log in to unmask]>
>Has there been any further discussion on this?
Only a resounding silence...!
>I don't fully understand why one would require such a high multiplicity; would there be any significant difference between a dataset with a multiplicity of 100 compared to one with a multiplicity of 20, say, or even 10 (apart from specific cases such as sulphur SAD)?
I was thinking that this would be very good for the estimation of errors, which can be important.
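For what it's worth, a toy numerical illustration (made-up true intensity, pure shot noise only): high multiplicity not only shrinks sigma(I) of the merged value, it makes the *estimate* of sigma(I) itself trustworthy, since the relative error of a sample standard deviation goes roughly as 1/sqrt(2(m-1)).

    # 10000 simulated reflections, each measured m times with Poisson noise.
    import numpy as np

    rng = np.random.default_rng(7)
    for m in (10, 20, 100):
        obs = rng.poisson(100.0, (10000, m))
        sem = obs.std(axis=1, ddof=1) / np.sqrt(m)   # empirical sigma(I_merged)
        print(f"m={m:3d}: sigma(I) ~ {sem.mean():.2f},"
              f" spread of that estimate ~ {sem.std():.2f}")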
>Would the attenuation also possibly affect the resolution, i.e. worse signal to noise in high resolution shell?
No, not at all, and this is exactly my point. With PADs, there is zero readout noise, so it does not matter whether you collect your photons in 10 frames or 1000 frames: the signal is the same. The benefit is huge, however, in that reciprocal space is sampled evenly as a function of radiation dose, whereas in the usual method, crystals are damaged by the time the dataset reaches full completeness.
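Toy numbers for this (treating the reflection as if it were read out once per frame; real spots span many pixels, which only makes the read-noise penalty worse):

    # Variance of the summed signal: photons (shot noise) plus one read-noise
    # contribution per frame. With read noise 0, the frame count drops out.
    import numpy as np

    total_photons = 1000.0
    for read_noise in (0.0, 3.0):            # rms counts/frame; 3 ~ CCD-ish
        for n_frames in (10, 1000):
            sigma = np.sqrt(total_photons + n_frames * read_noise ** 2)
            print(f"read noise {read_noise:.1f}, {n_frames:4d} frames:"
                  f" sigma(total) = {sigma:6.1f}")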
Thanks for your interest - you could post this to the list, and it might engender some interesting discussion.
Jacob
On 26/10/15 19:35, Keller, Jacob wrote:
How about a new rule for data collected on pixel area detectors (Pilatus etc):
Attenuate to ensure multiplicity/redundancy greater than 100?
JPK
*******************************************
Jacob Pearson Keller, PhD
Looger Lab/HHMI Janelia Research Campus
19700 Helix Dr, Ashburn, VA 20147
email: [log in to unmask]
*******************************************
------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research
Wellcome Trust/MRC Building, Hills Road
Cambridge CB2 0XY, U.K.
Tel: +44 1223 336500    Fax: +44 1223 336827
E-mail: [log in to unmask]
www-structmed.cimr.cam.ac.uk