For teaching purposes I have found that controlled pairs of data sets
are most instructive.  You are right that an easy one-button-push
processing run teaches you nothing, but neither does a
bang-it-crashed-now-what data set.  Most useful are two data sets that
are identical in every
respect but one, and that one thing is the point you are trying to get
across. It's hard to collect such perfectly paired data sets, so I
ended up just simulating them. I deliberately chose a high-symmetry
space group to keep the download size small. You can download them from
here:
http://bl831.als.lbl.gov/~jamesh/workshop/
These five datasets represent the four biggest problems I see users have
when trying to solve structures: 1) poor anomalous signal, 2) overlaps
from a bad crystal orientation, 3) hidden radiation damage to sites, and
4) ice rings. The 5th "goodsignal" dataset is the positive control.
The web page contains everything from images to processed MTZ files,
maps and the "right answer" in pdb and mtz format. A slightly more
"realistic" version with a bigger download size is here:
http://bl831.als.lbl.gov/~jamesh/workshop2/
This is the one I used for my "weak anomalous challenge" a few years
back. The teaching advantage is that you can use the image-mixer script
to modulate the severity of problems like ice rings and anomalous
signal. If you make a competition of it, people tend to get more
interested.
When it comes to beam centers, it is not all that hard to take a data
set with a "correct" beam center and just edit the headers. How you do
this depends on the file format, but I have some instructions for
editing images in general here:
http://bl831.als.lbl.gov/~jamesh/bin_stuff/
In general, you can usually separate the header from the data with the
unix command "head" or "dd", edit the header with your favorite text
editor, and then put the two parts back together with "cat". As for
which beam center is "correct", it is important to tell your students
that the answer depends on which software you are using.  I wrote all this
down in the last paragraph on page 7 of this doc:
https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2018/394965
This doc also describes another simulated data set that demonstrates the
challenges of combining lots of short wedges together.  That may or may
not be too advanced a topic for your students.  As you can guess, I'm
experimenting with biorxiv.  So far, no comments.
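To make the head/dd, edit, cat recipe above concrete, here is a minimal
Python sketch.  It assumes an ADSC/SMV-style image, where an ASCII header
of KEY=value; lines states its own length in a HEADER_BYTES entry and is
padded out to that length.  The function names and the BEAM_CENTER_X key
are illustrative assumptions, not taken from the instructions linked
above, and other file formats will need a different approach.

```python
import re


def split_smv(raw: bytes) -> tuple[bytes, bytes]:
    """Split an SMV-style image into (header, data).

    Assumes the header announces its own size as 'HEADER_BYTES=  512;'.
    """
    match = re.search(rb"HEADER_BYTES=\s*(\d+);", raw[:512])
    if match is None:
        raise ValueError("no HEADER_BYTES entry; not an SMV-style header?")
    n = int(match.group(1))
    return raw[:n], raw[n:]


def set_header_value(header: bytes, key: str, value: str) -> bytes:
    """Replace 'KEY=...;' in the header, keeping the header length fixed."""
    pattern = re.compile(rb"(?m)^" + re.escape(key.encode()) + rb"=[^;]*;")
    new_entry = f"{key}={value};".encode()
    edited, count = pattern.subn(lambda m: new_entry, header, count=1)
    if count == 0:
        raise KeyError(key)

    # The pixel data must start at the same byte offset as before, so the
    # trailing padding absorbs any change in the length of the entry.
    target = len(header)
    if len(edited) < target:
        edited += b" " * (target - len(edited))
    else:
        extra = edited[target:]
        if extra.strip(b" \x00\n"):
            raise ValueError("not enough padding left in the header")
        edited = edited[:target]
    return edited
```

Writing the edited header followed by the untouched data to a new file
gives an image with identical pixels and a different recorded beam
center.  Whether the new numbers are "correct" still depends on the
convention of the program that reads them.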
Good luck with your class!
-James Holton
MAD Scientist
On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:
> Dear colleagues,
>
> For teaching purposes, I am looking for a small number (< 5) of
> macromolecular diffraction datasets (raw images) that might be
> considered 'difficult' for a beginning crystallography student to
> process. By 'difficult' I generally mean not able to be processed
> automatically by a common processing package (XDS, Mosflm, DIALS, etc)
> using default settings, i.e., no black box "click and done" processing.
> The datasets I am looking for would have some stumbling block such as
> incorrect experimental parameters recorded in the image headers,
> multiple lattices that cause indexing to fail, datasets for which
> determining the correct space group is tricky, datasets for experiments
> in which the crystal slipped or moved in the beam, or anything else you
> can think of. The idea is for these beginning students to examine
> several datasets that highlight various phenomena that can lead one
> astray during processing.
>
> A good candidate dataset would also ideally comprise a modest number of
> images so as to keep integration time to a minimum. Factors that are
> mostly irrelevant for my purpose: resolution (as long as better than
> ~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
> scattering, presence/absence of ligands, monomeric vs oligomeric
> structures, etc. Also, to be clear, I am not looking for datasets that
> have so many pathologies that they would require many long hours of work
> for an expert to process correctly.
>
> I have checked public repositories such as proteindiffraction.org and
> SBGrid databank, but all of the datasets I acquired from these sources
> process satisfactorily with little effort, and in any event I know of no
> way to search for 'challenging' datasets. (I also wonder whether
> anybody is in the habit of depositing, shall we say, less-than-pristine
> images to public repositories?)
>
> If you know of such a dataset that is already publicly available, or if
> you have such a dataset that you are willing to share for solely
> educational purposes, I would appreciate hearing from you, either on- or
> off-list.
>
> Thank you in advance for your suggestions.
>
> Matthew
>