Fair enough!
I have just now added DANO and I(+)/I(-) to the files. I'll
be very interested to see what you can come up with! For the
record, the phases therein came from running mlphare with
default parameters but exactly the correct heavy-atom
constellation (all the sulfur atoms in 3dko), and then running
dm with default parameters.
Yes, there are other ways to run mlphare and dm that give
better phases, but I was only able to determine those
parameters by "cheating" (comparing the resulting map to the
right answer), so I don't think it is "fair" to use those
maps.
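For anyone unsure what the DANO column mentioned above holds: by the usual convention it is the anomalous difference between the Friedel-mate amplitudes, F(+) - F(-). Here is a minimal sketch of that arithmetic for a single reflection. The function name `anomalous_difference` and the numbers are made up for illustration, and the sigma assumes independent errors on F(+) and F(-); this is not a description of how the challenge files were actually generated.

```python
import math


def anomalous_difference(f_plus, sig_plus, f_minus, sig_minus):
    """Return (DANO, SIGDANO) for one reflection.

    Hypothetical helper for illustration only.
    DANO is taken as F(+) - F(-); SIGDANO assumes the two
    measurement errors are independent and adds them in quadrature.
    """
    dano = f_plus - f_minus
    sigdano = math.sqrt(sig_plus ** 2 + sig_minus ** 2)
    return dano, sigdano


# Made-up amplitudes and sigmas, not taken from the challenge data.
dano, sigdano = anomalous_difference(105.0, 3.0, 100.0, 4.0)
print(dano, sigdano)
```

The ratio DANO/SIGDANO gives a rough per-reflection feel for how much anomalous signal stands above the noise.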
I have had a few questions about what is "cheating" and what
is not cheating. I don't have a problem with the use of
sequence information because that actually is something that
you realistically would know about your protein when you sat
down to collect data. The sequence of this molecule is that
of 3dko:
http://bl831.als.lbl.gov/~jamesh/challenge/seq.pir
I also don't have a problem with anyone actually using an
automation program to _help_ them solve the "impossible"
dataset as long as they can explain what they did. Simply
putting the above sequence into BALBES would, of course, be
cheating! I suppose one could try eliminating 3dko and its
"homologs" from the BALBES search, but that, in and of itself,
is perhaps relevant to the challenge: "what is the most
distant homolog that still allows you to solve the
structure?". That, I think, is also a stringent test of
model-building skill.
I have already tried ARP/wARP, phenix.autobuild and
buccaneer/refmac. With default parameters, all of these
programs fail on both the "possible" and "impossible"
datasets. It was only with some substantial tweaking that I
found a way to get phenix.autobuild to crack the "possible"
dataset (using 20 models in parallel). I have not yet found a
way to get any automation program to build its way out of the
"impossible" dataset. Personally, I think that the
breakthrough might be something like what Tom Terwilliger
mentioned. If you build a good enough starting set of atoms,
then I think an automation program should be able to take you
the rest of the way. If that is the case, then it means
people like Tom who develop such programs for us might be able
to use that insight to improve the software, and that is
something that will benefit all of us.
Or, it is entirely possible that I'm just not running the
current software properly! If so, I'd love it if someone who
knows better (such as their developers) could enlighten me.
-James Holton
MAD Scientist
On 1/12/2013 3:07 AM, Pavol Skubak wrote:
Dear James,
your challenge in its current form ignores an important source
of information for model building that is available for your
simulated data: namely, it does not allow the use of anomalous
phase information in model building. In difficult cases on the
edge of success, such as this one, this typically makes the
difference between building and not building.
If you can make the F+/F- and Se substructure available, we can
test whether this is indeed the case. However, while I expect
this would push the challenge significantly further, you would
most likely be able to decrease the Se incorporation of your
simulated data to a level at which the anomalous signal is
again no longer sufficient to build the structure. And most
likely there would again be an edge where a small decrease in
the Se incorporation would lead from a model built to no model
built.
Best regards,