Graeme,
one more suggestion. With the FREE KICK ML target you can avoid all the recipes: use all data for the WORK set and 0 reflections for the TEST set, regardless of the amount of data. For an explanation, see our recent paper: Praznikar, J. & Turk, D. (2014). Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures. Acta Cryst. D70, 3124-3134.

You can find a link to the paper at “http://www-bmb.ijs.si/doc/references.HTML”

best,
dusan

 

> On Jun 5, 2015, at 1:03 AM, CCP4BB automatic digest system <[log in to unmask]> wrote:
> 
> Date:    Thu, 4 Jun 2015 08:30:57 +0000
> From:    Graeme Winter <[log in to unmask]>
> Subject: Re: How many is too many free reflections?
> 
> Hi Folks,
> 
> Many thanks for all of your comments - in keeping with the spirit of the BB
> I have digested the responses below. Interestingly I suspect that the
> responses to this question indicate the very wide range of resolution
> limits of the data people work with!
> 
> Best wishes Graeme
> 
> ===================================
> 
> Proposal 1:
> 
> 10% reflections, max 2000
> 
> Proposal 2: from wiki:
> 
> http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set
> 
> including Randy Read "recipe":
> 
> So here's the recipe I would use, for what it's worth:
>   <10000 reflections:       set aside 10%
>   10000-20000 reflections:  set aside 1000 reflections
>   20000-40000 reflections:  set aside 5%
>   >40000 reflections:       set aside 2000 reflections
> 
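
For anyone who wants to script the recipe quoted above, here is a minimal sketch in Python. The function name is hypothetical, and this is not how xia2 or any other program implements it. The quoted ranges share their end points (10000, 20000, 40000); I have assumed each end point belongs to the fixed-count range, which makes no difference to the result because the recipe is continuous there. Rounding down by integer division is also my assumption.

```python
def read_recipe_test_set_size(n_reflections: int) -> int:
    """Number of reflections to set aside, following the quoted recipe.

    Hypothetical helper: boundary handling and rounding are assumptions.
    """
    if n_reflections < 0:
        raise ValueError("n_reflections must be non-negative")
    if n_reflections < 10000:
        # <10000 reflections: set aside 10%
        return n_reflections // 10
    if n_reflections <= 20000:
        # 10000-20000 reflections: set aside 1000 reflections
        return 1000
    if n_reflections < 40000:
        # 20000-40000 reflections: set aside 5%
        return n_reflections // 20
    # 40000 reflections or more: set aside 2000 reflections
    return 2000


if __name__ == "__main__":
    for n in (5000, 15000, 30000, 800000):
        print(n, read_recipe_test_set_size(n))
```

With this, the 800K-reflection case from Proposal 7 would give a test set of 2000 reflections, i.e. 0.25% of the data.
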
> Proposal 3:
> 
> 5% maximum 2-5k
> 
> Proposal 4:
> 
> 3% minimum 1000
> 
> Proposal 5:
> 
> 5-10% of reflections, minimum 1000
> 
> Proposal 6:
> 
> >50 reflections per "bin" in order to get reliable ML parameter
> estimation, ideally around 150 / bin.
> 
> Proposal 7:
> 
> If lots of reflections (i.e. 800K unique) around 1% selected - 5% would be
> 40k i.e. rather a lot. Referees question use of > 5k reflections as test
> set.
> 
> Comment 1 in response to this:
> 
> Surely absolute # of test reflections is not relevant, percentage is.
> 
> ============================
> 
> Approximate consensus (i.e. what I will look at doing in xia2) - probably
> follow Randy Read recipe from ccp4wiki as this seems to (probably) satisfy
> most of the criteria raised by everyone else.
> 
> 
> 
> On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter <[log in to unmask]>
> wrote:
> 
>> Hi Folks
>> 
>> Had a vague comment handed my way that "xia2 assigns too many free
>> reflections" - I have a feeling that by default it makes a free set of 5%
>> which was OK back in the day (like I/sig(I) = 2 was OK) but maybe seems
>> excessive now.
>> 
>> This was particularly in the case of high resolution data where you have a
>> lot of reflections, so 5% could be several thousand which would be more
>> than you need to just check Rfree seems OK.
>> 
>> Since I really don't know what is the right # reflections to assign to a
>> free set thought I would ask here - what do you think? Essentially I need
>> to assign a minimum %age or minimum # - the lower of the two presumably?
>> 
>> Any comments welcome!
>> 
>> Thanks & best wishes Graeme
>> 
> 

Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/ 
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins, Scientific Director
http://www.cipkebip.org/
Professor of Structural Biology at IPS "Jozef Stefan"
e-mail: [log in to unmask]    
phone: +386 1 477 3857       Dept. of Biochem.& Mol.& Struct. Biol.
fax:   +386 1 477 3984       Jozef Stefan Institute
                            Jamova 39, 1000 Ljubljana, Slovenia
Skype: dusan.turk (voice over internet: www.skype.com)