Graeme,
one more suggestion. You can avoid all the recipes by using all data for the WORK set and 0 reflections for the TEST set, regardless of the amount of data, if you use the FREE KICK ML target. For an explanation, see our recent paper: Praznikar, J. & Turk, D. (2014). Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures. Acta Cryst. D70, 3124-3134.
You can find a link to the paper at “http://www-bmb.ijs.si/doc/references.HTML”
best,
dusan
> On Jun 5, 2015, at 1:03 AM, CCP4BB automatic digest system <[log in to unmask]> wrote:
>
> Date: Thu, 4 Jun 2015 08:30:57 +0000
> From: Graeme Winter <[log in to unmask]>
> Subject: Re: How many is too many free reflections?
>
> Hi Folks,
>
> Many thanks for all of your comments - in keeping with the spirit of the BB
> I have digested the responses below. Interestingly I suspect that the
> responses to this question indicate the very wide range of resolution
> limits of the data people work with!
>
> Best wishes Graeme
>
> ===================================
>
> Proposal 1:
>
> 10% reflections, max 2000
>
> Proposal 2: from wiki:
>
> http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set
>
> including Randy Read "recipe":
>
> So here's the recipe I would use, for what it's worth:
> <10000 reflections: set aside 10%
> 10000-20000 reflections: set aside 1000 reflections
> 20000-40000 reflections: set aside 5%
> >40000 reflections: set aside 2000 reflections
>
> Proposal 3:
>
> 5% maximum 2-5k
>
> Proposal 4:
>
> 3% minimum 1000
>
> Proposal 5:
>
> 5-10% of reflections, minimum 1000
>
> Proposal 6:
>
> >50 reflections per "bin" in order to get reliable ML parameter
> estimation, ideally around 150 / bin.
>
> Proposal 7:
>
> If lots of reflections (i.e. 800K unique) around 1% selected - 5% would be
> 40k i.e. rather a lot. Referees question use of > 5k reflections as test
> set.
>
> Comment 1 in response to this:
>
> Surely absolute # of test reflections is not relevant, percentage is.
>
> ============================
>
> Approximate consensus (i.e. what I will look at doing in xia2) - probably
> follow Randy Read recipe from ccp4wiki as this seems to (probably) satisfy
> most of the criteria raised by everyone else.
>
>
>
> On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter <[log in to unmask]>
> wrote:
>
>> Hi Folks
>>
>> Had a vague comment handed my way that "xia2 assigns too many free
>> reflections" - I have a feeling that by default it makes a free set of 5%
>> which was OK back in the day (like I/sig(I) = 2 was OK) but maybe seems
>> excessive now.
>>
>> This was particularly in the case of high resolution data where you have a
>> lot of reflections, so 5% could be several thousand which would be more
>> than you need to just check Rfree seems OK.
>>
>> Since I really don't know what is the right # reflections to assign to a
>> free set thought I would ask here - what do you think? Essentially I need
>> to assign a minimum %age or minimum # - the lower of the two presumably?
>>
>> Any comments welcome!
>>
>> Thanks & best wishes Graeme
>>
>
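For reference, the Randy Read recipe quoted in the digest above can be written as a small function. This is only a sketch under stated assumptions: the name free_set_size is hypothetical, and the handling of counts that fall exactly on 10000, 20000 or 40000 is my own choice, since the recipe does not spell it out (the bands happen to agree at those points). It is not taken from xia2 or any other program.

```python
def free_set_size(n_reflections: int) -> int:
    """Size of the test (free) set under the recipe quoted above.

    Hypothetical helper for illustration only; boundary handling
    at exactly 10000/20000/40000 is an assumption.
    """
    if n_reflections < 0:
        raise ValueError("number of reflections cannot be negative")
    if n_reflections < 10000:
        return n_reflections // 10   # <10000: set aside 10%
    if n_reflections <= 20000:
        return 1000                  # 10000-20000: set aside 1000
    if n_reflections <= 40000:
        return n_reflections // 20   # 20000-40000: set aside 5%
    return 2000                      # >40000: set aside 2000


if __name__ == "__main__":
    for n in (5000, 15000, 30000, 800000):
        print(n, free_set_size(n))
```

With these bands the test set never exceeds 2000 reflections, so the 800K-reflection case from Proposal 7 would get 2000 rather than 40k.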
Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins, Scientific Director
http://www.cipkebip.org/
Professor of Structural Biology at IPS "Jozef Stefan"
e-mail: [log in to unmask]
phone: +386 1 477 3857 Dept. of Biochem.& Mol.& Struct. Biol.
fax: +386 1 477 3984 Jozef Stefan Institute
Jamova 39, 1 000 Ljubljana, Slovenia
Skype: dusan.turk (voice over internet: www.skype.com)