Graeme,

one more suggestion. You can avoid all the recipes by using all data for the WORK set and 0 reflections for the TEST set, regardless of the amount of data, by using the FREE KICK ML target. For an explanation see our recent paper:

Praznikar, J. & Turk, D. (2014) Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures. Acta Cryst. D70, 3124-3134.

A link to the paper can be found at http://www-bmb.ijs.si/doc/references.HTML

best,
dusan

> On Jun 5, 2015, at 1:03 AM, CCP4BB automatic digest system <[log in to unmask]> wrote:
>
> Date: Thu, 4 Jun 2015 08:30:57 +0000
> From: Graeme Winter <[log in to unmask]>
> Subject: Re: How many is too many free reflections?
>
> Hi Folks,
>
> Many thanks for all of your comments - in keeping with the spirit of the BB
> I have digested the responses below. Interestingly I suspect that the
> responses to this question indicate the very wide range of resolution
> limits of the data people work with!
>
> Best wishes Graeme
>
> ===================================
>
> Proposal 1:
>
> 10% reflections, max 2000
>
> Proposal 2: from wiki:
>
> http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set
>
> including Randy Read "recipe":
>
> So here's the recipe I would use, for what it's worth:
> <10000 reflections: set aside 10%
> 10000-20000 reflections: set aside 1000 reflections
> 20000-40000 reflections: set aside 5%
> >40000 reflections: set aside 2000 reflections
>
> Proposal 3:
>
> 5% maximum 2-5k
>
> Proposal 4:
>
> 3% minimum 1000
>
> Proposal 5:
>
> 5-10% of reflections, minimum 1000
>
> Proposal 6:
>
> >50 reflections per "bin" in order to get reliable ML parameter
> estimation, ideally around 150 / bin.
>
> Proposal 7:
>
> If lots of reflections (i.e. 800K unique) around 1% selected - 5% would be
> 40k i.e. rather a lot. Referees question use of > 5k reflections as test
> set.
>
> Comment 1 in response to this:
>
> Surely absolute # of test reflections is not relevant, percentage is.
>
> ============================
>
> Approximate consensus (i.e. what I will look at doing in xia2) - probably
> follow Randy Read recipe from ccp4wiki as this seems to (probably) satisfy
> most of the criteria raised by everyone else.
>
>
>
> On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter <[log in to unmask]>
> wrote:
>
>> Hi Folks
>>
>> Had a vague comment handed my way that "xia2 assigns too many free
>> reflections" - I have a feeling that by default it makes a free set of 5%
>> which was OK back in the day (like I/sig(I) = 2 was OK) but maybe seems
>> excessive now.
>>
>> This was particularly in the case of high resolution data where you have a
>> lot of reflections, so 5% could be several thousand which would be more
>> than you need to just check Rfree seems OK.
>>
>> Since I really don't know what is the right # reflections to assign to a
>> free set thought I would ask here - what do you think? Essentially I need
>> to assign a minimum %age or minimum # - the lower of the two presumably?
>>
>> Any comments welcome!
>>
>> Thanks & best wishes Graeme
>>
>

Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of Proteins, Scientific Director
http://www.cipkebip.org/
Professor of Structural Biology at IPS "Jozef Stefan"
e-mail: [log in to unmask]
phone: +386 1 477 3857
fax: +386 1 477 3984
Dept. of Biochem.& Mol.& Struct. Biol.
Jozef Stefan Institute
Jamova 39, 1 000 Ljubljana,Slovenia
Skype: dusan.turk (voice over internet: www.skype.com
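
P.S. For anyone who wants to compare the conventional approach, the Randy Read recipe quoted in the digest above can be written as a small function. This is only a sketch of the four quoted rules; the name free_set_size is made up for illustration and is not part of xia2 or any CCP4 program. The recipe does not say which rule applies exactly at 10000, 20000 or 40000 reflections, but neighbouring rules give the same count at those points, so the choice does not matter.

```python
def free_set_size(n_reflections: int) -> int:
    """Number of test (free) reflections under the recipe quoted in the digest.

    Hypothetical helper for illustration only; not taken from xia2 or CCP4.
    """
    if n_reflections < 0:
        raise ValueError("n_reflections must be non-negative")
    if n_reflections < 10000:
        return n_reflections // 10   # set aside 10%
    if n_reflections <= 20000:
        return 1000                  # fixed 1000 reflections
    if n_reflections <= 40000:
        return n_reflections // 20   # set aside 5%
    return 2000                      # fixed 2000 reflections


if __name__ == "__main__":
    # Show the resulting count and percentage for a few data set sizes,
    # including the 800K case mentioned in Proposal 7.
    for n in (5000, 15000, 30000, 800000):
        size = free_set_size(n)
        print(f"{n:>7} reflections -> {size} free ({100 * size / n:.2f}%)")
```

Integer division is used so that the result is always a whole number of reflections; a real implementation would presumably also need to decide how the reflections are picked (randomly, in thin shells, and so on), which the recipe does not address.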