Hi Folks,

Many thanks for all of your comments - in keeping with the spirit of the BB I have digested the responses below. Interestingly, I suspect that the spread of responses to this question reflects the very wide range of resolution limits of the data people work with!

Best wishes Graeme

===================================

Proposal 1:

10% of reflections, maximum 2000

Proposal 2: from wiki:

http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set

including Randy Read's "recipe":

So here's the recipe I would use, for what it's worth:

<10000 reflections: set aside 10%

10000-20000 reflections: set aside 1000 reflections

20000-40000 reflections: set aside 5%

>40000 reflections: set aside 2000 reflections
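
As a rough illustration, the recipe above can be written as a small function. This is only a sketch: the function name is made up for this example and is not part of xia2 or any CCP4 program, and the handling of the exact boundaries (10000, 20000, 40000) is an assumption, although the recipe gives the same answer from either side at each of them.

```python
def read_recipe_free_count(n_reflections: int) -> int:
    """Number of reflections to set aside as the free (test) set.

    Hypothetical helper following the recipe quoted above:
      <10000        -> 10%
      10000-20000   -> 1000 reflections
      20000-40000   -> 5%
      >40000        -> 2000 reflections
    """
    if n_reflections < 0:
        raise ValueError("n_reflections must be non-negative")
    if n_reflections < 10000:
        return n_reflections // 10   # 10%, rounded down
    if n_reflections <= 20000:
        return 1000
    if n_reflections <= 40000:
        return n_reflections // 20   # 5%, rounded down
    return 2000


if __name__ == "__main__":
    for n in (5000, 15000, 30000, 800000):
        print(n, read_recipe_free_count(n))
```

Note that the recipe is continuous: 10% of 10000 and 5% of 20000 both give 1000, and 5% of 40000 gives 2000, so the free set grows from 0 to 2000 without jumps.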

Proposal 3:

5% of reflections, maximum 2-5k

Proposal 4:

3% of reflections, minimum 1000

Proposal 5:

5-10% of reflections, minimum 1000

Proposal 6:

> 50 reflections per "bin" in order to get reliable ML parameter estimation, ideally around 150 / bin.
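
To make the per-bin criterion concrete, here is a minimal sketch relating the free set size to the number of bins. It assumes equally populated bins, and the bin count of 20 used in the example is purely hypothetical; refinement programs choose their own binning, and the function names are invented for this illustration.

```python
def per_bin_count(n_free: int, n_bins: int) -> int:
    """Free reflections per bin, assuming equally populated bins."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    return n_free // n_bins


def free_set_size_for_bins(n_bins: int, per_bin: int = 150) -> int:
    """Free set size needed to reach `per_bin` reflections in every bin."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    return n_bins * per_bin


if __name__ == "__main__":
    n_bins = 20  # hypothetical bin count, for illustration only
    print(free_set_size_for_bins(n_bins, 50))   # minimum suggested above
    print(free_set_size_for_bins(n_bins, 150))  # "ideal" figure suggested above
    print(per_bin_count(2000, n_bins))          # what a 2000-reflection set gives
```

Under these assumptions, a 2000-reflection free set split over 20 bins gives 100 per bin, between the suggested minimum and the ideal.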

Proposal 7:

With lots of reflections (e.g. 800K unique), select around 1% - 5% would be 40k, i.e. rather a lot. Referees question the use of > 5k reflections as a test set.

Comment 1 in response to this:

Surely the absolute number of test reflections is not relevant; the percentage is.

============================

Approximate consensus (i.e. what I will look at doing in xia2) - probably follow the Randy Read recipe from the ccp4wiki, as this seems to satisfy most of the criteria raised by everyone else.

On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter wrote:

Hi Folks

I had a vague comment handed my way that "xia2 assigns too many free reflections" - I have a feeling that by default it makes a free set of 5%, which was OK back in the day (like I/sig(I) = 2 was OK) but maybe seems excessive now.

This was particularly in the case of high resolution data where you have a lot of reflections, so 5% could be several thousand which would be more than you need to just check Rfree seems OK.

Since I really don't know what the right number of reflections to assign to a free set is, I thought I would ask here - what do you think? Essentially I need to assign a minimum percentage or a minimum number - the lower of the two, presumably?

Any comments welcome!

Thanks & best wishes Graeme