Hi Robbie,

thank you for the explanation. Heinz Gut and Michael Hadders pointed me to Axel Brunger's publication (Methods Enzymol. 1997;277:366-96, http://www.ncbi.nlm.nih.gov/pubmed/18488318), which is where I got the notion of 500-1000 from. The article mentions that the error margin of Rfree decreases with n^(1/2) (p. 384), but only as an observation. Is your statement "inversely proportional to the number of reflections" based on some statistical treatment, or is it also just an observation?

It is a pity that k-fold cross-validation is not standard routine, because it seems so easy and quick to do with today's computers and a simple script. But that's probably like reminding people to stop using R_int in favour of R_meas...

Cheers,
Tim

On Tue, Mar 26, 2013 at 10:24:51AM +0100, Robbie Joosten wrote:
> Hi Tim,
>
> I don't think the 5-10% or 500-1000 reflections are real rules, but rather
> practical choices. The error margin in R-free is inversely proportional to
> the number of reflections in your test set and also proportional to R-free
> itself. So for R-free to be 'significant' you need some absolute number of
> reflections to reach your cut-off of significance. This is where the 1000
> comes from (500 is really pushing the limit).
> You want to make sure the error margins in R and R-free are not too far apart,
> and you probably also want to keep the test set representative of the whole
> data set (this is particularly important because we use hold-out validation:
> you only get one shot at validating). This is where the 5%-10% comes from.
> Another reason for choosing 5%-10% is that it makes it feasible to do
> 'full' (i.e. k-fold) cross-validation: you only have to do 20 or 10
> refinements, respectively. If you went for 1000 reflections, you would
> have to do 48 refinements for the average dataset.
>
> Personally, I take 5% and increase this percentage to a maximum of 10% if
> using 5% gives me a test set smaller than 1000 reflections.
>
> HTH,
> Robbie
>
> > -----Original Message-----
> > From: CCP4 bulletin board On Behalf Of Tim Gruene
> > Sent: Tuesday, March 26, 2013 09:33
> > Subject: [ccp4bb] Rfree reflections
> >
> > Dear all,
> >
> > I recall that the set of Rfree reflections should be 500-1000, rather than
> > 5-10%, but I cannot find the reference for it (maybe Ian Tickle?).
> >
> > I would therefore appreciate being confirmed or corrected:
> >
> > Is there an absolute number required for Rfree to be significant, i.e.
> > 500-1000 irrespective of the total number of unique reflections in the
> > data set, or is it 5-10% (as a compromise)?
> >
> > Thanks and regards,
> > Tim
> >
> > --
> > --
> > Dr Tim Gruene
> > Institut fuer anorganische Chemie
> > Tammannstr. 4
> > D-37077 Goettingen
> >
> > GPG Key ID = A46BEE1A