Hi Tim,
I don't think the 5-10% or 500-1000 reflections are real rules, but rather
practical choices. The error margin in R-free is roughly inversely
proportional to the square root of the number of reflections in your test
set and also proportional to R-free itself. So for R-free to be 'significant'
you need some absolute number of reflections to reach your cut-off of
significance. This is where the 1000 comes from (500 is really pushing the
limit).
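
To put rough numbers on this, here is a minimal sketch. It assumes the commonly quoted approximation sigma(R-free) ~ R-free / sqrt(N_test); the function name rfree_sigma and the example R-free of 0.25 are only illustrative choices, not anything prescribed.

```python
import math


def rfree_sigma(r_free: float, n_test: int) -> float:
    """Approximate error margin of R-free.

    Assumption: sigma(R-free) ~ R-free / sqrt(n_test), a commonly
    quoted rule of thumb rather than an exact result.
    """
    if n_test <= 0:
        raise ValueError("n_test must be a positive number of reflections")
    return r_free / math.sqrt(n_test)


if __name__ == "__main__":
    # Illustrative R-free of 0.25 with test sets of different sizes.
    for n_test in (500, 1000, 2000):
        print(f"N_test = {n_test:5d}  sigma(R-free) ~ {rfree_sigma(0.25, n_test):.4f}")
```

Under that assumption, halving the test set from 1000 to 500 reflections widens the error margin by a factor of about 1.4, which is why 500 is already marginal.
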
You want to make sure the error margins in R and R-free are not too far apart
and you probably also want to keep the test set representative of the whole
data set (this is particularly important because we use hold-out validation:
you only get one shot at validating). This is where the 5%-10% comes from.
Another consideration in favour of the 5%-10% range is that it makes it
feasible to do 'full' (i.e. k-fold) cross-validation: you only have to do
10-20 refinements. If you went for 1000 reflections you would have to
do 48 refinements for the average dataset.
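
The refinement counts follow from simple arithmetic, sketched below. The function names are hypothetical, and the 48,000-reflection data set is an assumed size picked only because it reproduces the figure of 48 quoted above.

```python
def folds_for_fraction(test_fraction: float) -> int:
    """Number of refinements for full k-fold cross-validation when every
    test set holds the given fraction of the data (e.g. 0.05 for 5%)."""
    if not 0.0 < test_fraction <= 1.0:
        raise ValueError("test_fraction must be in (0, 1]")
    return round(1.0 / test_fraction)


def folds_for_count(n_unique: int, n_test: int) -> int:
    """Number of refinements when every test set holds a fixed number of
    reflections; the last fold may be smaller, so round up."""
    if n_unique <= 0 or n_test <= 0:
        raise ValueError("n_unique and n_test must be positive")
    return -(-n_unique // n_test)  # integer ceiling division


if __name__ == "__main__":
    print(folds_for_fraction(0.05))      # 5% test sets
    print(folds_for_fraction(0.10))      # 10% test sets
    print(folds_for_count(48000, 1000))  # assumed 48,000 unique reflections
```
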
Personally, I take 5% and increase this percentage to a maximum of 10% if
using 5% gives me a test set smaller than 1000 reflections.
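
One possible reading of that rule as code, purely as a sketch: the function name choose_test_fraction and its parameter names are made up for illustration, and it assumes the percentage is raised only as far as needed to reach 1000 reflections, never beyond 10%.

```python
def choose_test_fraction(n_unique: int,
                         min_test: int = 1000,
                         base: float = 0.05,
                         cap: float = 0.10) -> float:
    """Pick the fraction of reflections to set aside for R-free.

    Assumed interpretation: start at 5%; if that yields fewer than
    min_test reflections, raise the fraction just enough to reach
    min_test, but never above the 10% cap.
    """
    if n_unique <= 0:
        raise ValueError("n_unique must be a positive number of reflections")
    if base * n_unique >= min_test:
        return base
    return min(cap, min_test / n_unique)


if __name__ == "__main__":
    # Hypothetical data set sizes, chosen only to exercise each branch.
    for n_unique in (40000, 15000, 5000):
        fraction = choose_test_fraction(n_unique)
        print(f"{n_unique:6d} unique reflections -> "
              f"{100 * fraction:.1f}% test set (~{round(fraction * n_unique)} reflections)")
```

For small data sets the 10% cap wins, so the test set can still end up below 1000 reflections; the rule trades some precision in R-free for keeping enough data in the working set.
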
HTH,
Robbie
> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> Tim Gruene
> Sent: Tuesday, March 26, 2013 09:33
> To: [log in to unmask]
> Subject: [ccp4bb] Rfree reflections
>
> Dear all,
>
> I recall that the set of Rfree reflections should be 500-1000, rather than
> 5-10%, but I cannot find the reference for it (maybe Ian Tickle?).
>
> I would therefore like to be confirmed or corrected:
>
> Is there an absolute number required for Rfree to be significant, i.e.
> 500-1000 irrespective of the total number of unique reflections in the data
> set, or is it 5-10% (as a compromise)?
>
> Thanks and regards,
> Tim
>
> --
> Dr Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
>
> GPG Key ID = A46BEE1A