in a data set with just below 800,000 independent reflections I use 1 % for freeR which
is still impressive 8,000. xia2 would have assigned 40,000 for freeR
at 5 %. I think this is way too much.

I think it is relative, isn't it? That is 100 out of 1,000 is the same as 1,000 out of 10,000 or 10,000 out of 100,000. As long as you maintain the condition that each thin resolution shell contains not less than say 50-150 reflections (so that ML refinement can work properly), and the fraction of free set is 5-10%, the absolute number of free reflections is irrelevant.

Pavel