Many thanks for all of your comments. In keeping with the spirit of the BB, I have digested the responses below. Interestingly, I suspect that the spread of responses to this question reflects the very wide range of resolution limits of the data people work with!
Proposal 1:
10% of reflections, maximum 2000
Proposal 2: from the CCP4 wiki, including Randy Read's "recipe":
So here's the recipe I would use, for what it's worth:
<10000 reflections: set aside 10%
10000-20000 reflections: set aside 1000 reflections
20000-40000 reflections: set aside 5%
>40000 reflections: set aside 2000 reflections
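For anyone wanting to script it, the recipe above can be written as a small function. This is only a sketch: the function name is made up for illustration, and the handling of the exact boundary values (10000, 20000, 40000) is my assumption, although the recipe is continuous there so the choice makes no difference.

```python
def free_set_size(n_reflections: int) -> int:
    """Number of reflections to set aside, following the recipe above.

    Hypothetical helper for illustration, not taken from any program.
    """
    if n_reflections < 10000:
        return n_reflections // 10   # 10%
    if n_reflections <= 20000:
        return 1000                  # fixed count
    if n_reflections <= 40000:
        return n_reflections // 20   # 5%
    return 2000                      # fixed count


for n in (5000, 15000, 30000, 800000):
    print(n, free_set_size(n))
```

Note that the four ranges join up: 10% of 10000 and 5% of 20000 are both 1000, and 5% of 40000 is 2000, so the test set size never jumps as the data set grows.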
Proposal 3:
5% of reflections, maximum 2000-5000
Proposal 4:
3% of reflections, minimum 1000
Proposal 5:
5-10% of reflections, minimum 1000
Proposal 6:
More than 50 reflections per "bin" in order to get reliable ML parameter estimation, ideally around 150 per bin.
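To see what the per-bin criterion implies for the total test set, here is a rough sketch. It assumes the test reflections are spread evenly over the bins, and the figure of 20 bins is purely a hypothetical example (the number of bins was not given in the responses); the function name is also made up.

```python
def free_set_range(n_bins: int, minimum_per_bin: int = 50, ideal_per_bin: int = 150):
    """Total test reflections needed to reach the floor and the ideal per bin.

    Assumes equally populated bins; hypothetical helper for illustration.
    """
    return n_bins * minimum_per_bin, n_bins * ideal_per_bin


floor, ideal = free_set_range(20)  # 20 bins is an assumed example value
print(floor, ideal)                # 1000 3000
```

With that assumed bin count, the floor and the ideal land in the same region as the absolute numbers suggested in the other proposals.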
Proposal 7:
If there are lots of reflections (e.g. 800K unique), select around 1%: 5% would be 40k, i.e. rather a lot. Referees question the use of more than 5k reflections as a test set.
Comment 1 in response to this:
Surely the absolute number of test reflections is not relevant; the percentage is.
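To put numbers on the Proposal 7 example, the arithmetic for 800K unique reflections can be checked as below. This is just an illustrative sketch with made-up function names; the 5k figure is the threshold mentioned for referees.

```python
def counts_for_percentages(n_unique: int, percentages=(1, 5)) -> dict:
    """Test set size for each whole-number percentage of the unique reflections."""
    return {pct: n_unique * pct // 100 for pct in percentages}


def percentage_for_count(n_unique: int, count: int) -> float:
    """Percentage of the data that a fixed test set count corresponds to."""
    return 100 * count / n_unique


print(counts_for_percentages(800000))      # {1: 8000, 5: 40000}
print(percentage_for_count(800000, 5000))  # 0.625
```

So for a data set of this size even 1% exceeds 5k reflections, and a 5k test set is only about 0.6% of the data.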
============================
Approximate consensus (i.e. what I will look at doing in xia2): probably follow the Randy Read recipe from the CCP4 wiki, as this seems to satisfy most of the criteria raised by everyone else.