Dear all
 
Thanks for the replies, especially to Ted, who took the trouble to look beyond the apparent inanity of the question. By the way, I was not suggesting any deception, nor using the 'results' gained this way as 'Results', nor doing this to save the time and/or trouble of recruiting a larger sample. It was simply to ask whether there is a way a researcher could satisfy her/his own curiosity about the potential validity of findings when confined, for whatever reason, to a small sample.
 
The original questioner was limited to a sample of 12 and wondered about the value of the findings. There are many occasions when small samples have to be tolerated, perhaps through circumstances such as testing a new treatment for a rare condition, or a lack of time or funding. Perhaps doing this, and of course being honest about it, could help them to get funding for further human trials or computer extrapolations.
Best wishes Jo


From: Klim McPherson <[log in to unmask]>
To: [log in to unmask]
Sent: Wed, 3 August, 2011 9:19:49
Subject: Re: Sample Size Question

Please!  Doing what is suggested is simply making up data - lazily.

That's what is wrong with it!

The 'significance' is premised on real random samples - which would be
violently violated.

Klim



Klim McPherson PhD FFPH FMedSci
Visiting Professor of Public Health Epidemiology
Nuffield Dept Obs & Gynae  & New College
University of Oxford
Mobile 007711335993





On 03/08/2011 08:26, "Ted Harding" <[log in to unmask]> wrote:

>On 03-Aug-11 01:25:54, jo kirkpatrick wrote:
>> Please forgive what might be a really dumb suggestion but
>> could we magnify the significance of say a T-Test by feeding
>> the same 12 results through 4 or 5 times? Please don't all
>> scream at once, I am only an MSc student!
>>
>> Best wishes Jo
>> [The rest of the inclusions snipped]
>
>Jo,
>If by this you mean stringing a set of 12 results together with
>itself (say) 5 times, and then feeding the resulting 60 data
>values into a t-test, then the answer is that you will indeed
>magnify the significance!
>
>The basic reason is that the sample mean of the 60 will be the
>same as the sample mean of the 12, while the sample Standard
>Error of the mean will be 1/sqrt(5) times that of the 12.
>
>Hence the t-value for the 60 will be sqrt(5) = 2.236 times
>the t-value for the 12. So if, say, your t-value for the 12
>was 1.36343 (on 11 degrees of freedom) so that the 2-sided
>P-value was then 0.20 (rather disappointing ... ), then if
>you did the above you would get a t-value of 3.048722, and
>the t-test procedure (being unaware of your deviousness)
>would treat this as having 59 degrees of freedom, with the
>resulting P-value then being 0.0034 which is much more
>satisfying!
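
To put numbers on this, a minimal sketch (assuming Python with numpy and
scipy; the data are invented for illustration) that recycles a sample of
12 five times and re-runs the t-test. Strictly, because the sample variance
uses an n-1 denominator, the inflation factor is sqrt(59/11), about 2.32,
rather than exactly sqrt(5), but the conclusion is the same:

# Sketch: how recycling a sample of 12 five times inflates a one-sample t-test.
# Assumes numpy and scipy; the data below are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
small = rng.normal(loc=0.5, scale=1.0, size=12)   # a "disappointing" sample of 12
big = np.tile(small, 5)                           # the same 12 values chained 5 times

res12 = stats.ttest_1samp(small, popmean=0.0)
res60 = stats.ttest_1samp(big, popmean=0.0)

print(f"n=12: t = {res12.statistic:.3f}, p = {res12.pvalue:.4f}  (11 df)")
print(f"n=60: t = {res60.statistic:.3f}, p = {res60.pvalue:.4f}  (59 df)")
# The ratio of the two t-values equals sqrt(59/11) ~= 2.32, close to the
# sqrt(5) ~= 2.24 shortcut above; the difference comes from the n-1
# denominator in the sample variance.
print("t ratio:", res60.statistic / res12.statistic,
      " sqrt(59/11) =", np.sqrt(59 / 11))
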
>
>Your question is not as "dumb" as it might at first seem.
>While it is clearly invalid to create a large dataset by
>chaining together replicates of a small one, until you get
>one large enough to give you an extreme P-value, this is
>not grossly different from going back to the population
>again and again, repeatedly sampling 12 each time until
>you again get the desired result.
>
>This is because, if the initial 12 were a fair sample,
>future samples of 12 are unlikely to be grossly dissimilar
>to the initial 12. So sooner or later (and with reference
>to the above example probably with around 5 repetitions)
>you could move from P=0.2 to P < 0.01 by repeated sampling.
>
>The aggregate sample at any stage is then a valid sample
>of that size from the population, as opposed to the invalid
>"sample" generated by recycling the original small one.
>
>What is invalid about the procedure is the intention to
>keep going until you get a small enough P-value. This
>will inevitably occur if you keep going long enough.
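
The stopping-rule problem is easy to demonstrate by simulation. A sketch
(again assuming Python with numpy and scipy; the batch size of 12, the cap
of 20 looks and the 0.05 threshold are arbitrary choices): even when the
null hypothesis is exactly true, testing after every additional batch and
stopping as soon as p < 0.05 declares 'significance' far more often than
5% of the time.

# Sketch: why "keep sampling until P is small enough" is invalid.
# Repeated batches of 12 are drawn from a population in which the null
# hypothesis is exactly true; we test after each batch and stop at p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_runs, batch, max_batches, alpha = 2000, 12, 20, 0.05
hits = 0
for _ in range(n_runs):
    data = np.empty(0)
    for _ in range(max_batches):
        data = np.concatenate([data, rng.normal(0.0, 1.0, batch)])  # true mean is 0
        if stats.ttest_1samp(data, 0.0).pvalue < alpha:
            hits += 1
            break

# A single fixed-size test would be 'significant' in roughly 5% of runs;
# allowing up to 20 looks pushes the rate much higher.
print(f"Declared 'significant' in {hits / n_runs:.1%} of runs")
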
>
>No Null Hypothesis is ever exactly true in real life.
>If it is off by some small amount, then a large enough
>sample (and you may need a very large one) will almost
>surely result in a P-value smaller than your target.
>
>The real question is: How far off is it? Is this difference
>of any interest? This leads on to the question: If the
>smallest difference which is of practical interest is,
>say, D, then how large a sample would we need in order
>to have a good chance of a significant P-value if the
>true difference were at least D?
>
>Also, the "How far off is it?" question can be addressed
>by looking at a confidence interval for the difference.
>Such broader approaches should always be used, rather
>than simplistic reliance on mere P-values.
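
Both suggestions are straightforward to act on. A sketch (assuming Python
with numpy, scipy and statsmodels; the smallest interesting standardized
difference D = 0.5, 80% power and a 95% interval are illustrative choices,
and the data are invented) showing a confidence interval from a sample of
12 and the sample size needed to detect D:

# Sketch: a confidence interval for the mean, and the sample size needed
# to detect a smallest-interesting standardized difference D with 80% power.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestPower

rng = np.random.default_rng(3)
sample = rng.normal(0.3, 1.0, 12)        # stand-in for the 12 observations
n = len(sample)

# 95% confidence interval for the mean (one-sample case)
m, se = sample.mean(), stats.sem(sample)
half_width = stats.t.ppf(0.975, df=n - 1) * se
print(f"mean = {m:.3f}, 95% CI = ({m - half_width:.3f}, {m + half_width:.3f})")

# Smallest sample size giving 80% power for a one-sample t-test at D = 0.5
n_needed = TTestPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                    alternative='two-sided')
print(f"n needed for 80% power at D = 0.5 SD: about {int(np.ceil(n_needed))}")
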
>
>Hoping this helps!
>Ted.
>
>--------------------------------------------------------------------
>E-Mail: (Ted Harding) <[log in to unmask]>
>Fax-to-email: +44 (0)870 094 0861
>Date: 03-Aug-11                                      Time: 08:26:20
>------------------------------ XFMail ------------------------------