Please! Doing what is suggested is simply making up data - lazily. That's
what is wrong with it! The 'significance' is premised on real random
samples - which would be violently violated.

Klim McPherson PhD FFPH FMedSci
Visiting Professor of Public Health Epidemiology
Nuffield Dept Obs & Gynae & New College, University of Oxford
Mobile 007711335993

On 03/08/2011 08:26, "Ted Harding" <[log in to unmask]> wrote:

>On 03-Aug-11 01:25:54, jo kirkpatrick wrote:
>> Please forgive what might be a really dumb suggestion, but
>> could we magnify the significance of, say, a t-test by feeding
>> the same 12 results through 4 or 5 times? Please don't all
>> scream at once, I am only an MSc student!
>>
>> Best wishes, Jo
>> [The rest of the inclusions snipped]
>
>Jo,
>If by this you mean stringing a set of 12 results together with
>itself (say) 5 times, and then feeding the resulting 60 data
>values into a t-test, then the answer is that you will indeed
>magnify the significance!
>
>The basic reason is that the sample mean of the 60 will be the
>same as the sample mean of the 12, while the sample Standard
>Error of the mean will be (very nearly) 1/sqrt(5) times that
>of the 12.
>
>Hence the t-value for the 60 will be roughly sqrt(5) = 2.236 times
>the t-value for the 12. So if, say, your t-value for the 12
>was 1.36343 (on 11 degrees of freedom), so that the 2-sided
>P-value was then 0.20 (rather disappointing ...), then if
>you did the above you would get a t-value of 3.048722, and
>the t-test procedure (being unaware of your deviousness)
>would treat this as having 59 degrees of freedom, with the
>resulting P-value then being 0.0034, which is much more
>satisfying!
>
>Your question is not as "dumb" as it might at first seem.
>While it is clearly invalid to create a large dataset by
>chaining together replicates of a small one until you get
>one large enough to give you an extreme P-value, this is
>not grossly different from going back to the population
>again and again, repeatedly sampling 12 each time, until
>you again get the desired result.
>
>This is because, if the initial 12 were a fair sample,
>future samples of 12 are unlikely to be grossly dissimilar
>to the initial 12. So sooner or later (and, with reference
>to the above example, probably with around 5 repetitions)
>you could move from P = 0.2 to P < 0.01 by repeated sampling.
>
>The aggregate sample at any stage is then a valid sample
>of that size from the population, as opposed to the invalid
>"sample" generated by recycling the original small one.
>
>What is invalid about the procedure is the intention to
>keep going until you get a small enough P-value. This
>will inevitably occur if you keep going long enough.
>
>No Null Hypothesis is ever exactly true in real life.
>If it is off by some small amount, then a large enough
>sample (and you may need a very large one) will almost
>surely result in a P-value smaller than your target.
>
>The real question is: How far off is it? Is this difference
>of any interest? This leads on to the question: if the
>smallest difference which is of practical interest is,
>say, D, then how large a sample would we need in order
>to have a good chance of a significant P-value if the
>true difference were at least D?
>
>Also, the "How far off is it?" question can be addressed
>by looking at a confidence interval for the difference.
>Such broader approaches should always be used, rather
>than simplistic reliance on mere P-values.
>
>Hoping this helps!
>Ted.
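
For anyone who wants to see Ted's recycling example in action, here is a
minimal sketch in Python (numpy/scipy). The data, seed and effect size are
invented for illustration, not taken from the thread: tiling the same 12
observations five times leaves the sample mean unchanged but shrinks the
estimated standard error by roughly 1/sqrt(5), so the t-statistic and
P-value are "improved" without any new information.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x12 = rng.normal(loc=0.4, scale=1.0, size=12)   # one genuine sample of 12 (invented)
x60 = np.tile(x12, 5)                           # the same 12 values chained together 5 times

t12, p12 = stats.ttest_1samp(x12, popmean=0.0)
t60, p60 = stats.ttest_1samp(x60, popmean=0.0)

print(f"n = 12:            t = {t12:.3f}, two-sided P = {p12:.4f}")
print(f"n = 60 (recycled): t = {t60:.3f}, two-sided P = {p60:.4f}")
print(f"t ratio = {t60/t12:.3f}   (close to sqrt(5) = {np.sqrt(5):.3f})")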
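
The "keep going until it is significant" trap can be simulated in the same
spirit. The tiny true effect (0.05 SD), the P < 0.01 target and the batch
cap below are arbitrary assumptions; the point is only that, because the
null is never exactly true, accumulating fresh batches of 12 and re-testing
after each one will eventually cross whatever threshold was chosen.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = np.empty(0)
batches = 0
p = 1.0
while p >= 0.01 and batches < 10_000:    # safety cap on the number of batches
    sample = np.concatenate([sample, rng.normal(loc=0.05, scale=1.0, size=12)])
    batches += 1
    p = stats.ttest_1samp(sample, popmean=0.0).pvalue

print(f"Stopped after {batches} batches (n = {sample.size}) with P = {p:.4f}")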
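
Ted's "how large a sample for a difference of at least D" question is a
power calculation. A rough normal-approximation sketch is below; the effect
size d = D/SD = 0.5, alpha = 0.05 and power = 0.80 are made-up inputs, and a
proper t-based calculation (e.g. statsmodels' TTestPower) would give a
slightly larger answer.

import math
from scipy import stats

d = 0.5          # smallest difference of interest, in SD units (assumed)
alpha = 0.05     # two-sided significance level (assumed)
power = 0.80     # desired chance of detecting a true difference of d (assumed)

z_alpha = stats.norm.ppf(1 - alpha / 2)
z_power = stats.norm.ppf(power)
n = ((z_alpha + z_power) / d) ** 2       # normal approximation to the one-sample t-test

print(f"Approximate sample size needed: n ~ {math.ceil(n)}")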
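
Finally, the "how far off is it?" question: a confidence interval for the
difference carries more information than the bare P-value. Again the 12
values are invented; only the recipe matters.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=0.4, scale=1.0, size=12)    # an invented sample of 12 differences

m = x.mean()
se = x.std(ddof=1) / np.sqrt(x.size)
tcrit = stats.t.ppf(0.975, df=x.size - 1)      # two-sided 95% critical value, 11 df

print(f"estimated difference = {m:.3f}")
print(f"95% CI = ({m - tcrit*se:.3f}, {m + tcrit*se:.3f})")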