Dear Klim,
I think you may have misunderstood the intent of the query. I don't
think making up data was the reason the question was asked. The idea was
to ask what, mathematically, this would do to the data, and that matters,
as Ted suggests, because sample size and the effort needed to obtain an
adequate one are not as clear cut as they might seem. Real and random are
not black and white, and it is important for a student to understand how
probability works. Making up data, however creatively, is ethically
unacceptable, and obviously any student who would take the time to write
to a listserv would not be asking a list on evidence-based health for
ways to falsify data.
Best regards,
Amy
Amy Price PhD
Http://empower2go.org
Building Brain Potential
-----Original Message-----
From: Evidence based health (EBH)
[mailto:[log in to unmask]] On Behalf Of Klim McPherson
Sent: 03 August 2011 04:20 AM
To: [log in to unmask]
Subject: Re: Sample Size Question
Please! Doing what is suggested is simply making up data - lazily.
That's what is wrong with it!
The 'significance' is premised on real random samples - a premise which
would be violently violated.
Klim
Klim McPherson PhD FFPH FMedSci
Visiting Professor of Public Health Epidemiology
Nuffield Dept Obs & Gynae & New College
University of Oxford
Mobile 007711335993
On 03/08/2011 08:26, "Ted Harding" <[log in to unmask]> wrote:
>On 03-Aug-11 01:25:54, jo kirkpatrick wrote:
>> Please forgive what might be a really dumb suggestion but
>> could we magnify the significance of say a T-Test by feeding
>> the same 12 results through 4 or 5 times? Please don't all
>> scream at once, I am only an MSc student!
>>
>> Best wishes Jo
>> [The rest of the inclusions snipped]
>
>Jo,
>If by this you mean stringing a set of 12 results together with
>itself (say) 5 times, and then feeding the resulting 60 data
>values into a t-test, then the answer is that you will indeed
>magnify the significance!
>
>The basic reason is that the sample mean of the 60 will be the
>same as the sample mean of the 12, while the sample Standard
>Error of the mean will be approximately 1/sqrt(5) times that
>of the 12. (Exactly, the factor is sqrt(11/59) = 0.432 rather
>than 1/sqrt(5) = 0.447, because the sample SD uses the
>divisor n-1.)
>
>Hence the t-value for the 60 will be about sqrt(5) = 2.236
>times the t-value for the 12. So if, say, your t-value for the
>12 was 1.36343 (on 11 degrees of freedom) so that the 2-sided
>P-value was then 0.20 (rather disappointing ... ), then if
>you did the above you would get a t-value of roughly 3.05
>(1.36343 x 2.236 = 3.048722 by this approximation), and
>the t-test procedure (being unaware of your deviousness)
>would treat this as having 59 degrees of freedom, with the
>resulting P-value then being about 0.0034, which is much more
>satisfying!
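The arithmetic above can be checked with a minimal Python sketch. The 12 values below are made up purely for illustration (they are not Jo's data), and `one_sample_t` is a hypothetical helper, not a library function. Because the sample SD uses the n-1 divisor, the inflation factor comes out as sqrt(59/11), which is close to but slightly larger than sqrt(5).

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return the one-sample t-statistic for H0: mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    # statistics.stdev uses the n-1 divisor (sample standard deviation).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / standard_error


# Hypothetical sample of 12 measurements, invented for illustration only.
sample = [0.8, -0.4, 1.9, 0.3, -1.1, 2.2, 0.6, -0.2, 1.4, 0.1, -0.7, 1.0]
recycled = sample * 5  # the same 12 values chained together 5 times

t_12 = one_sample_t(sample)
t_60 = one_sample_t(recycled)
inflation = t_60 / t_12

print(f"t on the 12 values:          {t_12:.4f}")
print(f"t on the 60 recycled values: {t_60:.4f}")
print(f"inflation factor:            {inflation:.4f}")
print(f"sqrt(59/11) = {math.sqrt(59 / 11):.4f}, sqrt(5) = {math.sqrt(5):.4f}")
```

The mean is unchanged by the recycling; only the apparent standard error shrinks, which is why the t-value grows without any new information having been collected.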
>
>Your question is not as "dumb" as it might at first seem.
>While it is clearly invalid to create a large dataset by
>chaining together replicates of a small one, until you get
>one large enough to give you an extreme P-value, this is
>not grossly different from going back to the population
>again and again, repeatedly sampling 12 each time until
>you again get the desired result.
>
>This is because, if the initial 12 were a fair sample,
>future samples of 12 are unlikely to be grossly dissimilar
>to the initial 12. So sooner or later (and with reference
>to the above example probably with around 5 repetitions)
>you could move from P=0.2 to P < 0.01 by repeated sampling.
>
>The aggregate sample at any stage is then a valid sample
>of that size from the population, as opposed to the invalid
>"sample" generated by recycling the original small one.
>
>What is invalid about the procedure is the intention to
>keep going until you get a small enough P-value. This
>will inevitably occur if you keep going long enough.
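The effect of "keep going until P is small enough" can be sketched with a small simulation. This is only an illustration under stated assumptions: the null hypothesis is exactly true, the data are standard normal, and a z-test with known SD = 1 stands in for the t-test so that only the Python standard library is needed. The batch size of 12, the cap of 20 batches, and the function name `stops_significant` are all choices made for this example.

```python
import math
import random
from statistics import NormalDist


def stops_significant(rng, batch_size=12, max_batches=20, alpha=0.05):
    """Add batches drawn under a true null; stop at the first 'significant' z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    total, n = 0.0, 0
    for _ in range(max_batches):
        total += sum(rng.gauss(0.0, 1.0) for _ in range(batch_size))
        n += batch_size
        z = total / math.sqrt(n)  # z-statistic for the mean, known SD = 1
        if abs(z) > z_crit:
            return True  # the experimenter stops here and reports P < alpha
    return False


rng = random.Random(2011)  # fixed seed so the run is repeatable
n_sim = 2000
hits = sum(stops_significant(rng) for _ in range(n_sim))
false_positive_rate = hits / n_sim

print(f"nominal alpha: 0.05, rate with optional stopping: {false_positive_rate:.3f}")
```

Each individual test is held at the 5% level, yet the chance of stopping with a "significant" result somewhere along the way is well above 5%, even though there is nothing to find.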
>
>No Null Hypothesis is ever exactly true in real life.
>If it is off by some small amount, then a large enough
>sample (and you may need a very large one) will almost
>surely result in a P-value smaller than your target.
>
>The real question is: How far off is it? Is this difference
>of any interest? This leads on to the question: If the
>smallest difference which is of practical interest is,
>say, D, then how large a sample would we need in order
>to have a good chance of a significant P-value if the
>true difference were at least D?
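The sample-size question posed above has a standard textbook approximation, sketched here in Python. It assumes a one-sample, two-sided test, a known (or well-guessed) SD `sigma`, and the normal approximation n = ((z_alpha + z_power) * sigma / D)^2. The function name and the example numbers are hypothetical. For a t-test the true requirement is slightly larger, so treat the result as a lower guide rather than an exact answer.

```python
import math
from statistics import NormalDist


def sample_size_one_sample(d, sigma, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true difference d (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(((z_alpha + z_power) * sigma / d) ** 2)


# Illustrative inputs: smallest interesting difference D = 0.5, SD = 1.
n_needed = sample_size_one_sample(d=0.5, sigma=1.0)
# Halving D roughly quadruples the required sample size.
n_needed_half_d = sample_size_one_sample(d=0.25, sigma=1.0)

print(f"D = 0.50: n = {n_needed}")
print(f"D = 0.25: n = {n_needed_half_d}")
```

Fixing D, alpha and power before collecting data is what separates a planned sample size from sampling until the P-value looks good.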
>
>Also, the "How far off is it?" question can be addressed
>by looking at a confidence interval for the difference.
>Such broader approaches should always be used, rather
>than simplistic reliance on mere P-values.
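A confidence interval for the mean difference can be sketched as follows. The data are again invented for illustration, `mean_confidence_interval` is a hypothetical helper, and the critical value 2.201 (two-sided 95%, Student's t on 11 degrees of freedom) is taken from standard tables because the Python standard library has no t-distribution quantile function.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t on 11 df (from standard tables).
T_CRIT_95_DF11 = 2.201


def mean_confidence_interval(values, t_crit):
    """Return (low, high) for the mean: mean +/- t_crit * standard error."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean - t_crit * standard_error, mean + t_crit * standard_error


# Hypothetical sample of 12 differences, invented for illustration only.
sample = [0.8, -0.4, 1.9, 0.3, -1.1, 2.2, 0.6, -0.2, 1.4, 0.1, -0.7, 1.0]
low, high = mean_confidence_interval(sample, T_CRIT_95_DF11)

print(f"mean = {statistics.fmean(sample):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For these made-up values the interval includes zero but also includes differences that might well matter in practice, which says more than "P > 0.05" alone: the data are compatible with no effect and with a sizeable one.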
>
>Hoping this helps!
>Ted.
>
>--------------------------------------------------------------------
>E-Mail: (Ted Harding) <[log in to unmask]>
>Fax-to-email: +44 (0)870 094 0861
>Date: 03-Aug-11 Time: 08:26:20
>------------------------------ XFMail ------------------------------