I think it is a genuine misunderstanding about the nature of statistical
inference, one which I have come across many times in my career.
Clearly the intention would not have been to falsify data - but it is
important to realise that, in the context of Neyman-Pearson inference,
that is exactly what the suggested procedure would amount to.
What the procedure does to the data mathematically can be of no interest,
largely because replicating the data an arbitrary number of times has an
entirely predictable, and utterly uninteresting, effect.
We might as well imagine there are 47 Rupert Murdochs - then what? There
aren't, at least not that we observe!
klim
On 03/08/2011 09:30, "Dr. Amy Price" <[log in to unmask]> wrote:
>Dear Klim,
>
>It is possible you may have misunderstood the intent of the query. I don't
>think that was the reason the question was asked. The idea was to ask what
>this could do to the data mathematically, and this is important, as Ted
>suggests, because sample size and the effort needed to obtain it are not as
>clear-cut as they would seem. Real and random are not black and white, and
>it is important for a student to understand how probability works. Making
>up data, however creatively, is ethically unacceptable, and obviously any
>student who would take the time to write to a listserv would not be asking
>a list on evidence-based health for ways to falsify data.
>
>Best regards,
>
>Amy
>
>Amy Price PhD
>Http://empower2go.org
>Building Brain Potential
>
>
>
>-----Original Message-----
>From: Evidence based health (EBH)
>[mailto:[log in to unmask]] On Behalf Of Klim McPherson
>Sent: 03 August 2011 04:20 AM
>To: [log in to unmask]
>Subject: Re: Sample Size Question
>
>Please! Doing what is suggested is simply making up data - lazily.
>
>That's what is wrong with it!
>
>The 'significance' is premised on real random samples - which would be
>violently violated.
>
>Klim
>
>
>
>Klim McPherson PhD FFPH FMedSci
>Visiting Professor of Public Health Epidemiology
>Nuffield Dept Obs & Gynae & New College
>University of Oxford
>Mobile 007711335993
>
>
>
>
>
>On 03/08/2011 08:26, "Ted Harding" <[log in to unmask]> wrote:
>
>>On 03-Aug-11 01:25:54, jo kirkpatrick wrote:
>>> Please forgive what might be a really dumb suggestion but
>>> could we magnify the significance of say a T-Test by feeding
>>> the same 12 results through 4 or 5 times? Please don't all
>>> scream at once, I am only an MSc student!
>>>
>>> Best wishes Jo
>>> [The rest of the inclusions snipped]
>>
>>Jo,
>>If by this you mean stringing a set of 12 results together with
>>itself (say) 5 times, and then feeding the resulting 60 data
>>values into a t-test, then the answer is that you will indeed
>>magnify the significance!
>>
>>The basic reason is that the sample mean of the 60 will be the
>>same as the sample mean of the 12, while the sample Standard
>>Error of the mean will be roughly 1/sqrt(5) times that of the 12
>>(exactly sqrt(11/59) times, since the sample variance divides
>>by n-1).
>>
>>Hence the t-value for the 60 will be about sqrt(5) = 2.236 times
>>the t-value for the 12. So if, say, your t-value for the 12
>>was 1.36343 (on 11 degrees of freedom) so that the 2-sided
>>P-value was then 0.20 (rather disappointing ... ), then if
>>you did the above you would get a t-value of about 3.05, and
>>the t-test procedure (being unaware of your deviousness)
>>would treat this as having 59 degrees of freedom, with the
>>resulting P-value then being about 0.003, which is much more
>>satisfying!
>>
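The inflation described above can be checked directly. The sketch below is only an illustration: the 12 values are made up for the purpose, and one_sample_t is a hypothetical helper rather than anything from the thread. It compares the t-statistic of a sample with that of the same sample chained together 5 times. The ratio comes out as sqrt(59/11), which is close to sqrt(5) but not identical, because the sample variance divides by n-1.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """One-sample t-statistic for H0: population mean == mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error

# Hypothetical measurements, invented purely for illustration.
sample = [0.8, -1.2, 2.1, 0.3, 1.7, -0.6, 0.9, 2.4, -1.5, 0.2, 1.1, -0.4]

# "Feeding the same 12 results through 5 times": 60 values, no new information.
replicated = sample * 5

t12 = one_sample_t(sample)
t60 = one_sample_t(replicated)
ratio = t60 / t12

print(f"t on 12 values: {t12:.4f}")
print(f"t on 60 values: {t60:.4f}")
print(f"ratio: {ratio:.4f}  sqrt(59/11): {math.sqrt(59 / 11):.4f}  sqrt(5): {math.sqrt(5):.4f}")
```

The mean is unchanged by the replication; only the apparent standard error shrinks, which is why the "extra" significance carries no evidential weight.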
>>Your question is not as "dumb" as it might at first seem.
>>While it is clearly invalid to create a large dataset by
>>chaining together replicates of a small one, until you get
>>one large enough to give you an extreme P-value, this is
>>not grossly different from going back to the population
>>again and again, repeatedly sampling 12 each time until
>>you again get the desired result.
>>
>>This is because, if the initial 12 were a fair sample,
>>future samples of 12 are unlikely to be grossly dissimilar
>>to the initial 12. So sooner or later (and with reference
>>to the above example probably with around 5 repetitions)
>>you could move from P=0.2 to P < 0.01 by repeated sampling.
>>
>>The aggregate sample at any stage is then a valid sample
>>of that size from the population, as opposed to the invalid
>>"sample" generated by recycling the original small one.
>>
>>What is invalid about the procedure is the intention to
>>keep going until you get a small enough P-value. This
>>will inevitably occur if you keep going long enough.
>>
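A small simulation can illustrate why "keep going until P is small enough" is invalid. This is a sketch under stated assumptions, not the procedure anyone in the thread used: the null hypothesis is exactly true (mean 0), the standard deviation is taken as known (1), so a z-test stands in for the t-test, and the function names, the batch size of 12 and the limit of 20 looks are all illustrative choices.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def stops_significant(rng, batch=12, max_batches=20, alpha=0.05):
    """Add batches of null data, testing after each batch.

    Returns True if the two-sided z-test P-value drops below alpha
    at any of the interim looks.
    """
    total, n = 0.0, 0
    for _ in range(max_batches):
        for _ in range(batch):
            total += rng.gauss(0.0, 1.0)  # null is true: mean 0, sd 1
            n += 1
        z = total / math.sqrt(n)  # sample mean divided by (1 / sqrt(n))
        p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
        if p_value < alpha:
            return True
    return False

def false_positive_rate(trials=2000, seed=1):
    """Fraction of simulated studies that reach 'significance' at some look."""
    rng = random.Random(seed)
    hits = sum(stops_significant(rng) for _ in range(trials))
    return hits / trials

rate = false_positive_rate()
print(f"Share of null studies declared significant: {rate:.3f} (nominal 0.05)")
```

Even though every individual test is run at the 5% level, stopping at the first small P-value pushes the overall false-positive rate well above 5%.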
>>No Null Hypothesis is ever exactly true in real life.
>>If it is off by some small amount, then a large enough
>>sample (and you may need a very large one) will almost
>>surely result in a P-value smaller than your target.
>>
>>The real question is: How far off is it? Is this difference
>>of any interest? This leads on to the question: If the
>>smallest difference which is of practical interest is,
>>say, D, then how large a sample would we need in order
>>to have a good chance of a significant P-value if the
>>true difference were at least D?
>>
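The sample-size question posed above has a standard textbook approximation, sketched here for a one-sample, two-sided test. The assumptions are mine, not Ted's: a normal approximation with known standard deviation sigma, a 5% significance level and 80% power. The name required_n is hypothetical.

```python
import math
from statistics import NormalDist

def required_n(d, sigma, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true difference d.

    Normal approximation for a one-sample, two-sided test with known sigma:
        n = ((z_{1-alpha/2} + z_{power}) * sigma / d) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / d) ** 2)

# Smallest difference of practical interest D = 0.5, in units where sigma = 1.
n = required_n(d=0.5, sigma=1.0)
print(f"Required sample size: {n}")
```

Halving D roughly quadruples the required n, which is why the smallest difference of practical interest has to be settled before the data are collected.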
>>Also, the "How far off is it?" question can be addressed
>>by looking at a confidence interval for the difference.
>>Such broader approaches should always be used, rather
>>than simplistic reliance on mere P-values.
>>
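As an illustration of the confidence-interval approach, the sketch below computes a 95% interval for the mean of 12 observations. The data are invented for the example, and the critical value 2.201 is the tabulated 97.5% point of the t-distribution on 11 degrees of freedom, hard-coded here to avoid a dependency on a statistics library.

```python
import math
import statistics

# Tabulated t critical value: 97.5% point on 11 degrees of freedom.
T_CRIT_11_DF = 2.201

def ci95_for_12(values):
    """95% confidence interval for the mean of exactly 12 observations."""
    if len(values) != 12:
        raise ValueError("T_CRIT_11_DF is only valid for n = 12")
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    half_width = T_CRIT_11_DF * standard_error
    return mean - half_width, mean + half_width

# Hypothetical differences, invented purely for illustration.
sample = [0.8, -1.2, 2.1, 0.3, 1.7, -0.6, 0.9, 2.4, -1.5, 0.2, 1.1, -0.4]

lower, upper = ci95_for_12(sample)
print(f"mean = {statistics.mean(sample):.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval shows the range of differences compatible with the data, so it answers "how far off is it?" in a way that a bare P-value cannot.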
>>Hoping this helps!
>>Ted.
>>
>>--------------------------------------------------------------------
>>E-Mail: (Ted Harding) <[log in to unmask]>
>>Fax-to-email: +44 (0)870 094 0861
>>Date: 03-Aug-11 Time: 08:26:20
>>------------------------------ XFMail ------------------------------
>