Hi again,

Thank you, Jeremy and Kate, for your input.

My supervisor has asked me to check the effect sizes for my regression results, because he is worried that the significant results may simply be due to my relatively large sample (N = 500). Is there another way of checking my results that would do what he wants the effect-size check to do? If I understood Jeremy correctly, there is little point in doing it, as it just restates what the p-value says. Or is there more to it?
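For concreteness, I think what he means is something like the following rough sketch in Python, with made-up data and variable names, where the effect size for the whole model (Cohen's f^2) is just R^2 / (1 - R^2):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500                                          # same order as my N = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.12 * x1 + 0.03 * x2 + rng.normal(size=n)   # deliberately tiny "true" effects

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
fit = sm.OLS(y, X).fit()

r2 = fit.rsquared
f2 = r2 / (1 - r2)                               # Cohen's f^2 for the model as a whole
print(fit.summary())
print(f"R^2 = {r2:.3f}, Cohen's f^2 = {f2:.3f}")
# With n this large a coefficient can come out "significant" even though f^2 is
# tiny (Cohen's benchmarks: 0.02 small, 0.15 medium, 0.35 large), which is the
# distinction my supervisor seems worried about.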

I am currently doing SEM but have only tackled the measurement model so far.

Thanks a million for your advice!

Best wishes




Date: Tue, 4 Feb 2014 09:07:58 -0800
From: [log in to unmask]
Subject: Re: Mediation Sample Size - Bootstrapping
To: [log in to unmask]


Summary (because this got quite long): Power analysis is weird, and doesn't really make sense. Don't worry about it too much.



Hi Kate
Power is weird.


What you're talking about is post hoc power analysis, which is (very) frowned upon in some circles.
The problem with post hoc power analysis is that you're just restating the p-value in a different way.  


If you got a significant result, you had enough power; if you didn't get a significant result, you didn't have enough power. You knew that before you did the power analysis.


All null hypotheses (in their two tailed forms) are false.  You know that there's a result there to be found. What you don't know, if you don't have a significant effect, is the direction of the result. That makes power analysis kind of worthless, because any time you don't get a significant result, it's because you don't have enough power.
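To make that concrete, here's a quick sketch (my own toy example, a two-sided z-test in Python) showing that "observed power" is just a restatement of the p-value:

from scipy.stats import norm

def observed_power(p, alpha=0.05):
    # "Post hoc" power for a two-sided z-test, computed from nothing but the
    # observed p-value: treat the observed effect as if it were the true effect.
    z_crit = norm.ppf(1 - alpha / 2)       # 1.96 for alpha = .05
    z_obs = norm.ppf(1 - p / 2)            # |z| implied by the observed p-value
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

for p in (0.20, 0.05, 0.01, 0.001):
    print(f"p = {p:<5} -> 'observed power' = {observed_power(p):.2f}")
# p = .05 gives observed power of about 0.50, and smaller p always gives higher
# "power": the power analysis can't tell you anything the p-value didn't.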



Hi Iljana



Power is weird.
You're asking what you're going to find, but you haven't looked for it yet.


The best way to determine the effect size that you want is to determine what would be clinically (or substantively, or theoretically) significant. But that's really hard to say.


The second best way is to base it on prior research. But if we already knew the effect size, we wouldn't need to do our study.


The third way is to use a standard effect size - small, medium or large. This is better than guessing, but only slightly better.
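To put numbers on those conventions, here's a rough sketch of an a priori calculation for a simple two-group comparison, assuming a two-sided t-test at alpha = .05 and 80% power (done with statsmodels):

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8,
                                       alternative="two-sided")
    print(f"{label:6s} d = {d}: about {n_per_group:.0f} per group")
# Roughly 394, 64 and 26 per group: dropping from a "medium" to a "small" effect
# roughly sextuples the required sample, so the choice of convention does most
# of the work.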


Here's what Martin Bland (disclaimer: he's a friend of mine, so I'm more likely to agree with him) says: "We might reasonably expect researchers to have this knowledge [about the difference and variation], but it is surprising how often they do not. We are then reduced to saying that we could hope to detect a difference of some specified fraction of a standard deviation. Cohen has dignified this by the name “effect size,” but the name is often a cloak for ignorance."



The full paper is worth reading here: http://www.bmj.com/content/339/bmj.b3985


There are two (or three) competing philosophies in the world of statistics and p-values. The Fisher approach says that we use the p-value that we get and interpret it. So 0.1 is better evidence than 0.2, and 0.01 is better evidence than 0.1.


The Neyman-Pearson approach says that we pick a p-value (say, 0.05) before we do the study, and we decide significance based on that. It's either less than 0.05 or it isn't. If it's 0.051 it's not sig; if it's 0.99 it's not sig. If it's 0.049 it's sig; if it's 0.000001 it's sig. You don't care about the p-value beyond whether it's sig or not.


What we actually use is a bizarre bastard combination of these methods. (I wrote about this in a job application letter once, to see if I could write the word 'bastard' and still get an interview - I did, but I didn't get the job).  


So when you do power analysis you're only using the Neyman-Pearson approach (Fisher didn't believe in power analysis, and didn't believe in type II errors either). But when you report the p-value, you're using the Fisher approach. These people really didn't like each other; that's how seriously they took these differences that we ignore. (They would avoid each other at conferences, they would reject each other's papers, and they would not be seen reading each other's papers.)



People get a little hung up on power analysis, but it's not as good as we think it is. Most of the time.


Sometimes it's really, really good though. I have a friend who is a doctor and was planning a study. She said they have a condition which is misdiagnosed 3% of the time (I forget the exact details), and there is a new approach that would reduce that to 2%. They have about 10 patients per day with the condition. The power analysis showed she'd need thousands of patients, which would have meant about 5 years of data collection, so there was no point trying to do the study that way. Power analysis was useful there (there's a rough sketch of that calculation below). But for things like studies of mediation, we're just guessing a lot of the time. In addition, a typical conversation about power analysis goes:


Researcher: "I think we'll have a medium effect, how many participants do I need?"

Jeremy: "Lots."

Researcher: "On second thoughts, it will be larger than that."


Power analysis is always depressing, and people then change the inputs. If you put bullshit in, you get bullshit out.  And a lot of the time the only thing we can find to put in is bullshit.
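For what it's worth, here's a rough version of the misdiagnosis calculation from earlier, treating my half-remembered 3% and 2% figures as given (again alpha = .05 and 80% power, done with statsmodels):

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

h = proportion_effectsize(0.03, 0.02)      # Cohen's h for 3% vs 2% misdiagnosis
n_per_group = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.8,
                                           alternative="two-sided")
print(f"Cohen's h = {h:.3f}, about {n_per_group:.0f} patients per group")
# Comes out at roughly 1,900 per group, i.e. thousands of patients in total:
# the power analysis told them up front that the study wasn't feasible as planned.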


Jeremy

On 4 February 2014 07:59, Hammond, Kate <[log in to unmask]> wrote:




Once you have conducted your experiment, you use your result to calculate your actual effect size, and then see if you are underpowered - 0.8 being the standard accepted power (to avoid type 2 error).  If you are underpowered, then your result is unlikely to be