Thanks Jeremy - that explains why Julie Pallant says if you have a small
sample then use the adjusted R square - I had always wondered why that was!
Abigail
----- Original Message -----
From: "Jeremy Miles" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, March 29, 2007 3:39 PM
Subject: Re: Multiple regression and Adjusted R Square
> Abigail's right, but I can expand a bit, if anyone cares. (Those who
> haven't set their email programs to automatically delete emails from
> me. :)
>
> In statistics, we calculate values based on samples. We know that
> these don't match the population values, but we like to think that
> they are close. And, importantly, we like to think that they are as
> likely to be too high as too low. The mean, for example, is as likely
> to be higher than the population mean, as it is to be lower than the
> population mean. And on average, lots of sample means equal the
> population mean.
>
> (The mean is an ordinary least squares (OLS) estimator, and OLS
> estimators are BLUE - they are the best linear unbiased estimator.
> The unbiased part means that they are as likely to be too high as too
> low).
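> Jeremy's point about unbiasedness is easy to see in a quick
> simulation (my own sketch, plain Python with only the standard
> library - the population numbers are made up, only the pattern
> matters):

```python
import random

random.seed(0)

# A made-up "population" of 100,000 scores with mean around 50.
population = [random.gauss(50, 10) for _ in range(100000)]
pop_mean = sum(population) / len(population)

# Draw lots of small samples and record each sample mean.
sample_means = []
for _ in range(5000):
    sample = random.sample(population, 15)
    sample_means.append(sum(sample) / len(sample))

# Any one sample mean may be too high or too low, but on average
# they land on the population mean - that's what "unbiased" means.
avg_of_means = sum(sample_means) / len(sample_means)
print(pop_mean, avg_of_means)
```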
>
> When we calculate the standard deviation, we divide by N-1, not N,
> because otherwise the SD estimate is biased.
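> The same kind of simulation shows why we divide by N-1 (strictly
> speaking, the N-1 correction unbiases the *variance*; the SD itself
> stays very slightly biased, but N-1 gets much closer than N does).
> Again, a plain-Python sketch:

```python
import random

random.seed(1)

# Population whose variance is close to 1.
population = [random.gauss(0, 1) for _ in range(100000)]

div_n, div_n_minus_1 = [], []
for _ in range(5000):
    sample = random.sample(population, 10)
    m = sum(sample) / len(sample)
    ss = sum((x - m) ** 2 for x in sample)
    div_n.append(ss / len(sample))                # divide by N
    div_n_minus_1.append(ss / (len(sample) - 1))  # divide by N - 1

mean_div_n = sum(div_n) / len(div_n)                          # systematically too low
mean_div_n_minus_1 = sum(div_n_minus_1) / len(div_n_minus_1)  # close to the true variance
print(mean_div_n, mean_div_n_minus_1)
```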
>
> Say that the population correlation is actually zero, and we calculate
> the correlation in our sample. Sometimes (half the time) it will be
> above zero, and sometimes (half the time) the sample correlation will
> be below zero. But then what happens when we calculate R2?
>
> Well, R2 is r, squared. 0 squared is 0, so the true (population)
> value of R2 is zero. But (as you'll remember from school) squaring a
> negative number gives a positive number. So when we get sample
> correlations below zero, squaring them pushes them above zero. The
> average correlation will be zero, but the average R2 will ALWAYS be
> above zero. R2 is not an unbiased estimator; it's biased too high.
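> You can watch this happen with another little simulation (again my
> own sketch, plain Python): draw x and y independently, so the
> population correlation really is zero, then compare the average r
> with the average R2.

```python
import random

random.seed(2)

r_values = []
for _ in range(2000):
    n = 20
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]  # independent of x: true r = 0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r_values.append(sxy / (sxx * syy) ** 0.5)  # Pearson r for this sample

mean_r = sum(r_values) / len(r_values)                  # hovers near zero
mean_r2 = sum(r * r for r in r_values) / len(r_values)  # clearly above zero
print(mean_r, mean_r2)
```

> (With a true correlation of zero, the expected R2 works out at about
> 1/(n-1), which is also why small samples make the problem worse.)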
>
> Adjusted R2 pulls R2 down, to make it an unbiased estimator - its
> average will be zero when the population R2 is zero.
>
> The fewer people you have, the more variance there will be in the R2,
> and so the larger the upward bias. So R2 is pulled down more when
> there are few people.
>
> The more predictor variables you have, the more 'chances' R2 has to
> get high (it only needs one of those correlations to be high). So,
> the more it is pulled down.
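> Both of those effects are visible in the usual adjustment formula
> (the one SPSS reports, often attributed to Wherry):
> adj R2 = 1 - (1 - R2)(N - 1)/(N - k - 1), where k is the number of
> predictors. A sketch with made-up numbers:

```python
def adjusted_r2(r2, n, k):
    """Shrink R^2; the shrinkage grows as n falls or k (predictors) rises."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The same raw R^2 of .20 in every case:
print(adjusted_r2(0.20, 100, 2))  # about .18 - barely touched
print(adjusted_r2(0.20, 20, 2))   # about .11 - pulled down hard
print(adjusted_r2(0.20, 20, 8))   # about -.38 - pulled below zero!
```

> Note the last case: with few people and many predictors the formula
> can drag adjusted R2 below zero.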
>
> However, I don't like adjusted R2, for a couple of reasons. First,
> it can go negative. But R2 is a proportion of variance, it's a
> square, and a proportion of variance CAN'T be negative. So when adj
> R2 is negative, we know it must be an underestimate. Second, it's
> messing with the statistic you calculated, which we don't really do
> with any other statistic. Any effect might be wrong - in the sense
> of being too extreme - and could be adjusted, but we usually don't
> bother.
>
> (Unless we're Bayesians, and then we might adjust some other things.
> Andrew Gelman (who, incidentally, has written a very nice, but
> slightly advanced, book on regression) had a really nice example of
> this on his statistical modeling blog:
> http://www.stat.columbia.edu/~cook/movabletype/archives/2007/03/bayesian_sortin.html#more )
>
> Jeremy
>
>
>
>
>
> On 29/03/07, Millings Abigal Ms (SWK) <[log in to unmask]> wrote:
>> I'm not a stats brain by any stretch of the imagination but I've
>> used MR a few times. As I understand it, the r square value tells
>> you how good a predictor your model is - how much variance is
>> accounted for, as you said. But you'd want to know how many
>> predictor variables you put into that regression to get your .167.
>> If you put only 2 in, and they are both coming out as significant
>> predictors, and together they account for 16.7% of the variance,
>> then that's reasonable - people report models like that in articles.
>> But if you put 8 variables in, and none of them were independently
>> sig, then it doesn't mean so much, because the more you chuck in,
>> the more likely you are to get them accounting for some variance
>> somewhere! I had a sig model recently with 4 predictors (not
>> strictly a regression, but it still serves to illustrate my point) -
>> 2 of the predictors were sig, but when I did it as a regression, the
>> r square was only 0.006! Thus although sig, the predictors weren't
>> really telling me anything about my DV.
>> Hope this helps (and I hope I'm not telling you wrong - no doubt
>> someone will correct me!)
>>
>> ----- Original Message -----
>> From: "Davies, Nicola" <[log in to unmask]>
>> To: <[log in to unmask]>
>> Sent: Thursday, March 29, 2007 1:53 PM
>> Subject: Multiple regression and Adjusted R Square
>>
>>
>> Hi All,
>>
>> In a multiple regression, say, I get an adjusted r square of .167.
>> Does that mean that the predictors only account for 16.7% of the
>> variance, this being low? Ideally, am I wanting a large adjusted r
>> square, such as .838?
>>
>> Also, is it possible to have a low adjusted r square whilst also
>> having a significant F value in the ANOVA? What would such a
>> situation tell the researcher?
>>
>> Kind Regards,
>>
>> Nicola Davies,
>> BSc; MSc Comm.; PhD Candidate
>> Liaison Officer for the DHP Postgraduate Subcommittee
>>
>
>
> --
> Jeremy Miles
> Learning statistics blog: www.jeremymiles.co.uk/learningstats
>