I am looking for references on the problem of overfitting in multiple
regression. The more predictors I have, the more variance I can explain,
even if each predictor on its own explains very little. I have read that
adding a large number of variables can produce spuriously high values of
R². Adjusted R² allows me to compare regressions with different numbers
of variables, but are there rules of thumb regarding the number of
predictor variables one can have? I have read the formula
n >= 5(k + 2), where n is the sample size and k is the number of
predictors. I would be most grateful for any insights, references or
suggestions.
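A minimal simulation sketch of the effect described above, assuming ordinary least squares with an intercept and the usual adjusted R² formula 1 - (1 - R²)(n - 1)/(n - k - 1). The outcome and all predictors are pure random noise, so any R² the fit reports is spurious. The helper names (r_squared, adjusted_r_squared, min_sample_size) are hypothetical, and min_sample_size simply encodes the n >= 5(k + 2) rule of thumb as quoted, not an endorsed standard.

```python
import numpy as np

def r_squared(X, y):
    """Ordinary R^2 from an OLS fit of y on the columns of X plus an intercept."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted in k)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def min_sample_size(k):
    """Hypothetical helper: the rule of thumb n >= 5(k + 2) as quoted in the post."""
    return 5 * (k + 2)

rng = np.random.default_rng(0)
n = 40
y = rng.normal(size=n)             # outcome is pure noise
X_all = rng.normal(size=(n, 30))   # 30 pure-noise candidate predictors

# Fit nested models using the first k noise predictors each time.
for k in (1, 5, 10, 20, 30):
    r2 = r_squared(X_all[:, :k], y)
    adj = adjusted_r_squared(r2, n, k)
    print(f"k={k:2d}  R2={r2:.3f}  adjR2={adj:.3f}  "
          f"rule of thumb wants n>={min_sample_size(k)} (have n={n})")
```

Because the models are nested, plain R² can only stay the same or rise as k grows, even though none of the predictors carries real information; adjusted R² penalises the lost degrees of freedom and does not show the same steady climb.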
Thanks
John Mallett
University of Ulster
Northern Ireland