I am looking for references on the problem of overfitting in multiple regression. The more predictors I have, the more variance I can explain, even if each predictor explains very little on its own. I have read that adding a large number of variables can result in spuriously high values of R^2. Adjusted R^2 allows me to compare regressions with different numbers of variables, but are there rules of thumb regarding the number of predictor variables one can have for a given sample size? I have seen the formula n >= 5(k + 2), where n is the sample size and k is the number of predictors.

I would be most grateful for any insights, references, or suggestions.

Thanks,
John Mallett
University of Ulster
Northern Ireland
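
For concreteness, here is a minimal Python sketch of the two quantities mentioned in the question: the usual adjusted R^2 correction for a model with an intercept, and the quoted rule of thumb n >= 5(k + 2). The function names and the R^2 values in the demo loop are hypothetical, chosen only for illustration; this is not a recommendation of the rule itself.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with an intercept.

    r2: ordinary R^2, n: sample size, k: number of predictors.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for adjusted R^2 to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def min_sample_size(k: int) -> int:
    """Smallest n satisfying the quoted rule of thumb n >= 5(k + 2)."""
    return 5 * (k + 2)


if __name__ == "__main__":
    n = 50  # hypothetical sample size
    # Hypothetical (k, R^2) pairs: R^2 creeps up as predictors are added.
    for k, r2 in [(2, 0.30), (10, 0.40), (20, 0.50)]:
        print(
            f"k={k:2d}  R^2={r2:.2f}  "
            f"adjusted R^2={adjusted_r2(r2, n, k):.3f}  "
            f"rule satisfied: {n >= min_sample_size(k)}"
        )
```

With a fixed n, the penalty factor (n - 1)/(n - k - 1) grows as k grows, so a modest gain in raw R^2 from extra predictors can leave adjusted R^2 flat or lower.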