The rule of 10-15 events of the rarer kind per variable is often quoted as
folklore. There is, however, a simulation study behind it: Peduzzi P,
Concato J, Kemper E, Holford TR, Feinstein AR (1996). A simulation study of
the number of events per variable in logistic regression analysis. Journal
of Clinical Epidemiology 49: 1373-9.
As for the other techniques, were they in agreement with your LR results
each of the two times that you did this? How did you proceed? LR first,
then the other two on only the variables identified by LR? Or all 3 methods
on all 50 variables? Or did you screen with the other techniques and then
feed the result into LR?
Did you use stepwise techniques in the LR? (This is not recommended.) But
the tiny sample size is the main problem.
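To put rough numbers on the events-per-variable rule, here is a minimal
sketch of the arithmetic. The function names are made up for illustration,
and the 20/30 class split in the last line is a hypothetical example, not
your data; only the 50 candidate variables come from this thread.

```python
def required_events(n_candidate_vars, epv):
    """Events of the rarer outcome needed for a given events-per-variable target."""
    return n_candidate_vars * epv


def max_candidate_vars(n_events, n_nonevents, epv):
    """Largest number of candidate variables the rarer outcome class supports."""
    rarer = min(n_events, n_nonevents)
    return rarer // epv


# 50 candidate variables (all variables considered, not just those retained)
print(required_events(50, 10))  # 500
print(required_events(50, 15))  # 750

# Hypothetical data set: 20 events and 30 non-events
print(max_candidate_vars(20, 30, 10))  # 2
```

The count that matters is the number of variables considered, so screening
50 variables down to a handful on the same data does not relax the
requirement.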
BW,
Martin Holt
----- Original Message -----
From: "bgbg bg" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, May 07, 2009 2:25 PM
Subject: Re: Fail to validate models. What's next?
Thank you for the fast responses.
However, do note that LR was not the only technique tested. As I said,
SVM and a Bayesian classifier were also used, and each selected
descriptor set had an average MCC value above 0.85. Doesn't this fact
make overfitting less probable?
Also, can you give any reference for the 10-15 cases per descriptor?
Thank you
bgbg.bg
On Thu, May 7, 2009 at 4:00 PM, Martin Holt <[log in to unmask]> wrote:
> Hi,
>
> As far as logistic regression goes, you need 10-15 events of the rarer kind
> for each variable considered (not just settled on). I would advise not even
> starting in your situation. You could try searching "MedStats" for further
> info.
>
>
On Thu, May 7, 2009 at 3:58 PM, Philip McShane <[log in to unmask]> wrote:
> Dear bgbg
>
> Overfitting?? Understatement!!!!
>
> There is an often cited guideline that for logistic regression you should
> use 10 cases per predictor. Using a genetic algorithm probably does not
> change that very much. However, you don't know how many polynomials,
> interaction terms and other things your model adds. An interaction term
> for instance constitutes another predictor.
>
> So you may have effectively more predictors than cases.
>
> You might find the logistic regression more informative about what is
> going on!
>
> Regards
>
> Phil McShane
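
Phil's point about interaction and polynomial terms can be illustrated with
a quick count. This is only a sketch under stated assumptions: nobody in the
thread knows which terms the model actually adds, so the two expansions
below (all pairwise interactions, and a full polynomial of a given degree)
are hypothetical cases, and the function names are invented for the example.

```python
from math import comb  # Python 3.8+


def terms_with_pairwise_interactions(p):
    """p main effects plus every two-way interaction among them."""
    return p + comb(p, 2)


def terms_full_polynomial(p, degree):
    """All monomials in p variables up to the given degree, excluding the intercept."""
    return comb(p + degree, degree) - 1


p = 50  # number of candidate variables mentioned in the thread
print(terms_with_pairwise_interactions(p))  # 1275
print(terms_full_polynomial(p, 2))          # 1325
```

Each of those terms counts as a predictor for the 10-cases-per-predictor
guideline, so the effective number of predictors can easily exceed the
number of cases.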