Thank you for the fast responses.
However, do note that LR was not the only technique tested. As I said,
an SVM and a Bayesian classifier were also used, and each selected
descriptor set had an average MCC value above 0.85. Doesn't this
make overfitting less probable?
Also, can you give any reference for the 10-15 cases per descriptor?
Thank you
bgbg.bg
On Thu, May 7, 2009 at 4:00 PM, Martin Holt wrote:
> Hi,
>
> As far as logistic regression goes, you need 10-15 events of the rarer kind
> for each variable considered (not just settled on). I would advise not even
> starting in your situation. You could try searching "MedStats" for further
> info.
>
>
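
The rule of thumb quoted above (10-15 events of the rarer kind per
candidate variable) can be checked with a quick calculation. The sketch
below is only an illustration: the function names and the sample counts
are hypothetical and do not come from the data discussed in this thread.

```python
def events_per_variable(n_events, n_nonevents, n_candidate_variables):
    """Events of the rarer outcome class per candidate variable.

    Counts every variable that was considered during selection,
    not only the ones that ended up in the final model.
    """
    if n_candidate_variables <= 0:
        raise ValueError("need at least one candidate variable")
    rarer = min(n_events, n_nonevents)
    return rarer / n_candidate_variables


def max_candidate_variables(n_events, n_nonevents, epv_threshold=10):
    """Largest number of candidate variables that still meets the threshold."""
    return min(n_events, n_nonevents) // epv_threshold


# Hypothetical example: 40 actives, 160 inactives, 20 descriptors screened.
epv = events_per_variable(40, 160, 20)
print(f"events per variable: {epv}")
print("variables allowed at 10 EPV:", max_candidate_variables(40, 160, 10))
print("variables allowed at 15 EPV:", max_candidate_variables(40, 160, 15))
```

With these made-up numbers the ratio is 2 events per variable, far
below the 10-15 guideline, and only 2 to 4 candidate descriptors would
fit within it.
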
On Thu, May 7, 2009 at 3:58 PM, Philip McShane wrote:
> Dear bgbg
>
> Overfitting?? Understatement!!!!
>
> There is an often-cited guideline that for logistic regression you should use 10 cases per predictor. Using a genetic algorithm probably does not change that very much. However, you don't know how many polynomial terms, interaction terms and other derived terms your model adds. An interaction term, for instance, constitutes another predictor.
>
> So you may have effectively more predictors than cases.
>
> You might find the logistic regression more informative about what is going on!
>
> Regards
>
> Phil McShane
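
To see how quickly polynomial and interaction terms can push the
effective number of predictors past the number of cases, here is a
small counting sketch. It assumes a full polynomial expansion up to a
given total degree, which may not be what any particular modelling tool
actually does; the function names and descriptor counts are
hypothetical.

```python
from math import comb


def n_polynomial_terms(n_descriptors, degree):
    """Non-constant monomials of total degree <= `degree`.

    Standard count: C(n + d, d) monomials including the constant,
    minus 1 for the intercept.
    """
    return comb(n_descriptors + degree, degree) - 1


def n_pairwise_interactions(n_descriptors):
    """Number of distinct two-way interaction terms x_i * x_j, i < j."""
    return comb(n_descriptors, 2)


# Hypothetical descriptor counts, for illustration only.
for p in (5, 10, 20):
    print(
        p, "descriptors:",
        n_pairwise_interactions(p), "pairwise interactions,",
        n_polynomial_terms(p, 2), "terms in a full degree-2 expansion",
    )
```

Each of those terms counts as a predictor for the purposes of the
cases-per-predictor guideline.
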