Hi everyone,

I am fitting a large number of binary logit models in a data-mining
project and have a general question about model comparison and
selection:

As an initial step for reducing the list of potential predictors, I
have fitted many models, each with a single explanatory variable. Some
of these models have a low AIC and a very large Wald chi-square
statistic for the predictor; however, when I check the classification
matrix, they do NOT differentiate between the two responses at all:
they predict all observations into one response category. Normally I
also look at hit rates, but these seem meaningless when every
prediction falls into the same response category. Many of these models
have lower AIC values and higher chi-square values than the models
that do differentiate between the two responses.
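A minimal sketch of how this pattern can arise, under stated
assumptions: the counts are made up (a rare outcome, 220 events in
3000 observations, and a single binary predictor), and the logit is
fitted by a hand-rolled Newton-Raphson routine in Python rather than
by any particular package. The slope comes out highly significant, yet
every fitted probability stays below 0.5, so a classification table at
the 0.5 cutoff puts every observation in the non-event category and
the hit rate simply equals the share of non-events. A ranking measure
such as the concordance index (AUC) still shows the predictor
separating the two groups.

```python
import math

def _logit_pieces(b0, b1, x, y):
    """Log-likelihood, score vector and information matrix of a one-predictor logit."""
    ll = g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return ll, (g0, g1), (h00, h01, h11)

def fit_logit(x, y, max_iter=50, tol=1e-10):
    """ML fit of P(y=1|x) = 1/(1+exp(-(b0+b1*x))) by Newton-Raphson (hypothetical helper)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        _, (g0, g1), (h00, h01, h11) = _logit_pieces(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Recompute at the final estimates for the log-likelihood and standard error
    ll, _, (h00, h01, h11) = _logit_pieces(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    se_b1 = math.sqrt(h00 / det)  # from the inverse information matrix
    return b0, b1, se_b1, ll

# Hypothetical data: rare outcome (220 events in 3000 obs), binary predictor.
x = [0] * 2000 + [1] * 1000
y = [1] * 20 + [0] * 1980 + [1] * 200 + [0] * 800

b0, b1, se_b1, ll = fit_logit(x, y)
wald_chisq = (b1 / se_b1) ** 2   # 1-df Wald chi-square for the slope
aic = -2.0 * ll + 2 * 2          # two parameters: intercept and slope

# Classification matrix at the usual 0.5 cutoff
p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
pred = [1 if p >= 0.5 else 0 for p in p_hat]
hit_rate = sum(pi == yi for pi, yi in zip(pred, y)) / len(y)

# Concordance (AUC): share of event/non-event pairs ranked correctly, ties count 1/2
events = [p for p, yi in zip(p_hat, y) if yi == 1]
nonevents = [p for p, yi in zip(p_hat, y) if yi == 0]
conc = sum(1.0 if pe > pn else 0.5 if pe == pn else 0.0
           for pe in events for pn in nonevents)
auc = conc / (len(events) * len(nonevents))

print(f"b1 = {b1:.3f}, Wald chi-sq = {wald_chisq:.1f}, AIC = {aic:.1f}")
print(f"predicted events at 0.5 cutoff: {sum(pred)}, hit rate = {hit_rate:.4f}")
print(f"AUC = {auc:.3f}")
```

With these hypothetical counts the fitted probabilities are 0.01 and
0.20, the Wald chi-square is about 181, no observation is predicted as
an event, the hit rate is 2780/3000 (about 0.927, the non-event share)
and the AUC is about 0.81. The names fit_logit and _logit_pieces are
illustrative only.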

Why does this happen and what are best practices in this situation?

Also, can anyone suggest a good web resource for reading more about
this issue and logistic model comparison/selection in general?