Hi Ashley,
Thanks for your response. This gives me some places to go.
I am using SAS. Do you know whether PROC LOGISTIC can automatically produce
the improvement-in-deviance statistic you are referring to?
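
To check that I have understood the statistic, here is a minimal sketch of
what I take it to be: the drop in deviance (-2 log-likelihood) when one
predictor is added to a constant-only model, compared against a chi-square
distribution with 1 degree of freedom. It is written in plain Python rather
than SAS, the data are made up purely for illustration, and the function
names are my own, not anything PROC LOGISTIC exposes.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p."""
    eps = 1e-12  # guard against log(0)
    return sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )


def fit_logit_one_predictor(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x (illustrative only)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # near-singular, e.g. (quasi-)separated data
        # Solve the 2x2 system for the Newton step.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def deviance_improvement(x, y):
    """Return (null deviance, model deviance, improvement, p-value)."""
    # Constant-only model: every observation gets the overall event rate.
    rate = sum(y) / len(y)
    null_dev = -2.0 * log_likelihood(y, [rate] * len(y))
    # Model with the single explanatory variable added.
    b0, b1 = fit_logit_one_predictor(x, y)
    fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    model_dev = -2.0 * log_likelihood(y, fitted)
    improvement = null_dev - model_dev
    # Upper tail of chi-square with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(max(improvement, 0.0) / 2.0))
    return null_dev, model_dev, improvement, p_value


# Made-up toy data: one predictor, binary response, not perfectly separated.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

null_dev, model_dev, improvement, p_value = deviance_improvement(x, y)
print(f"null deviance  = {null_dev:.3f}")
print(f"model deviance = {model_dev:.3f}")
print(f"improvement    = {improvement:.3f} on 1 df, p = {p_value:.4f}")
```

If that is the right quantity, then what I am after is the same number
produced for each of my single-predictor models.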
On Fri, Mar 16, 2012 at 9:17 AM, Buckner Ashley (SD)
<[log in to unmask]> wrote:
> Dan,
>
> Does it just mean that the cut-off probability for classifying a
> 'success' is wrong? SPSS, for example, defaults to 0.5. I tend to use
> area under the ROC (Receiver Operating Characteristic) curve as a metric
> rather than correct classification rate.
>
> NB The Wald stat can be misleading - better to use the improvement in
> deviance obtained by fitting a constant-only model, then adding in the
> explanatory variable.
>
> Ashley
>
> Ashley Buckner
> Senior Operational Research Analyst
> Skills Indicators and Modelling | Department for Business, Innovation &
> Skills | 2 St Pauls Place, Sheffield S1 2FJ
> Tel: 0114 207 5029
>
> E-mail: [log in to unmask]
>
> Visit the FESAD Modelling Team web-site at
> http://bisintranet/groups/bsg/whatwedo/skills/fesa/indunit/model/Pages/welcome.aspx
>
>
>
>
> -----Original Message-----
> From: A UK-based worldwide e-mail broadcast system mailing list
> [mailto:[log in to unmask]] On Behalf Of Dan Abner
> Sent: 16 March 2012 13:11
> To: [log in to unmask]
> Subject: Binary Logistic Regression Model Selection XXXX
>
> Hi everyone,
>
> I am in the process of fitting a large number of binary logit models
> in a data mining type of project and have a general question about
> model comparison/selection:
>
> I have many fitted models with a single explanatory variable as an
> initial step for reducing the list of potential predictors. Some of
> the models obtain low AIC and very large Wald chi-square test statistics
> for the predictor. However, when I check the classification matrix,
> these models do NOT differentiate between the 2 responses at all: they
> predict all observations into 1 response category. Normally, I also look
> at hit rates, but this of course seems invalid when all predictions are of
> the same response category. Many of these models obtain lower AIC
> values and higher chi-square values than the models that do differentiate
> between the 2 responses.
>
> Why does this happen and what are best practices in this situation?
>
> Also, can anyone suggest a good web resource for reading more about
> this issue and logistic model comparison/selection in general?
>
> Thanks!
>
> Dan
>
> You may leave the list at any time by sending the command
>
> SIGNOFF allstat
>
> to [log in to unmask], leaving the subject line blank.
>