Dear Allstat,
I have been building a regression model on a large dataset of 700K records.
Initially I built the model on a sample with a 50/50 distribution of 1's and 0's, and
on the gains chart at a depth of 50% I am getting 77.5% of the
responders. However, when I build a model on the same variables with a
distribution of 28% 1's and 72% 0's, which is the actual distribution of
the base, at 50% I am getting 86% of the responders.
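To make clear what I mean by the gains figure at 50%, here is a minimal Python sketch of the calculation as I understand it. The function name `gain_at_depth` and the toy data are illustrative assumptions only, not my actual model, software, or data.

```python
def gain_at_depth(y_true, scores, depth=0.5):
    """Share of all responders (1's) captured in the top `depth` fraction
    of records when ranked by descending model score."""
    # Rank records from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    # Number of records in the top `depth` fraction of the file.
    n_top = int(round(depth * len(ranked)))
    # Responders found in that top slice, relative to all responders.
    captured = sum(y for _, y in ranked[:n_top])
    return captured / sum(y_true)


# Toy data for illustration only: 3 responders among 10 records.
y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(gain_at_depth(y, p, depth=0.5))
```

The same function would be applied once to the scores from the 50/50 sample and once to the scores from the 28/72 sample to get the two figures being compared.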
Why would the difference between 77.5% and 86% be so large?
Should the graph look so different when not using a 50/50 sample?
All comments are appreciated. Thanks in advance.
Lucy