Hi,
I'm working on a classic classification problem for a marketing
application (predicting response to a mailing campaign). The response rate
is low (2.26%), which is not unusual for this type of application, but the
sample is also small: only 5,653 instances, of which 128 are responses and
5,525 are non-responses. The data set has about 50 predictors.
If possible, I'd like your opinion on how to approach this problem. In
particular, I believe the data set is too small to split into training and
validation sets. So I considered under-sampling the non-responses to obtain
a 10% response rate, and then fitting a logistic regression with stepwise
selection, using cross-validation error as the selection criterion. Is
there an alternative approach you think may work better?
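To make the under-sampling step concrete, here is a minimal Python sketch
using only the standard library. The function name undersample_indices, the
seed, and the label coding (1 = response, 0 = non-response) are illustrative
assumptions, not part of any package or of my actual code. It keeps every
response and draws just enough non-responses at random to reach the target
rate.

```python
import random

def undersample_indices(labels, target_rate, seed=0):
    """Return sorted row indices that keep every responder (label 1) and a
    random subset of non-responders (label 0), so that responders make up
    roughly `target_rate` of the result. Hypothetical helper for illustration."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # n_pos / (n_pos + n_neg_keep) = target_rate, solved for n_neg_keep
    n_neg_keep = min(len(neg), round(len(pos) * (1 - target_rate) / target_rate))
    rng = random.Random(seed)  # fixed seed only to make the draw repeatable
    return sorted(pos + rng.sample(neg, n_neg_keep))

# Class counts from my data set: 128 responses, 5,525 non-responses.
labels = [1] * 128 + [0] * 5525
kept = undersample_indices(labels, target_rate=0.10)
n_pos = sum(labels[i] for i in kept)
print(len(kept), n_pos, n_pos / len(kept))
```

With my class counts this keeps all 128 responses plus 1,152 of the 5,525
non-responses, i.e. 1,280 rows in total.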
Many thanks in advance for your help.
Kind Regards,
Lars.