Dear Experts,

The following situation is not uncommon in statistical modeling but,
to my knowledge, has no good solution.  What would you do?

---Question---

Your dependent variable has high variability.  Let's say it is
win/lose (1/0).  At your disposal is a categorical variable with many
categories and each category has a small number of observations, let's
say 10.  Your total sample size is fairly large, 15,000.  If you know
from other information sources that the differences between categories
are expected to shift the win/lose outcome by about 4%, how do you go
about assigning a value to each category?

Keep in mind that with only 10 observations per category level, the
smallest possible change in a category's observed win rate is 10
percentage points, and the average change might be more like 30
percentage points, while the change due to the category is only
expected to be around 4%.
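
To put rough numbers on that, here is a minimal sketch in Python.  It
assumes a base win rate of 50%, which is my own placeholder and not
something known from the data, and the function name is purely
illustrative.  It computes the step size of an observed win rate and
its binomial standard error for a category of 10 observations, next to
the 4% effect mentioned above.

```python
import math

def proportion_noise(n, p=0.5):
    """Return (resolution, standard_error) for a win rate observed on n trials.

    p is an ASSUMED underlying win rate (0.5 is a placeholder).
    """
    resolution = 1.0 / n                           # smallest possible step in the observed rate
    standard_error = math.sqrt(p * (1.0 - p) / n)  # binomial standard error of the rate
    return resolution, standard_error

expected_effect = 0.04                   # the roughly 4% category effect from the question
resolution, se = proportion_noise(10)    # 10 observations per category

print(f"step size of observed rate: {resolution:.2f}")
print(f"standard error of observed rate: {se:.3f}")
print(f"noise relative to expected effect: {se / expected_effect:.1f}x")
```

Under that assumed 50% base rate, the sampling noise in a single
category is several times larger than the effect being sought.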

You know with certainty that the categories did have an impact on the
outcome, and that this impact is hidden within the variability of the
response in your 15,000 observations.  You also feel that the gods of
chance are taunting you, and you might even have imagined one of them
snickering under his breath.  (Ha ha, just had to add a little humor.)

----Part B----

If you are up for round 2, then what would you do if the number of
observations per category level is not the same, i.e., one category
might have 5 observations while another might have 47?
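
To illustrate why unequal sizes complicate things, here is a small
simulation sketch in Python.  It again assumes a 50% underlying win
rate with no category effect at all, and the function name, repetition
count, and seed are arbitrary choices of mine.  It only shows how much
the observed win rate bounces around by chance for a category of 5
versus a category of 47.

```python
import random
import statistics

def observed_rate_spread(n, p=0.5, reps=2000, seed=0):
    """Simulate `reps` categories of n win/lose observations each and
    return the standard deviation of their observed win rates.

    p is an ASSUMED true win rate shared by every simulated category.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(reps)
    ]
    return statistics.pstdev(rates)

spread_small = observed_rate_spread(5)    # category with 5 observations
spread_large = observed_rate_spread(47)   # category with 47 observations

print(f"chance spread with  5 observations: {spread_small:.3f}")
print(f"chance spread with 47 observations: {spread_large:.3f}")
```

The smaller category's observed rate is far noisier, so the same raw
difference from the overall win rate carries less information there
than it does in the larger category.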

----

Best regards,

David Young
                            mailto:[log in to unmask]