Hi everyone,
I'd like to ask your opinion on something. I was in discussion with a fellow statistician the other day and they had some ideas about coding categorical (independent) variables. Have you any views?
Say a variable has two categories, cat1 and cat2. In the discussion, it was suggested that we shouldn't use a category as the reference category if it is small relative to the other category ("small" meaning fewer than 100 cases, with the other category being much larger). When this occurs, we should reverse the coding. For example:
Say we have 400 cases in our data. If there were fewer than 100 (say 80) in cat1 (the reference category, coded 0) and many more (say 320) in cat2 (coded 1), then we would reverse the coding so that cat1 = 1 and cat2 = 0 (the reference category).
However, if the two categories were closer in size, say 200 cases in total with 90 in cat1 (reference, coded 0) and 110 in cat2 (coded 1), then we could just leave the coding as it stands.
As we know, reversing the coding only reverses the sign of the coefficient in the model (the intercept changes to match the new reference category, but the fitted values stay the same). However, it was said that when we "have low numbers in the reference category - relative to the other category(ies) - then this can affect the iterations used when calculating model coefficients". Any views?
It makes sense for the reference category to be one that occurs reasonably often. However, texts such as 'regression analysis' by Lewis-Beck (2003) say that the choice of reference category is essentially arbitrary, and, obviously, the reference category should be chosen so that it eases interpretation. So I'd like your views on the 'iteration' problem that was discussed.
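To make the question concrete, here is a minimal sketch of how one might check the 'iteration' claim on the 80/320 example. It rests on assumptions that were not part of the discussion: a binary outcome, a logistic regression fitted by plain Newton-Raphson starting from zero, and made-up event counts (30 of 80 in cat1, 200 of 320 in cat2). The function name fit_logistic and the convergence tolerance are also just illustrative choices, and real software may use a different algorithm or stopping rule.

```python
import math


def fit_logistic(groups, tol=1e-10, max_iter=50):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped data.

    groups: list of (x, n, events) with x in {0, 1}.
    Returns (b0, b1, number_of_iterations).
    """
    b0, b1 = 0.0, 0.0  # assumed starting values
    for iteration in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, events in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - n * p       # contribution to the score
            w = n * p * (1.0 - p)        # contribution to the information
            g0 += resid
            g1 += x * resid
            h00 += w
            h01 += x * w
            h11 += x * x * w
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical event counts, chosen only for illustration.
CAT1 = (80, 30)    # (cases, events) in cat1
CAT2 = (320, 200)  # (cases, events) in cat2

# Original coding: cat1 = 0 (small reference category), cat2 = 1.
original_coding = [(0, *CAT1), (1, *CAT2)]
# Reversed coding: cat1 = 1, cat2 = 0 (large reference category).
reversed_coding = [(1, *CAT1), (0, *CAT2)]

for label, groups in [("original", original_coding),
                      ("reversed", reversed_coding)]:
    b0, b1, n_iter = fit_logistic(groups)
    print(f"{label}: intercept={b0:.4f} slope={b1:.4f} iterations={n_iter}")
```

The two codings are a linear reparameterisation of the same model, so the slope should change sign and the intercept should become the log-odds of the new reference category. The printed iteration counts show whether, in this one small setting, the size of the reference category makes any difference to convergence. It says nothing about other estimation methods or about very sparse categories.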
Thanks again,
Kim