Hi,
Is anyone aware of articles discussing where and when it is appropriate to
use probability weights to adjust for bias due to missing values? I
have seen this technique mentioned as a rough-and-ready way to deal with
missing data, but I am a bit hesitant to apply it to my data. My question
is: is it always preferable to make an adjustment?
I think of it this way: say X is a covariate, or a set of covariates, that
strongly predicts whether the dependent variable Y is missing. For
argument's sake, say that if X = 1, Prob(Y=missing) = .1, and if X = 2,
Prob(Y=missing) = .3. The logic of using probability weights seems to be that we
want to give more weight to observations where X = 2, to compensate for
the loss of observations due to the higher probability of Y being missing. I can
understand this in the stratified sampling context. But in the context of
missing data, I'm not sure the logic works. If Prob(Y=missing) is higher
when X = 2, then it is probable that the estimates obtained when X = 2 are
also more biased. Given this assumption, we should give less weight to
observations where X = 2, rather than more, as the common use of probability
weights would suggest. In another scenario, we can imagine two research
papers, one with a response rate of .9 and the other with a response rate
of .7. Surely if we were to integrate the results we should put more
emphasis on the former.
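To make my example concrete, here is a minimal Python sketch (simulated data, with made-up effect sizes; only the .1/.3 missingness probabilities come from the example above). It shows what the standard inverse-probability weighting logic does when missingness depends only on X:

```python
import random

random.seed(0)
n = 100_000

num_w = num_wy = 0.0         # weighted sums for the IPW estimate
sum_naive = cnt_naive = 0.0  # plain sums for the unweighted estimate

for _ in range(n):
    x = random.choice([1, 2])
    y = 10.0 * x + random.gauss(0, 1)   # E[Y] = 10 * E[X] = 15
    p_miss = 0.1 if x == 1 else 0.3     # missingness depends on X only
    if random.random() < p_miss:
        continue                        # Y unobserved; case dropped
    w = 1.0 / (1.0 - p_miss)            # inverse probability of being observed
    num_w += w
    num_wy += w * y
    sum_naive += y
    cnt_naive += 1

naive_mean = sum_naive / cnt_naive  # biased toward X=1 cases
ipw_mean = num_wy / num_w           # close to the full-sample mean of 15
```

In this simulation the unweighted mean of the observed cases is pulled toward the X = 1 group (which is under-dropped), while the weighted mean recovers the full-sample mean of about 15. Of course, this only works because here missingness depends on X alone; my worry above is about what happens when that assumption fails.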
Is my logic ok here?
Tim