Dear Allstaters,
For those interested, the replies I received are attached below.
Unfortunately the general public does not have free access to the
Government docs. Thanks to all who replied.
Nonetheless I also want to add my own comments.
To answer the question of whether it is ALWAYS preferable to adjust rather
than not adjust, I think it is not difficult to show that the answer is 'no'.
I've heard it said that when we adjust we are assuming MAR, but if we
don't adjust we are assuming MCAR, and since MAR is a less restrictive
condition than MCAR, it is always better to adjust. But I think the
argument is flawed: having MCAR does not mean that any particular MAR
model we fit, with its chosen covariates, is the right one (unless the
model is a trivial one). So crucially we must ask whether the covariates
used in the MAR model are 'good' ones, and my opinion is that blindly
choosing covariates based on whether they are significant in predicting
missingness may not be a good way of selecting the useful covariates.
Consider the following model:
In Country A, Y = 2X + error
In Country B, Y = 2X + 1 + error
X ~ Uni[0,1), error ~ Uni[0,1)
and let's say that X and Y are always missing in pairs, via the following
mechanism:
logit(P(missing)) = -5 + 2Y
The missingness is clearly non-random. If we think that adjusting for
missingness is always better than not adjusting, we may find that
'Country' is a good predictor of missingness, because Y is generally
larger in Country B, and hence there are also more missing values in
Country B. If we therefore use a weighted regression to estimate the
relationship between Y and X, controlling for country, we'll find that on
average this estimate is more biased than if we don't adjust, because
we'll be giving more weight to observations in Country B, where the
selection effect is also stronger.
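The setup above can be sketched as a small stdlib-only Python simulation. The sample sizes, the random seed, and the specific weighting scheme (weights of 1 / estimated P(observed | country), my reading of the description above) are my own assumptions; the model and missingness mechanism are as stated.

```python
import math
import random

random.seed(0)

N_PER_COUNTRY = 20000

def simulate():
    """Y = 2X + country + error, with X and Y deleted in pairs
    via logit(P(missing)) = -5 + 2Y."""
    observed = []
    for country in (0, 1):          # 0 = Country A, 1 = Country B
        for _ in range(N_PER_COUNTRY):
            x = random.random()     # X ~ Uni[0,1)
            y = 2 * x + country + random.random()
            p_miss = 1.0 / (1.0 + math.exp(5 - 2 * y))
            if random.random() >= p_miss:
                observed.append((x, y, country))
    return observed

def wls_slope(rows, weights):
    """Weighted least squares of Y on [1, X, country]; returns the X slope
    (the true value is 2). Solves the 3x3 normal equations directly."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for (x, y, c), w in zip(rows, weights):
        v = (1.0, x, float(c))
        for i in range(3):
            b[i] += w * v[i] * y
            for j in range(3):
                A[i][j] += w * v[i] * v[j]
    for i in range(3):              # Gaussian elimination with pivoting
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for j in range(i, 3):
                A[r][j] -= f * A[i][j]
            b[r] -= f * b[i]
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):             # back substitution
        sol[i] = (b[i] - sum(A[i][j] * sol[j] for j in (1, 2) if j > i)) / A[i][i]
    return sol[1]

obs = simulate()

# unadjusted: plain regression on the observed cases
unweighted = wls_slope(obs, [1.0] * len(obs))

# 'adjusted': weight each case by 1 / P(observed | country), with the
# observation probability estimated from each country's observed share
share = {c: sum(1 for r in obs if r[2] == c) / N_PER_COUNTRY for c in (0, 1)}
weighted = wls_slope(obs, [1.0 / share[r[2]] for r in obs])

print(f"unweighted slope: {unweighted:.3f}  country-weighted slope: {weighted:.3f}")
```

With both estimators the selection on Y attenuates the slope below 2; comparing the two printed slopes across seeds is the interesting part, and I make no claim here about which comes out worse in any particular run.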
I've used the above parameters because I thought they were reasonable, and
the effect can be demonstrated in an Excel spreadsheet. Nonetheless I
confess the differences between the estimates were minimal, so I would
welcome it if somebody could come up with a model that illustrates a
bigger difference. But if we can't, I think it may still be fair to say
that some adjustment is USUALLY better than no adjustment.
Tim
Original query:
Is anyone aware of articles discussing where and when it is appropriate to
use probability weights to adjust for bias due to missing values? I have
seen this technique mentioned as a rough-and-ready way to deal with
missing data, but am a bit hesitant about applying it to my data. My
question is: is it always preferable to make an adjustment?
I think of it this way: say X is a covariate or a set of covariates that
strongly predicts whether the dependent variable Y is missing. For
argument's sake, let's say if X = 1, Prob(Y missing) = .1, and if X = 2,
Prob(Y missing) = .3. The logic of using probability weights seems to be
that we want to put more weight on observations where X = 2, so as to
compensate for the loss of observations due to the higher probability of
Y being missing. I can understand this in the stratified sampling
context, but in the context of missing data I'm not sure the logic works.
If Prob(Y missing) is higher when X = 2, then it is probable that the
estimates obtained when X = 2 are also more biased. Given this, we should
give less weight to observations where X = 2, rather than more, as the
common use of probability weights would suggest. In another scenario, we
can imagine two research papers, one with a response rate of .9 and the
other with a response rate of .7; surely if we were to integrate the
results we should put more emphasis on the former.
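The probability-weight logic in the query can be sketched in a few lines of Python. The missingness rates (.1 and .3) are the ones above; the outcome model for Y and the sample size are my own toy assumptions.

```python
import random

random.seed(1)

# missingness rates from the query; the outcome model is an assumption
p_miss = {1: 0.1, 2: 0.3}
data = []
for _ in range(100000):
    x = random.choice((1, 2))
    y = 10 + 5 * (x == 2) + random.gauss(0, 1)  # Y depends on X
    data.append((x, y))

true_mean = sum(y for _, y in data) / len(data)

# delete Y with probability .1 when X=1 and .3 when X=2
obs = [(x, y) for x, y in data if random.random() >= p_miss[x]]

# naive complete-case mean: ignores that X=2 cases drop out more often
naive = sum(y for _, y in obs) / len(obs)

# inverse probability weighting: weight each observed case by
# 1 / P(observed | X), i.e. 1/.9 for X=1 and 1/.7 for X=2
w = [1.0 / (1.0 - p_miss[x]) for x, _ in obs]
ipw = sum(wi * y for wi, (_, y) in zip(w, obs)) / sum(w)

print(f"true {true_mean:.2f}  naive {naive:.2f}  IPW {ipw:.2f}")
```

In this toy setup the IPW estimate recovers the true mean while the naive mean does not, precisely because deletion is random *within* each X stratum; that is the condition the later replies turn on.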
Is my logic ok here?
####################################
Have you looked at Rosenbaum (1987), JASA, 82, 398, pp. 387-394?
######################################
You might find the sections on non-response in our instructional web site
helpful
http://www.napier.ac.uk/depts/fhls/peas/
In particular the theory sections on weighting and non-response
##################################
You might be interested in
http://www.statistics.gov.uk/methods_quality/gss_method_conf_2002/downloads/wagstaff.ppt
also
NSMS03: Report of the task force on imputation, GSS Methodology Series
available for free at
http://www.statistics.gov.uk/statbase/Product.asp?vlnk=2182
Also a website
www.missingdata.org.uk
"Getting started, software, example analyses,
frequently asked questions, discussion board,
preprints, bibliography...."
should provide a source/contact for expertise on methods.
2. On your specific question, whether
"If Prob(Y=missing) is higher when X=2, then it is probable that the
estimates obtained when X=2 are also more biased",
this depends on the source of the additional non-response, and on the
source of the bias. In a survey context, non-response may happen for
reasons associated with bias [eg if the subject topic is sensitive for
particular groups] or equally for reasons not associated with bias [eg
contact availability of individuals at different times of day; or eg that
fewer elderly poor people have telephones, so for a telephone survey,
sampling frames based on telephone numbers will under-represent the
elderly poor compared with the general household population].
3. So whether/how you re-weight depends in part on whether you know that
your achieved sample differs from the population in important ways likely
to affect your overall estimate, and on what you know about the reasons
for non-response and how/if these may be associated with possible biases.
Sorry, but there's no single simple answer; it will depend on the nature
of your particular objective, the data collection instrument chosen, and
evidence from the data and from other sources [administrative sources,
other research & surveys etc].
4. Last but not least, you can consider trying to 'simulate' the effect
of such lower response by randomly 'deleting' data [perhaps from a group
that is less well-represented in your data than it 'should be' based on
population data], and then comparing the effect of weighting vs not
weighting on your estimates against the 'true' value computed without any
data 'deleted' [nb repeat a large number of times and look at the average
of the squared differences].
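A minimal stdlib-only sketch of that delete-and-compare exercise; the two-group population, the response rates, and the estimand (the overall mean of Y) are all my own assumptions for illustration.

```python
import random

random.seed(2)

# assumed toy 'population': two groups with different means, where
# group 1 responds less often than group 0
full = ([(0, random.gauss(10, 2)) for _ in range(5000)]
        + [(1, random.gauss(14, 2)) for _ in range(5000)])
true_mean = sum(y for _, y in full) / len(full)

RESP = {0: 0.9, 1: 0.6}            # assumed response rates per group

def one_run():
    """Randomly 'delete' data per group, then estimate the mean of Y
    without weights and with weights of 1 / response-rate."""
    obs = [(g, y) for g, y in full if random.random() < RESP[g]]
    unw = sum(y for _, y in obs) / len(obs)
    w = [1.0 / RESP[g] for g, _ in obs]
    wtd = sum(wi * y for wi, (_, y) in zip(w, obs)) / sum(w)
    return unw, wtd

R = 200                             # number of repeated deletions
sq_unw = sq_wtd = 0.0
for _ in range(R):
    unw, wtd = one_run()
    sq_unw += (unw - true_mean) ** 2
    sq_wtd += (wtd - true_mean) ** 2
mse_unw, mse_wtd = sq_unw / R, sq_wtd / R
print(f"mean squared error: unweighted {mse_unw:.4f}  weighted {mse_wtd:.4f}")
```

Here weighting wins because the deletion mechanism depends only on group; under a different mechanism the comparison could of course go the other way, which is the point of running the simulation on your own data.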
####################################################################
I follow your logic, but for there to be bias WITHIN strata of X,
dropout must be related to factors other than X.
If you can totally account for dropout using a set of factors X,
then dropout within each stratum should be completely random
and hence bias-free.
I have dabbled with inverse probability weighting but have
found that what lets you down is:
a) The inability to predict missingness (or account for missingness)
very well
b) The fact that your predictors of missingness will undoubtedly also
suffer from missingness.
My advice would be to take your model + your set of factors X
and put them through Patrick Royston's MICE routine in Stata.
Multiple Imputation is where it's at at the minute
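For readers unfamiliar with the idea behind MICE, here is a deliberately simplified, stdlib-only sketch of regression-based multiple imputation. The data, the missingness mechanism, and the number of imputations are my own assumptions, and this is not the MICE algorithm itself: proper MI would also draw the imputation-model parameters and use Rubin's rules for the pooled variance, and MICE cycles through multiple incomplete variables.

```python
import random
import statistics

random.seed(3)

# assumed toy data: Y = 3X + noise, with Y more likely to be missing at
# high X (so missingness is MAR given X, and complete cases are biased)
n = 2000
xs = [random.random() for _ in range(n)]
ys = [3 * x + random.gauss(0, 0.5) for x in xs]
miss = [random.random() < 0.2 + 0.4 * x for x in xs]

def ols_line(pairs):
    """Ordinary least squares intercept and slope of y on x."""
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx
    return my - b * mx, b

# imputation model fitted on the complete cases
cc = [(x, y) for x, y, m in zip(xs, ys, miss) if not m]
a, b = ols_line(cc)
resid_sd = statistics.pstdev([y - (a + b * x) for x, y in cc])

# draw M completed datasets: each missing Y is replaced by the model
# prediction plus fresh noise; estimate the mean of Y in each, then pool
M = 20
means = []
for _ in range(M):
    filled = [a + b * x + random.gauss(0, resid_sd) if m else y
              for x, y, m in zip(xs, ys, miss)]
    means.append(sum(filled) / n)

mi_mean = sum(means) / M                     # pooled point estimate
cc_mean = sum(y for _, y in cc) / len(cc)    # complete-case mean, for contrast
true_mean = sum(ys) / n
print(f"true {true_mean:.3f}  complete-case {cc_mean:.3f}  MI {mi_mean:.3f}")
```

Because the imputation model conditions on X, the pooled estimate corrects the complete-case bias in this toy setup, which is the core of the argument for imputation over dropping cases.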