Dear All,
Thanks very much to those who replied to my query about what are
considered acceptable (or unacceptable) levels of 'missingness',
either in analyses of complete data or (if appropriate) of imputed
data (having checked for biases, patterns of missingness, etc.).
No-one really came up with an answer to the question, perhaps not
surprisingly, but one frequent and useful piece of advice was to
include a sensitivity analysis. The replies - diverse in content
(not to mention tone) - were all interesting and are listed below.
Best wishes,
Sam.
Sam Pattenden
EEU
LSHTM
0171-927-2316.
=========================================
From: N T Longford
To: SAM PATTENDEN@LSHTM
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 16-Mar-99 14:21:15 +0000
In various areas of epidemiology there are `acceptable' levels of
non-response. But they do not refer to the validity of the analyses
which ignore the issue of missingness. They merely reflect the
percentage of nonrespondents that would be obtained if all went well
in data collection. It is a gross misunderstanding to regard such a
percentage as a licence to ignore missingness.
Multiple imputation works, in principle, for whatever the proportion
of nonresponders, because the uncertainty about the missing data is
reflected in the imputations. When the proportion is very large, say,
25%+, the customary number of 5 imputations may not be sufficient, but
7 or 9 may do. Multiple imputation can be supplemented by sensitivity
analysis, exploring how changes in the assumptions about the process
of missingness impact on the conclusions obtained.
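As a rough guide to why a few extra imputations can help, Rubin's
standard approximation puts the efficiency of an estimate based on m
imputations, relative to infinitely many, at 1 / (1 + gamma / m), where
gamma is the fraction of missing information. The sketch below is
illustrative only: the values of gamma are assumed, gamma is not the
same thing as the raw proportion of nonrespondents, and the function
name is made up for this example.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Approximate efficiency of m imputations relative to infinitely many.

    gamma is the fraction of missing information (assumed here, not
    estimated from any data); m is the number of imputations.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    return 1.0 / (1.0 + gamma / m)


# Compare the customary 5 imputations with 7 and 9 for two assumed gammas.
for gamma in (0.25, 0.50):
    for m in (5, 7, 9):
        eff = relative_efficiency(gamma, m)
        print(f"gamma={gamma:.2f}  m={m}  relative efficiency={eff:.3f}")
```

The gain from 5 to 9 imputations is modest at small gamma and grows as
the fraction of missing information increases.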
If only 55% of the children have (complete) records, I would not take
the complete data analysis seriously, and hope neither would anybody
else.
Nick Longford
========
From: Jane Hutton
To: SAM PATTENDEN@LSHTM (Sam)
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 16-Mar-99 15:15:36 +0000
Dear Sam
My general rules are:
1. be completely explicit about what you've done (by way of
jettisoning) and what went wrong.
2. make a serious attempt to estimate the possible biases arising from
missing data (or at least the bounds of the biases).
On 2, for example, if you have 55% response, and 40% of the responders
say 'yes' to a question, you know that overall between .4 times .55
(=22%) and .4 times .55 plus .45 (=67%) of the sample might answer
'yes'. So 40% could be anywhere from 22% to 67%, and that's before you
get to the confidence intervals.
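The bounds above can be reproduced with a few lines of Python. This is
a minimal sketch of the arithmetic only; the function name is made up
for illustration, and the inputs are the 55% response rate and 40%
'yes' rate quoted in the example.

```python
def yes_bounds(response_rate: float, yes_among_responders: float):
    """Bounds on the overall 'yes' proportion when nonresponders are unknown.

    Lower bound: every nonresponder would have said 'no'.
    Upper bound: every nonresponder would have said 'yes'.
    """
    observed_yes = response_rate * yes_among_responders
    lower = observed_yes
    upper = observed_yes + (1.0 - response_rate)
    return lower, upper


low, high = yes_bounds(response_rate=0.55, yes_among_responders=0.40)
print(f"overall 'yes' lies between {low:.0%} and {high:.0%}")
```

The width of the interval equals the nonresponse rate, so it depends
only on how much is missing, not on what the responders said.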
best wishes
Jane
========
From: Annette Dobson
To: SAM PATTENDEN@LSHTM (Sam)
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 16-Mar-99 21:48:16 +0000
Dear Sam
As we progress with the Australian Longitudinal Study on Women's Health my
views of what is 'OK' for missing data become more rigorous. As soon as you
start doing any multivariate, including longitudinal, analyses case-wise
deleting starts cutting sample size down enormously, even if data were
truly missing at random and no bias is being introduced. My goal is
therefore 'none'.
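To see how quickly case-wise deletion erodes the sample, here is a
small sketch under a deliberately simple assumption: each variable is
missing independently with the same probability. The 5% rate and the
function name are hypothetical and not taken from the study.

```python
def complete_case_fraction(missing_rate: float, n_variables: int) -> float:
    """Expected share of cases with no missing values.

    Assumes each variable is missing independently with the same
    probability, which is a simplification made for illustration.
    """
    return (1.0 - missing_rate) ** n_variables


# With 5% missing per variable, the complete cases shrink fast as
# more variables enter the model.
for k in (1, 5, 10, 20):
    share = complete_case_fraction(0.05, k)
    print(f"{k:2d} variables: {share:.1%} of cases are complete")
```

In real data missingness tends to cluster in the same individuals, so
the loss is usually less severe than this independence calculation
suggests, but the direction of the effect is the same.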
Annette
==========
From: David X Briggs
To: SAM PATTENDEN@LSHTM
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 17-Mar-99 9:43:36 +0000
Dear Sam,
So far as I am concerned, it is not possible to give an
'acceptable' level of missing data, based on these facts
alone. However there are some points which should be
considered when analysing your data.
a) I think the crucial consideration, that you already seem
to be investigating, is 'What is the nature of the technical
problem?'. This has caused you to discard some of your
outcomes, resulting in a smaller sample size, but it might
also have more serious implications. Suppose for some reason
the discarded outcomes would have all been higher responses
than the observed subjects - this might happen if your lung
function machine tended to give spurious results for people
with higher lung function measurements. Ignoring this
potential selection bias would introduce an obvious
estimation bias into your analysis.
b) Imputation methods do not get around the possible bias introduced
in (a). Imputation methods impute values based on the observed data -
if the distribution of the discarded data is not the same as that of
the observed data, then the imputed values will only represent samples
from the observed data, and do not therefore get around the bias
problem.
c) Multiple imputation does not get around the bias problem, but it
does at least maintain your sample size which helps to maintain the
precision of your estimates.
d) I would start by doing a worst-case and a best-case scenario. This
involves imputing the lowest and highest possible values for the
missing cases. You can then determine the sensitivity of your analyses
to these extremes. If your analyses are robust to these extremes, then
you have little to worry about; if your analyses are sensitive, then
you need to think again.
e) On a more positive note, if you can consider that your technical
problem does not introduce a selection bias, then multiple imputation
or some other suitable method (Bayesian imputation, EM algorithm)
would be fine.
f) A good reference on this is by Little and Rubin, and is called
Statistical Analysis with Missing Data.
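The worst-case and best-case check in point (d) can be sketched as
follows. The readings, the number of missing cases and the plausible
range are all hypothetical, and the function name is made up for this
example; a real analysis would rerun the full model, not just a mean.

```python
from statistics import mean


def extreme_case_means(observed, n_missing, lowest, highest):
    """Mean outcome after filling every missing case with an extreme value.

    Returns (low_mean, high_mean): the means obtained when all missing
    cases are set to the lowest and to the highest plausible value.
    """
    observed = list(observed)
    low_mean = mean(observed + [lowest] * n_missing)
    high_mean = mean(observed + [highest] * n_missing)
    return low_mean, high_mean


# Hypothetical lung function readings (litres) with 4 discarded outcomes.
readings = [2.1, 2.4, 1.9, 2.8, 2.5, 2.2]
low_mean, high_mean = extreme_case_means(
    readings, n_missing=4, lowest=1.0, highest=4.0
)
print(f"complete-case mean: {mean(readings):.2f}")
print(f"range under extreme imputation: {low_mean:.2f} to {high_mean:.2f}")
```

If the conclusions hold at both ends of this range, the missing cases
cannot overturn them; if they flip, the result depends on untestable
assumptions about the discarded outcomes.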
Hope this is of some use.
Best Wishes,
David Briggs.
===========
From: Andrew McCulloch
To: SAM PATTENDEN@LSHTM (Sam)
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 17-Mar-99 14:28:16 +0000
Sam,
For me the most important issue to form a view on
is whether the missing outcomes are missing at random or
whether missingness is informative - related to child
health. If it is informative then you will have problems.
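A small simulation can illustrate the difference. This is a sketch
under invented assumptions: outcomes are standard normal scores, about
55% of cases are observed in both scenarios, and in the informative
scenario higher values are more likely to be lost. The probabilities
and the function name are hypothetical.

```python
import random
from statistics import mean


def simulate_missingness(n: int = 20000, seed: int = 0):
    """Compare complete-case means under random and informative missingness."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Missing completely at random: every case is observed with
    # probability 0.55, regardless of its value.
    random_obs = [v for v in values if rng.random() < 0.55]

    # Informative missingness: cases above zero are observed with
    # probability 0.3, cases at or below zero with probability 0.8.
    informative_obs = [
        v for v in values if rng.random() < (0.3 if v > 0 else 0.8)
    ]
    return mean(values), mean(random_obs), mean(informative_obs)


full, at_random, informative = simulate_missingness()
print(f"all cases:              {full:+.3f}")
print(f"observed, random loss:  {at_random:+.3f}")
print(f"observed, informative:  {informative:+.3f}")
```

Under random loss the complete-case mean stays close to the full-data
mean, whereas under informative loss it is pulled clearly downwards
even though the response rate is about the same.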
Hope you are well.
Yours sincerely
Andrew McCulloch
=============
From: Rob Nichols
To: SAM PATTENDEN@LSHTM ('Sam')
Subject: RE: Missing Outcome Data - Acceptable Levels?
Date sent: 18-Mar-99 16:12:11 +0000
How about using a Bayesian framework?
Rob
====================================