Dear All,

Thanks very much to those who replied to my query on what are
considered acceptable (or unacceptable) levels of 'missingness',
either in analyses of complete data or (if appropriate) of imputed
data (having done checks for biases, patterns of missingness, etc.).

No-one really came up with an answer to the question, perhaps not 
surprisingly, but one frequent and useful piece of advice was to 
include a sensitivity analysis.  The replies - diverse in content 
(not to mention tone) - were all interesting and are listed below.

Best wishes,


Sam Pattenden

From:             N T Longford
To:               SAM PATTENDEN@LSHTM
Subject:          Re: Missing Outcome Data - Acceptable Levels?
Date sent:        16-Mar-99 14:21:15 +0000

In various areas of epidemiology there are `acceptable' levels of
non-response.  But they do not refer to the validity of the analyses
which ignore the issue of missingness.  They merely reflect the
percentage of nonrespondents that would be obtained if all went well
in data collection.  It is a gross misunderstanding to regard such a
percentage as a licence to ignore missingness. 

Multiple imputation works, in principle, for whatever the proportion
of nonresponders, because the uncertainty about the missing data is
reflected in the imputations.  When the proportion is very large, say,
25%+, the customary number of 5 imputations may not be sufficient, but
7 or 9 may do.  Multiple imputation can be supplemented by sensitivity
analysis, exploring how changes in the assumptions about the process
of missingness impact on the conclusions obtained. 
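
A minimal sketch of how results from several imputed datasets are usually combined (Rubin's rules), and of why a larger missing fraction can call for more than 5 imputations. The function names and the numbers in the usage lines are illustrative assumptions, not taken from the study discussed here. Note that `gamma` stands for the fraction of missing information, which is related to but not the same as the proportion of nonresponders.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results using Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, one variance each")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 divisor)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


def relative_efficiency(gamma, m):
    """Rubin's efficiency of m imputations relative to infinitely many.

    gamma is the fraction of missing information (an assumed input here).
    """
    return 1 / (1 + gamma / m)


# Hypothetical estimates and variances from m = 5 imputed datasets.
q_bar, total_var = pool_rubin([1.9, 2.1, 2.0, 2.3, 1.7], [0.04] * 5)
print(q_bar, total_var)

# Efficiency for 5, 7 and 9 imputations when gamma is assumed to be 0.5.
for m in (5, 7, 9):
    print(m, round(relative_efficiency(0.5, m), 3))
```

With `gamma` assumed to be 0.5, the efficiencies come out at roughly 91%, 93% and 95%, which is one way to see why moving from 5 to 7 or 9 imputations gives a modest gain when much information is missing.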

If only 55% of the children have (complete) records, I would not take
the complete data analysis seriously, and hope neither would anybody else.

        Nick Longford

From:             Jane Hutton
To:               SAM PATTENDEN@LSHTM (Sam)
Subject:          Re: Missing Outcome Data - Acceptable Levels?
Date sent:        16-Mar-99 15:15:36 +0000

Dear Sam

My general rules are:
1. Be completely explicit about what you've done (by way of
   jettisoning) and what went wrong.
2. Make a serious attempt to estimate the possible biases arising from
   missing data (or at least the bounds of the biases).

On 2, for example, if you have a 55% response and 40% of the responders
say 'yes' to a question, you know that overall between .4 times .55
(=22%) and .4 times .55 plus .45 (=67%) of the sample might answer
'yes'. So 40% could be anywhere from 22% to 67%, and that's before you
get to the confidence intervals.
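
The arithmetic above can be checked with a short sketch. The function name is made up for illustration; the bounds assume only that every nonresponder would have answered either 'yes' or 'no'.

```python
def proportion_bounds(response_rate, yes_among_responders):
    """Bounds on the overall 'yes' proportion when nonresponders are unknown.

    Lower bound: every nonresponder would have said 'no'.
    Upper bound: every nonresponder would have said 'yes'.
    """
    if not (0 <= response_rate <= 1 and 0 <= yes_among_responders <= 1):
        raise ValueError("both arguments must be proportions in [0, 1]")
    lower = response_rate * yes_among_responders
    upper = lower + (1 - response_rate)
    return lower, upper


# The figures from the message: 55% response, 40% 'yes' among responders.
lower, upper = proportion_bounds(0.55, 0.40)
print(f"{lower:.0%} to {upper:.0%}")  # prints "22% to 67%"
```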

best wishes

From:             Annette Dobson
To:               SAM PATTENDEN@LSHTM (Sam)
Subject:          Re: Missing Outcome Data - Acceptable Levels?
Date sent:        16-Mar-99 21:48:16 +0000

Dear Sam
As we progress with the Australian Longitudinal Study on Women's Health my
views of what is 'OK' for missing data become more rigorous. As soon as you
start doing any multivariate (including longitudinal) analyses, case-wise
deletion starts cutting sample size down enormously, even if data are
truly missing at random and no bias is being introduced. My goal is
therefore 'none'.
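
To illustrate how quickly case-wise deletion erodes the sample, here is a small sketch under a deliberately simple assumption that does not come from the study itself: every variable is missing independently with the same probability. The function name and the 5% rate are hypothetical.

```python
def complete_case_fraction(missing_rate, n_variables):
    """Expected share of records with no missing value on any variable.

    Assumes each variable is missing independently with the same rate,
    which is a simplification chosen only for illustration.
    """
    if not 0 <= missing_rate <= 1 or n_variables < 0:
        raise ValueError("missing_rate must be in [0, 1], n_variables >= 0")
    return (1 - missing_rate) ** n_variables


# With 5% missing per variable, see how the usable sample shrinks
# as more variables enter the model.
for k in (1, 5, 10, 20):
    print(k, f"{complete_case_fraction(0.05, k):.0%}")
```

Under these assumptions, a model with 10 variables keeps only about 60% of the records even though each variable on its own is 95% complete.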

From:             David X Briggs
To:               SAM PATTENDEN@LSHTM
Subject:          Re: Missing Outcome Data - Acceptable Levels?
Date sent:        17-Mar-99 9:43:36 +0000

          Dear Sam,

          So far as I am concerned, it is not possible to give an 
          'acceptable' level of missing data, based on these facts 
          alone. However there are some points which should be 
          considered when analysing your data.

          a) I think the crucial consideration, that you already seem 
          to be investigating, is 'What is the nature of the technical 
          problem?'. This has caused you to discard some of your 
          outcomes, resulting in a smaller sample size, but it might 
          also have more serious implications. Suppose for some reason 
          the discarded outcomes would have all been higher responses 
          than the observed subjects - this might happen if your lung 
          function machine tended to give spurious results for people 
          with higher lung function measurements. Ignoring this 
          potential selection bias would introduce an obvious 
          estimation bias into your analysis.

          b) Imputation methods do not get around the possible bias
          introduced in (a). Imputation methods impute values based on
          the observed data - if the distribution of the discarded data
          is not the same as that of the observed data, then the imputed
          values will only represent samples from the observed data, and
          do not therefore get around the bias.

          c) Multiple imputation does not get around the bias problem,
          but it does at least maintain your sample size, which helps to
          maintain the precision of your estimates.

          d) I would start by doing a worst- and a best-case scenario.
          This involves imputing the lowest and highest possible values
          for the missing cases. You can then determine the sensitivity
          of your analyses to these extremes. If your analyses are robust
          to these extremes, then you have little to worry about; if your
          analyses are sensitive, then you need to think again.

          e) On a more positive note, if you can consider that your technical 
          problem does not introduce a selection bias, then multiple imputation 
          or some other suitable method (Bayesian imputation, EM algorithm) 
          would be fine.

          f) A good reference on this is by Little and Rubin, and is called (I
          think) Statistical Analysis with Missing Data.
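
The worst- and best-case check suggested in (d) might be sketched as follows for a simple mean. The function name, the example readings and the plausible range of 1.0 to 5.0 are all hypothetical; a real analysis would apply the same idea to whatever model is being fitted.

```python
def extreme_case_means(values, lowest, highest):
    """Mean of the outcome with missing cases set to each extreme.

    values:  list of outcomes, with None marking a missing case
    lowest:  smallest value considered possible for a missing case
    highest: largest value considered possible for a missing case
    Returns (mean with lowest imputed, mean with highest imputed).
    """
    if not values:
        raise ValueError("values must not be empty")
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    total = sum(observed)
    low_mean = (total + n_missing * lowest) / len(values)
    high_mean = (total + n_missing * highest) / len(values)
    return low_mean, high_mean


# Hypothetical readings with two missing cases.
readings = [2.0, 3.0, None, 4.0, None]
low_mean, high_mean = extreme_case_means(readings, lowest=1.0, highest=5.0)
print(low_mean, high_mean)
```

If the conclusion is the same at both ends of the returned range, the analysis is robust to the missing cases in the sense described in (d); a wide range is the signal to think again.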

          Hope this is of some use.

          Best Wishes,
          David Briggs.

From:             Andrew McCulloch
To:               SAM PATTENDEN@LSHTM (Sam)
Subject:          Re: Missing Outcome Data - Acceptable Levels?
Date sent:        17-Mar-99 14:28:16 +0000

Sam,

 For me the most important issue to form a view on 
is whether the missing outcomes are missing at random or 
whether missingness is informative - related to child 
health. If it is informative then you will have problems.

Hope you are well.

Yours sincerely
Andrew McCulloch


From:             Rob Nichols
To:               SAM PATTENDEN@LSHTM ('Sam')
Subject:          RE: Missing Outcome Data - Acceptable Levels?
Date sent:        18-Mar-99 16:12:11 +0000

How about using a Bayesian framework?