Dear All,

Thanks very much to those who replied to my query on what are considered acceptable (or unacceptable) levels of 'missingness', either in analyses of complete data or (if appropriate) of imputed data (having done checks for biases, patterns of missingness, etc.). No-one really came up with an answer to the question, perhaps not surprisingly, but one frequent and useful piece of advice was to include a sensitivity analysis. The replies - diverse in content (not to mention tone) - were all interesting and are listed below.

Best wishes,
Sam.

Sam Pattenden
EEU LSHTM
0171-927-2316

=========================================
From: N T Longford
To: SAM PATTENDEN@LSHTM
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 16-Mar-99 14:21:15 +0000

In various areas of epidemiology there are 'acceptable' levels of non-response. But they do not refer to the validity of the analyses which ignore the issue of missingness. They merely reflect the percentage of nonrespondents that would be obtained if all went well in data collection. It is a gross misunderstanding to regard such a percentage as a licence to ignore missingness.

Multiple imputation works, in principle, whatever the proportion of nonrespondents, because the uncertainty about the missing data is reflected in the imputations. When the proportion is very large, say 25% or more, the customary number of 5 imputations may not be sufficient, but 7 or 9 may do. Multiple imputation can be supplemented by sensitivity analysis, exploring how changes in the assumptions about the process of missingness impact on the conclusions obtained.

If only 55% of the children have (complete) records, I would not take the complete data analysis seriously, and hope neither would anybody else.

Nick Longford

========
From: Jane Hutton
To: SAM PATTENDEN@LSHTM (Sam)
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 16-Mar-99 15:15:36 +0000

Dear Sam

My general rules are:

1. Be completely explicit about what you've done (by way of jettisoning) and what went wrong.
2. Make a serious attempt to estimate the possible biases arising from missing data (or at least the bounds of the biases).

On 2, for example, if you have 55% response, and 40% of the responders say 'yes' to a question, you know that overall between .4 times .55 (=22%) and .4 times .55 plus .45 (=67%) of the sample might answer 'yes'. So 40% could be anywhere from 22% to 67%, and that's before you get to the confidence intervals.

best wishes
Jane

========
From: Annette Dobson
To: SAM PATTENDEN@LSHTM (Sam)
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 16-Mar-99 21:48:16 +0000

Dear Sam

As we progress with the Australian Longitudinal Study on Women's Health my views of what is 'OK' for missing data become more rigorous. As soon as you start doing any multivariate analyses, including longitudinal ones, case-wise deletion starts cutting sample size down enormously, even if data were truly missing at random and no bias is being introduced. My goal is therefore 'none'.

Annette

==========
From: David X Briggs
To: SAM PATTENDEN@LSHTM
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 17-Mar-99 9:43:36 +0000

Dear Sam,

So far as I am concerned, it is not possible to give an 'acceptable' level of missing data, based on these facts alone. However, there are some points which should be considered when analysing your data.

a) I think the crucial consideration, which you already seem to be investigating, is 'What is the nature of the technical problem?'. This has caused you to discard some of your outcomes, resulting in a smaller sample size, but it might also have more serious implications.
Suppose for some reason the discarded outcomes would all have been higher than those of the observed subjects - this might happen if your lung function machine tended to give spurious results for people with higher lung function measurements. Ignoring this potential selection bias would introduce an obvious estimation bias into your analysis.

b) Imputation methods do not get around the possible bias introduced in (a). Imputation methods impute values based on the observed data - if the distribution of the discarded data is not the same as that of the observed data, then the imputed values will only represent samples from the observed data, and do not therefore get around the bias problem.

c) Multiple imputation does not get around the bias problem, but it does at least maintain your sample size, which helps to maintain the precision of your estimates.

d) I would start by doing a worst-case and a best-case scenario. This involves imputing the lowest and highest possible values for the missing cases. You can then determine the sensitivity of your analyses to these extremes. If your analyses are robust to these extremes, then you have little to worry about; if your analyses are sensitive, then you need to think again.

e) On a more positive note, if you can consider that your technical problem does not introduce a selection bias, then multiple imputation or some other suitable method (Bayesian imputation, EM algorithm) would be fine.

f) A good reference on this is by Little and Rubin, and is called (I think) Statistical Analysis with Missing Data.

Hope this is of some use.

Best Wishes,
David Briggs.

===========
From: Andrew McCulloch
To: SAM PATTENDEN@LSHTM (Sam)
Subject: Re: Missing Outcome Data - Acceptable Levels?
Date sent: 17-Mar-99 14:28:16 +0000

Sam,

For me the most important issue to form a view on is whether the missing outcomes are missing at random or whether missingness is informative - related to child health. If it is informative then you will have problems.

Hope you are well.

Yours sincerely
Andrew McCulloch

=============
From: Rob Nichols
To: SAM PATTENDEN@LSHTM ('Sam')
Subject: RE: Missing Outcome Data - Acceptable Levels?
Date sent: 18-Mar-99 16:12:11 +0000

How about using a Bayesian framework?

Rob

====================================
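Worked example (illustrative only, not part of any reply). Two of the suggestions above reduce to simple arithmetic: Jane Hutton's bounds on a proportion when non-responders are unknown, and David Briggs's worst-case and best-case imputation. A minimal Python sketch of both follows. The function names, the lung-function readings, and the lowest and highest plausible values are all hypothetical; only the 55% response rate and the 40% 'yes' rate come from the thread.

```python
from statistics import mean


def proportion_bounds(response_rate, yes_among_responders):
    """Bounds on the overall 'yes' proportion when non-responders are unknown.

    Lower bound: every non-responder would have said 'no'.
    Upper bound: every non-responder would have said 'yes'.
    """
    lower = response_rate * yes_among_responders
    upper = lower + (1.0 - response_rate)
    return lower, upper


def extreme_case_means(observed, n_missing, lowest, highest):
    """Worst- and best-case sample means, obtained by filling every missing
    value with the lowest or the highest plausible value respectively."""
    worst = mean(list(observed) + [lowest] * n_missing)
    best = mean(list(observed) + [highest] * n_missing)
    return worst, best


if __name__ == "__main__":
    # Figures from Jane Hutton's reply: 55% response, 40% 'yes' among responders.
    low, high = proportion_bounds(0.55, 0.40)
    print(f"overall 'yes' between {low:.0%} and {high:.0%}")

    # Hypothetical lung-function readings (litres); NOT data from the thread.
    readings = [1.8, 2.1, 2.4, 2.0]
    worst, best = extreme_case_means(readings, n_missing=3, lowest=1.0, highest=3.5)
    print(f"mean between {worst:.2f} and {best:.2f}")
```

With the figures from the thread, the first call reproduces the 22% to 67% range. If the substantive conclusion is the same at both extremes of the second calculation, the analysis is robust to the missing values in the sense of point (d); if it changes, the missingness mechanism needs closer attention.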