Official Statistics and Statistical Computing Sections
Royal Statistical Society
Wednesday 18 May, 2.00 - 5.30pm at the RSS (tea served at 3.30pm)
TRICKS OF THE TRADE:
WHAT TO DO WITH MISSING DATA.
2pm Introduction
2.10pm
Dick Wiggins (City University, London) and Gopal Netuveli (Imperial
College, London)
Missing data, a pervasive fact of life: a user's perspective
This introduction provides a user’s perspective to some of the basic
terminology used to describe the process of missingness and the critical
application of various remedies to handle missing data in the context of
the British Household Panel Survey using STATA. The strategies include the
cost of ignoring the problem, ad hoc methods, hot decking, an introduction
to multiple imputation and Heckman’s approach.
2.30pm
Nick Longford (SNTL, Leicester)
The role and scope of multiple imputation for incomplete data.
Multiple imputation (MI) was originally designed for the setting with a
number of secondary analysts who wish to apply complete-data methods on
publicly available databases without requiring any expertise in methods
for handling missing values, yet they and the distributor (data
constructor) have a stake in good statistical practice (near-efficient
estimation). In principle, MI can be applied to any problem that can be
formulated as involving missing information - its scope is wider than that
of the EM algorithm, which is constrained by our limited ability or capacity
to iteratively execute the E-step (estimation of the complete-data
sufficient statistics). Direct modelling of the data-generating
(sampling) and nonresponse processes is the gold standard, but its
application is practical only in one-off analyses, and the expertise in it
is difficult to permeate through the community of analysts. The strengths
and weaknesses of MI will be overviewed, with references to published
examples. The 'difficult' parts of the MI method (modelling of the
nonresponse process) will be discussed from the 'missionary' perspective --
how good statistical practice can be promoted without the stringent
requirements of expertise in statistical theory or any specialised
training. NMAR mechanisms will be discussed in connection with
sensitivity analysis.
3pm
Rob Woods (SPSS)
Practical solutions for dealing with missing data
Missing data is a common issue that should be considered when undertaking
any applied work. There are a number of ways to deal with it, but before
choosing one it is imperative that the reasons for the missingness are
understood. This presentation outlines some practical tips for handling
missing data, shows how some techniques cope with it better than others,
and reviews general approaches to imputing missing values.
3.30pm Tea
4pm
James Carpenter & Mike Kenward (LSHTM)
Multiple Imputation for Multilevel Data
Multiple imputation has proved to be an invaluable tool for handling
incomplete data, especially with large messy datasets with many incomplete
explanatory variables. A central role is played by the imputation
distribution which requires modelling multivariate distributions made up
of potentially diverse types of variable (e.g. continuous, ordinal and
nominal discrete). An important additional complication is the presence
of multilevel structure among these variables, in particular hierarchical
and longitudinal. In this talk we discuss some of the approaches used for
modelling such multivariate data, and for drawing appropriate imputations.
We focus in particular on a newly developed macro in MLwiN for multilevel
multiple imputation.
4.45pm
Fiona O'Callaghan (Statistical Solutions Ltd)
SOLAS 3.2 for Missing Data Analysis
SOLAS 3.2 for Missing Data Analysis is a Windows-based software tool for
data imputation and missing data exploratory analysis that provides a
choice of both Multiple Imputation and Single Imputation methods.
The Single Imputation methods available in SOLAS include: Hot Decking,
Regression Imputation, Group Means, and Last Value Carried Forward.
Multiple Imputation was originally proposed by Rubin in the 1970s
as a possible solution to the problem of survey nonresponse, to address
the failings of standard analyses of incomplete datasets. The idea behind
Multiple Imputation is that for each missing value in a dataset, we impute
several values (M) instead of just one, to represent the uncertainty about
which values to impute.
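The idea can be sketched in a few lines of Python. This is only a toy
illustration (the data, the hot-deck-style random draws, and the use of the
sample mean as the estimate are assumptions made for the sketch, not SOLAS
code): each missing value is filled M times, each completed dataset is
analysed, and the M results are pooled with Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy incomplete sample: np.nan marks the missing values.
y = np.array([2.1, np.nan, 3.4, 2.8, np.nan, 3.0, 2.5, 3.9])
observed = y[~np.isnan(y)]
n_missing = int(np.isnan(y).sum())

M = 5  # number of imputations
estimates, variances = [], []
for _ in range(M):
    completed = y.copy()
    # Hot-deck-style draw: fill each gap with a randomly chosen observed value.
    completed[np.isnan(completed)] = rng.choice(observed, size=n_missing)
    estimates.append(completed.mean())                     # complete-data estimate
    variances.append(completed.var(ddof=1) / len(completed))  # its variance

# Rubin's rules: pool the M estimates; total variance combines the
# within-imputation variance and the between-imputation variance.
q_bar = np.mean(estimates)
within = np.mean(variances)
between = np.var(estimates, ddof=1)
total_var = within + (1 + 1 / M) * between
print(q_bar, total_var)
```

The between-imputation term is what represents the uncertainty about which
values to impute; a single imputation would discard it and understate the
variance.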
In SOLAS, users have two Multiple Imputation approaches to choose from,
namely: a predictive model-based approach, where the predictive
information contained in a user-specified set of covariates is used to
predict the missing values, or a propensity score-based approach, in which
cases are grouped according to their probability of being missing (i.e.
propensity score) and then an approximate Bayesian bootstrap is applied to
sample observed values to impute the missing values.
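The approximate Bayesian bootstrap step within each propensity-score group
can be sketched as follows (a minimal illustration with an invented function
name and toy donor pool, not SOLAS code): the observed donors are first
resampled with replacement, and the imputed values are then drawn from that
resample rather than from the donors directly.

```python
import numpy as np

def abb_impute(observed, n_missing, rng):
    """Approximate Bayesian bootstrap: resample the donor pool with
    replacement first, then draw the imputations from that resample.
    The extra resampling step propagates uncertainty about the donor
    distribution into the imputed values."""
    donor_pool = rng.choice(observed, size=len(observed), replace=True)
    return rng.choice(donor_pool, size=n_missing, replace=True)

rng = np.random.default_rng(1)
# Observed responses within one propensity-score group (toy values).
observed = np.array([1.2, 0.8, 1.5, 1.1, 0.9, 1.3])
imputed = abb_impute(observed, n_missing=3, rng=rng)
print(imputed)
```

Repeating the draw M times, one per imputed dataset, yields proper multiple
imputations in the sense that between-imputation variability is not
artificially suppressed.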
This demonstration will include examples of how SOLAS can be used to
perform multiple imputation on datasets containing both continuous and
categorical data.
For more information please contact Phil Bowtell ([log in to unmask]) or
Catherine Heffernan ([log in to unmask])