One-day workshop: Population size estimation for difficult to reach
populations in demography, official statistics, epidemiology and public
health
Tutors: Prof. Dr. Peter G.M. van der Heijden (Utrecht University and
Lancaster University) and
Dr. Maarten Cruyff (Utrecht University)
Wednesday 21 April 2010, 10am-5pm
Venue: Postgraduate Statistics Centre, Lancaster University
To book a place please go to: http://shortcourses.maths.lancs.ac.uk/
=================================================================
In public policy there are many areas where the size of a population is
of interest but unknown. Examples of such difficult-to-reach populations
are the homeless, illegal immigrants, drug addicts, prostitutes, victims
of domestic violence and babies with spina bifida.
Usually one or more registrations exist that aim to list all the members
of the populations of interest, yet these registrations are imperfect
and miss part of the population of interest. The question is therefore
how many members of the population are missed by the registration(s),
and what their background characteristics are. The master
class gives an overview of statistical models that answer this question.
The course will be largely non-technical and will use many examples;
references to the technical literature will be provided.
THE INSTRUCTORS
Peter van der Heijden and Maarten Cruyff have published several papers
on the development of statistical models for population size
estimation in journals such as Statistics in Medicine, Biometrics,
Biometrical Journal, Statistical Modelling and the Annals of Applied
Statistics. On a regular basis they carry out contract research for the
Dutch government on topics such as estimating the size of the
populations of the homeless, illegal immigrants, drug addicts, and
victims and perpetrators of domestic violence. Currently they have a
contract with Statistics
Netherlands to prepare the (virtual) census in 2012, where they will
focus on estimation of the population size of marginalised groups.
FURTHER DETAILS
There are two main statistical modelling approaches in this research
area. The first approach works with two or more registrations where it
is possible to link the members of the registrations. When there are two
registrations, say registration A and registration B, a two-way table
can be formed with counts describing how many individuals are seen by
registration A but not by B, how many are seen by B but not by A, and
how many are seen by both A and B. The aim is to estimate the number of
individuals missed by both A and B. This number is estimated under a
number of assumptions, the most prominent being that inclusion in
registration A is statistically independent of inclusion in
registration B.
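As an illustration, the two-registration estimate under independence can be
sketched in a few lines of Python. This is the classical two-list
(Lincoln-Petersen type) calculation, not necessarily the exact form taught in
the workshop; the function name and the counts are hypothetical.

```python
from typing import Tuple


def two_list_estimate(only_a: int, only_b: int, both: int) -> Tuple[float, float]:
    """Return (estimated number missed by both lists, estimated population size).

    Under independence of the two registrations, the odds ratio of the
    completed 2x2 table equals 1, so the missing cell is
    only_a * only_b / both.
    """
    if both <= 0:
        raise ValueError("the overlap between the registrations must be positive")
    missed = only_a * only_b / both
    total = only_a + only_b + both + missed
    return missed, total


# Hypothetical counts: 300 seen only by A, 200 only by B, 100 by both.
missed, total = two_list_estimate(only_a=300, only_b=200, both=100)
print(missed, total)
```

The estimate is only as good as the independence assumption: if being in A
makes it more likely to be in B, the overlap is inflated and the population
size is underestimated.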
In most applications in demography and related areas this independence
assumption is unrealistic. There are two ways to make models more
realistic. The first is to include covariates in the model that are
related to the inclusion probabilities. The independence assumption is
then replaced by the less stringent assumption of independence
conditional on the covariates. A second way is to use
more than two registrations. Then (loglinear) models can be used that
allow for interactions between pairs of registrations.
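Both ideas can be sketched with simple closed-form calculations. The code
below is a minimal illustration under stated assumptions, not the models
covered in the workshop: a single categorical covariate is handled by
estimating each stratum separately and summing, and three registrations are
handled with the loglinear model that has all pairwise interactions but no
three-way interaction, for which the missing cell has a known closed form.
All names and counts are hypothetical.

```python
from typing import Dict, Tuple


def stratified_total(strata: Dict[str, Tuple[int, int, int]]) -> float:
    """Sum two-list estimates over covariate strata.

    Each value is (only_a, only_b, both). Independence is assumed only
    within a stratum, i.e. conditional on the covariate.
    """
    total = 0.0
    for only_a, only_b, both in strata.values():
        missed = only_a * only_b / both
        total += only_a + only_b + both + missed
    return total


def missed_three_lists(n: Dict[Tuple[int, int, int], int]) -> float:
    """Missing cell under the no-three-way-interaction loglinear model.

    Keys are (in_a, in_b, in_c) with 1 = seen, 0 = not seen. The model
    fits the seven observed cells exactly, so observed counts are used.
    """
    numerator = n[1, 1, 1] * n[1, 0, 0] * n[0, 1, 0] * n[0, 0, 1]
    denominator = n[1, 1, 0] * n[1, 0, 1] * n[0, 1, 1]
    return numerator / denominator


# Hypothetical two-list counts split by a covariate (e.g. sex).
strata = {"men": (90, 60, 30), "women": (20, 40, 40)}
stratified = stratified_total(strata)

# Hypothetical three-list counts.
counts = {
    (1, 1, 1): 10, (1, 0, 0): 40, (0, 1, 0): 30, (0, 0, 1): 20,
    (1, 1, 0): 20, (1, 0, 1): 10, (0, 1, 1): 12,
}
missed_abc = missed_three_lists(counts)
print(stratified, missed_abc)
```

With these made-up numbers the stratified total differs from the estimate
obtained after pooling the strata, which is the practical reason for
conditioning on covariates.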
Recent developments will be discussed in which (i) covariates are used
that are not available in every registration, (ii) registrations relate
to different but overlapping populations (for example, registration A
refers to England and Wales whereas registration B refers to England and
Scotland, and interest is in the population size for the UK), and (iii)
latent variables are used that are related to the inclusion
probabilities.
The second statistical modelling approach works with one registration
from which counts are derived. For example, when using a central police
registration with reports of domestic violence, reports can be
counted per victim, giving the number of victims with one report, the
number with two reports, and so on. The question is then how many
victims have zero reports. Once this number has been estimated, the
population size follows by adding it to the number of victims known
from the registration.
The statistical problem is to estimate the number of victims with a
count of zero. A central assumption is that the number of times each
individual in the population of interest is seen follows a Poisson
distribution. A Poisson model can be estimated for the individuals in
the registration. Because zeros do not appear in the data, this is a
zero-truncated Poisson model. Models can be made more realistic by
including covariates
that are related to the Poisson parameters of individuals. Other
extensions include allowing for unobserved heterogeneity of the Poisson
parameters.
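The one-registration approach can be sketched as follows for the simplest
case: a single Poisson parameter shared by everyone, with no covariates and
no unobserved heterogeneity. This is a minimal illustration, not the
instructors' implementation; the function name and the frequencies are
hypothetical. The Poisson parameter is estimated by maximum likelihood from
the zero-truncated counts, and the population size is the number observed
divided by the estimated probability of being seen at least once.

```python
import math
from typing import Dict, Tuple


def fit_truncated_poisson(freqs: Dict[int, int]) -> Tuple[float, float]:
    """Fit a zero-truncated Poisson and estimate the population size.

    freqs maps a count k >= 1 to the number of individuals seen exactly
    k times. Returns (lambda_hat, estimated population size).
    """
    if any(k < 1 for k in freqs):
        raise ValueError("counts must be 1 or larger (zeros are unobserved)")
    n_observed = sum(freqs.values())
    mean_count = sum(k * f for k, f in freqs.items()) / n_observed
    if mean_count <= 1.0:
        raise ValueError("need at least one individual seen more than once")

    def truncated_mean(lam: float) -> float:
        # Mean of a zero-truncated Poisson: lam / (1 - exp(-lam)).
        return lam / -math.expm1(-lam)

    # The MLE solves truncated_mean(lam) = mean_count. The left-hand side
    # increases in lam and exceeds mean_count at lam = mean_count, so
    # bisection on (0, mean_count) finds the root.
    lo, hi = 1e-12, mean_count
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if truncated_mean(mid) < mean_count:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0

    p_seen = -math.expm1(-lam)  # probability of at least one report
    return lam, n_observed / p_seen


# Hypothetical data: 60 victims with one report, 25 with two, and so on.
freqs = {1: 60, 2: 25, 3: 10, 4: 5}
lam_hat, n_hat = fit_truncated_poisson(freqs)
print(lam_hat, n_hat, n_hat - sum(freqs.values()))
```

If individuals differ in their Poisson parameters, this homogeneous model
tends to underestimate the population size, which is why covariates and
heterogeneity extensions matter in practice.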
Time schedule:
9.30 - 10.00 Registration
10.00 - 13.00 Population size estimation using two or more registrations
(with short breaks)
14.00 - 16.00 Population size estimation using one registration (with
short breaks)
16.00 - 17.00 Discussion and conclusion
For any enquiries please contact: Deborah Stewart, PSC secretary.