IMPERIAL COLLEGE OF SCIENCE, TECHNOLOGY AND MEDICINE
DEPARTMENTS OF EPIDEMIOLOGY & PUBLIC HEALTH [EPH] AND
MEDICAL STATISTICS & EVALUATION [MSE]
MRC Research Studentship in Epidemiology, Biostatistics and Social
Statistics: one studentship available. To apply, send CV to Simon Sheffield
[ [log in to unmask] or fax: 0171 402 2150] indicating the project(s) of
interest to you. For informal discussion, please contact project supervisors
as indicated below.
The Departments’ focus includes leading-edge work in epidemiology, spatial
statistics and Bayesian statistical methods, medical statistics, statistical
computing, clinical trials and meta-analysis. There is considerable
potential to add to the list of possible projects below, and candidates may
submit proposals relevant to the Departments’ work in discussion with an
appropriate supervisor. The studentship will commence in January 2000.
SUMMARY OF POSSIBLE PhD PROJECTS, TO START JANUARY 2000
· Studies of exposure and health effects near landfill sites [EPH]
· Determination of the relationship between maternal haemoglobin, mean
corpuscular volume and birthweight and differentiation of anaemia from
plasma volume expansion [EPH]
· Heterogeneity in couple fertility: the use of frailty models [EPH]
· Bayesian methods for modelling variation in health indicators [EPH]
· Lifetime biological and social risk and protective factors in prediction
of adult health behaviour (smoking, drinking) [EPH]
· Early childhood health as a predictor of adult health [EPH]
· Transformation in the analysis of hierarchical medical data, with focus on
fetal monitoring [MSE]
· The detection, modelling and presentation of interactions with a
continuous covariate in medical and epidemiological studies [MSE]
· Comparison of methods for analysing repeated measurement data and problems
due to the presence of missing observations [MSE]
· Issues in the construction and use of fetal size and dating charts in
obstetrics, with special reference to inter-centre differences [MSE]
· Evaluation methods of genome mapping [MSE]
PROJECT DESCRIPTIONS
1. Studies of exposure and health effects near landfill sites - a number
of projects are possible in this area and candidates should contact Dr Lars
Jarup for an informal discussion (email: [log in to unmask] ; phone: +44 [0]
171 594 3337).
2. Determination of the relationship between maternal haemoglobin, mean
corpuscular volume and birthweight and differentiation of anaemia from
plasma volume expansion (Dr T Kold-Jensen) [EPH] (phone: 0171 594 3335,
email: [log in to unmask])
Objectives: To determine the relationship between maternal haemoglobin (Hb),
mean corpuscular volume (mcv) and birthweight and differentiate anaemia from
plasma volume expansion.
Background: Pregnancy produces plasma volume expansion and haemoglobin
concentration falls accordingly. Mcv, however, does not change substantially.
A drop in Hb with a large fall in mcv indicates anaemia, whereas a drop in Hb
with little or no fall in mcv is caused by plasma volume expansion. Severe
anaemia (<80 g/l) is associated with the birth of small babies (from both
pre-term labour and growth restriction), but so is high Hb (>120 g/l). The
incidence of low birthweight and of pre-term labour is at its lowest when
the haemoglobin concentration is between 95 and 105 g/l but to date there
have been no studies that have taken into account changes in mcv.
Design/Approaches: The majority of obstetric units in the North West Thames
region have used St Mary’s Maternity Information System (SMMIS) since 1988
to record clinical information throughout pregnancy and delivery. This
database contains information on birth weight and gestational age and now
comprises more than 400,000 pregnancies and births. Full blood counts are
recorded on computer in the pathology departments, and downloads from these
systems will be obtained and linked to the SMMIS data set.
Training: The PhD student will gain experience in analysing large datasets
with repeated events (some women have many blood samples drawn) and
modelling variables. Also the student will obtain the downloads and link the
two databases.
3. Heterogeneity in couple fertility: the use of frailty models (Dr M
Joffe) [EPH] (phone: 0171 594 3338, email: [log in to unmask])
The fertility of a variety of populations has been characterised, using as a
measure the time taken by a couple to conceive (Time To Pregnancy, TTP), and
a remarkably large degree of between-couple heterogeneity has been found. On
evolutionary grounds, it is difficult to explain why a substantial
proportion of the population, with no obvious disease or other health
problem, have low per-cycle probability of conception. Furthermore, the
semen quality of men is remarkably poor when compared with that of other
mammalian species. It is unclear whether these phenomena are linked, and if
so, whether they are of recent historical origin, and/or whether there are
major differences between different populations e.g. according to genetic or
nutritional differences. The possibility of a fall in the sperm count, which
appears to have occurred in certain places (e.g. Paris, Gent), needs to be
seen in this context.
This project will explore between- and within-couple fertility in a large
survey, the National Child Development Study, which is representative of the
population born in Britain in 1958. Respondents who were interviewed at age
33 provided values of TTP for almost all non-accidental pregnancies (91
percent of female and 84 percent of male respondents, N=3132 and 2576,
respectively).
These data will be analysed using frailty models, which introduce a
couple-specific element (a random effect) to acknowledge between-couple
differences. The distribution of these random effects may then depend on
characteristics of the couple. The effect of these characteristics on
relative fertility will be explored and quantified, and implications for
models of fertility in the population will be assessed.
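As an informal illustration of the frailty idea described above (not the project's actual model), the sketch below simulates time to pregnancy under a beta-geometric model: each couple draws its own per-cycle conception probability from a Beta distribution, which plays the role of the couple-specific random effect. The function names and the Beta(2, 6) parameters are assumptions chosen only for the example.

```python
import random


def simulate_ttp(n_couples, alpha, beta, max_cycles=12, seed=1):
    """Simulate time to pregnancy (in cycles) with couple-specific frailty.

    Each couple draws its own per-cycle conception probability from a
    Beta(alpha, beta) distribution (the random effect) and then tries
    each cycle until conception or until max_cycles is reached.
    Returns a list of cycle numbers, with None for censored couples.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_couples):
        p = rng.betavariate(alpha, beta)  # couple-specific fecundability
        ttp = None
        for cycle in range(1, max_cycles + 1):
            if rng.random() < p:
                ttp = cycle
                break
        times.append(ttp)
    return times


def conception_rate_by_cycle(times, max_cycles=12):
    """Proportion conceiving in each cycle among couples still trying."""
    rates = []
    at_risk = len(times)
    for cycle in range(1, max_cycles + 1):
        conceived = sum(1 for t in times if t == cycle)
        rates.append(conceived / at_risk if at_risk else float("nan"))
        at_risk -= conceived
    return rates


# Hypothetical parameters: mean per-cycle probability 2 / (2 + 6) = 0.25.
times = simulate_ttp(20000, alpha=2.0, beta=6.0)
rates = conception_rate_by_cycle(times)
for cycle, rate in enumerate(rates, start=1):
    print(f"cycle {cycle:2d}: conception rate among those still trying = {rate:.3f}")
```

Although every couple has a constant per-cycle probability, the observed conception rate falls from cycle to cycle because the more fertile couples conceive early and leave the risk set. This selection effect is what a frailty model is designed to separate from genuine time trends.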
4. Bayesian methods for modelling variation in health indicators (Dr N
Best) [EPH] (phone: 0171 594 3320, email: [log in to unmask])
Health 'indicators' based on hospital admissions data vary inherently over
space and time. The interpretation of such variation is complex, and may
reflect: (a) associations between the outcome of interest and measurable
explanatory factors/confounders such as age, sex, socio-demographic factors
and hospital effects; the latter may arise due to differences in e.g. number
of beds, speciality, coding practice, data quality and completeness. Such
relationships may involve non-linear, possibly discontinuous associations
and interactions between the variables. (b) variation induced by dependence
of the outcome on unobserved or unmeasured factors, such as an unknown
environmental pollutant. These factors will typically vary smoothly in space
and time, thus inducing spatial/temporal correlation in the observed
outcome. (c) Residual chance variation.
Recent developments in Bayesian computational techniques (specifically
Markov chain Monte Carlo methods) have allowed the realistic modelling of
complex problems. However, there has been limited serious use of these
methods to answer questions of substantive importance. This project thus
aims to develop and extend such Bayesian methods to: a) realistically model
the relationship between health indicators and relevant measured predictors;
b) provide flexible models for capturing spatial/temporal correlation in
such data; c) provide techniques for model criticism/selection which will
enable genuine explanatory associations to be distinguished from spurious
variation arising by chance.
The methodology will be applied to a range of health indicators derived from
relevant post-coded databases held by the department, including the Hospital
Episode Statistics.
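To give a flavour of the Markov chain Monte Carlo methods mentioned above, here is a minimal random-walk Metropolis sketch for a single area's relative risk. It assumes a simple Poisson model with a Gamma prior, chosen because the exact posterior is known and can be used as a check; the counts, prior parameters and function names are all hypothetical and are not taken from the project.

```python
import math
import random


def log_post(phi, y, expected, a, b):
    """Log posterior (up to a constant) of phi = log(theta).

    Assumed model: y ~ Poisson(theta * expected), theta ~ Gamma(shape=a, rate=b).
    The +phi term from the change of variables theta -> phi is included.
    """
    return (a + y) * phi - (b + expected) * math.exp(phi)


def metropolis(y, expected, a=1.0, b=1.0, n_iter=20000, burn_in=2000,
               step=0.3, seed=1):
    """Random-walk Metropolis on phi; returns posterior draws of theta."""
    rng = random.Random(seed)
    phi = math.log((y + 0.5) / expected)  # crude starting value
    current = log_post(phi, y, expected, a, b)
    draws = []
    for i in range(n_iter + burn_in):
        proposal = phi + rng.gauss(0.0, step)
        candidate = log_post(proposal, y, expected, a, b)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            phi, current = proposal, candidate
        if i >= burn_in:
            draws.append(math.exp(phi))
    return draws


# Hypothetical data: 30 observed admissions against 20 expected.
draws = metropolis(y=30, expected=20.0)
posterior_mean = sum(draws) / len(draws)
analytic_mean = (1.0 + 30) / (1.0 + 20.0)  # mean of Gamma(a + y, b + expected)
print(f"MCMC mean {posterior_mean:.3f}, analytic mean {analytic_mean:.3f}")
```

In this conjugate case the sampler is unnecessary, which is the point of the check. The spatial and temporal models described in the project have no closed-form posterior, and simulation of this kind becomes the practical route.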
5. Lifetime biological and social risk and protective factors in prediction
of adult health behaviour (Dr Marjo-Riitta Jarvelin) [EPH] (phone: 0171 594
3345, email: [log in to unmask])
Aims are:
· To explore the effects of childhood social standing considering biological
modifying factors, and own health behaviour in adolescence, on nutritional
factors (e.g. obesity), physical fitness, smoking and drinking up to age
30 years and to study variations by populations/population groups.
· To explain the association of health behaviour and lifetime biological
and social risk and protective factors with the self-assessed health,
health related quality of life and with inequalities in health until the age
of thirty, by population groups (gender, social class, marital, employment
status, country). We will use the opportunity offered by social differences
between countries to compare the influence of social factors on health and
health behaviour.
Populations and data:
The study consists of two national datasets, one from Finland and one
from Britain, which represent the same generation and have similar ages at
follow-up (pregnancy, birth, 1 or 2 years, teenage years 13-16 and
31-33 years). The Finnish study (Cohort 1966) consists of 12231 births in
1966. Data collection on social background, health, pregnancy and delivery
started in the 24th gestational week. In the British cohort (n=17733),
data have been collected from medical records, by questionnaires
and interviews. The questionnaires for the 31-year follow-up of Cohort 1966
in 1997-8 took comparability with the existing British database into
account. Data have also been collected from various registers and the
existing datafiles comprise thousands of variables. Outcomes at age 30:
nutritional status (weight, height, waist-hip measurement, body mass index),
drinking, smoking, physical fitness and self-assessed health, health related
quality of life measure, social status at 30 (education, income, household
facilities, marital status, employment). Explanatory variables at various
ages: maternal smoking, parental education and social standing at birth and
at 14, parental health behaviour at subject’s age of 14, subject’s smoking
and drinking in the teenage years, body size, and subject’s health at birth,
at 1 and at 14. We will apply the cumulative social class measure that takes into
account the class of the parents and the subject’s occupational history, to
present occupational (social) class in predicting the differences in current
health and health behaviour.
6. Early childhood health as a predictor of adult health (Dr Marjo-Riitta
Jarvelin) [EPH] (phone: 0171 594 3345, email: [log in to unmask])
Aim:
To study the association between early childhood health and well-being and
adult (or adolescent) health. The key question is: are people
in lower socioeconomic groups less healthy than people in higher
socioeconomic groups because they experienced more health problems in
childhood? What is the contribution of health selection?
Populations and data:
The study consists of two datasets from northern Finland, one for 1966
(n=12231) and one for 1985-86 (n=9479). Data collection started in the 24th
gestational week on social background, health and pregnancy and delivery.
Data has been later collected from medical records, by questionnaires,
interviews and clinical examinations. Data has also been collected from
various registers and the existing data files comprise thousands of
variables. Outcomes at age 30: self-rated health, both hospital and
non-hospital treated diseases. Explanatory and confounding variables at
various ages: health variables since birth (to some extent prenatally) and
early variables (mental disorders, neurological disabilities, other long-term
diseases, living conditions, health behaviour).
7. Transformation in the analysis of hierarchical medical data, with a
focus on fetal monitoring (Professor P Royston) [MSE] (phone: 0181 393 3255,
email: [log in to unmask])
Datasets with a hierarchical or multilevel structure are increasingly
important in medicine. Examples include growth curves, cluster-randomised
trials, multi-period clinical studies and observational studies with
repeated measures. The multilevel random-effects model is the analytical
tool of choice. When one or more predictors are continuous, appropriate
regression models are needed. Polynomials are almost invariably chosen, but
they are often inadequate. The project will explore the use of fractional
polynomials in multilevel modelling. These involve transformations of the
predictors and offer greater flexibility and parsimony than ordinary
polynomials. Issues such as how to detect and deal with heterogeneous curve
shapes will be explored. Secondly, for continuous outcome variables, the
multilevel model assumes a Gaussian distribution for relevant parameters.
Transformation of the response variable may be needed to satisfy this
condition. However transformation affects all aspects of the model,
including the shapes of the response curves and their heterogeneity and the
distribution of quantities at all levels of the hierarchy. The project will
investigate the effects of response transformation on different parts of the
model. The aim will be to develop techniques which will help the analyst
decide whether and how to transform the response and understand the effects
thereof. A particular application is the analysis of longitudinal fetal size
data to produce `conditional reference intervals', which are intended to
help the clinician detect fetuses whose growth is faltering. Transformation
of predictor and response variables is needed here. The project will also
consider how best to present the predictions from such models for ease of
understanding and use by the clinician. Several datasets are available to
the project.
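The following sketch illustrates what a first-degree fractional polynomial fit involves in the simplest single-level case: the predictor is transformed by each power in the conventional set (with power 0 read as the natural logarithm), and the power giving the smallest residual sum of squares is kept. The simulated data and the function names are assumptions for illustration only; the multilevel extension that the project concerns is not attempted here.

```python
import math
import random

# Conventional fractional polynomial power set; 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Fractional polynomial transform of a positive value x."""
    return math.log(x) if p == 0 else x ** p


def fit_fp1(xs, ys, powers=FP_POWERS):
    """Fit y = b0 + b1 * x^(p) by least squares for each candidate power.

    Returns (p, b0, b1, rss) for the power with the smallest residual
    sum of squares.
    """
    best = None
    n = len(xs)
    mean_y = sum(ys) / n
    for p in powers:
        zs = [fp_transform(x, p) for x in xs]
        mean_z = sum(zs) / n
        szz = sum((z - mean_z) ** 2 for z in zs)
        szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
        b1 = szy / szz
        b0 = mean_y - b1 * mean_z
        rss = sum((y - b0 - b1 * z) ** 2 for z, y in zip(zs, ys))
        if best is None or rss < best[3]:
            best = (p, b0, b1, rss)
    return best


# Hypothetical data: a response that is linear in log(x), plus noise.
rng = random.Random(0)
xs = [rng.uniform(1.0, 40.0) for _ in range(500)]
ys = [2.0 + 3.0 * math.log(x) + rng.gauss(0.0, 0.2) for x in xs]

best_power, b0, b1, rss = fit_fp1(xs, ys)
print(f"selected power: {best_power}, intercept {b0:.2f}, slope {b1:.2f}")
```

A single transformed term can capture curvature that an ordinary polynomial would need several terms to approximate, which is the parsimony argument made in the project description.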
8. The detection, modelling and presentation of interactions with a
continuous covariate in medical and epidemiological studies (Professor P
Royston) [MSE] (phone: 0181 393 3255, email: [log in to unmask])
Description of project: In epidemiological studies, covariates which
interact with a risk factor are known as effect modifiers; in clinical
trials, factors which interact with treatment assignment are termed
predictive (of response to treatment). Interactions are scientifically and
clinically important in both contexts. In epidemiology, they crucially
affect the interpretation and generalisability of study results. For
example, a covariate (e.g. age) could be positively associated over a
certain range with disease risk in one subpopulation (e.g. males) and
unassociated in another (e.g. females). In applying results from clinical
trials, identification of predictive factors may help to determine treatment
policy---for example, manage patients with poor prognosis with treatment
policy A, those with good prognosis with policy B.
In practice, interactions are often regarded as a nuisance because they
cloud the interpretation of main predictor effects in the standard additive
multiple linear regression model. Furthermore, the multiplicity of possible
interactions when there are several covariates creates a severe multiple
testing problem (tendency to overfit). In clinical trials, for example,
so-called `subgroup analyses' are much criticised, even notorious. These
problems are present when all the covariates are categorical, but with
continuous covariates additional difficulties arise in modelling the
functional form of continuous/categorical and continuous/continuous
interactions.
The project will explore the modelling of interactions involving one or more
continuous covariates in `noisy' datasets including large epidemiological
datasets (e.g. Whitehall I and Whitehall II) and prognostic factor studies
(e.g. node positive breast cancer), and also in cases where more precise
modelling is possible, such as estimating the gestational age of a fetus
from ultrasound measurements of fetal `biometry'. The aim of the project
will be to develop practical techniques and recommend approaches to
modelling interactions in real data. Questions that may be addressed by the
research include the following. For continuous/categorical interactions, how
should one choose the functions with which to model the relationship at the
different levels of the categorical predictor? How should
continuous/continuous interactions be modelled---for example, is it useful
to categorise one of the covariates first? What role should parametric
models such as fractional polynomials and non-parametric approaches such as
generalized additive models (GAMs) play? Is it possible to express the size
of an interaction effect relative to a main effect such that its importance
can easily be judged?
The approach will be to consider particular possible interactions such as
between smoking and age or cholesterol and age in modelling the risk of a
heart attack, and between treatment with tamoxifen and the oestrogen or
progesterone receptor status of a breast tumour. Models for these cases will
be developed. The experience generated will be used to suggest more general
approaches and to inform realistic simulation studies where the `truth' is
known. The aim of the simulations will be to evaluate the effectiveness of
the different methods. Techniques for presenting results from the better
methods will be worked up. Finally, the methods will be applied to other
datasets.
9. Comparison of methods for analysing repeated measurement data and
problems due to the presence of missing observations (Dr. Rumana Omar) [MSE]
(phone: 0181 393 3255, email: [log in to unmask])
A variety of methods are available for the analysis of repeated measurement
data such as the use of summary statistics, hierarchical random effects
models and marginal models. We have recently compared various methods for
repeated measurement analysis for the case of continuous outcomes from a
clinical trial. Some methods are relatively simple to use, but the other
more complex methods have greater flexibility. Classical multilevel models
impose more restrictive distributional assumptions compared with Bayesian
and marginal models. Issues in statistical analysis tend to be more
complicated for discrete outcome data. Furthermore, observations may be
missing at intermittent times and the presence of missing observations can
make the analysis more problematic. If in particular the reason for the
missingness of an observation is related to the outcome variable being
investigated, it complicates both the analysis and the interpretation of the
data. The various methods available for the analysis of repeated measurement
data differ in their sensitivity to missing data. Some methods require the
‘missing completely at random’ (MCAR) assumption, whereas others require the
‘missing at random’ (MAR) assumption. In practice it is possible that data
are not missing at random (NMAR).
This project has two primary objectives. It will examine how the various
methods available for the analysis of repeated measurement data compare in
terms of flexibility, ease of application, interpretation and underlying
assumptions for both continuous and discrete outcome data. A number of data
sets both from clinical trials and longitudinal observational studies will
be used for this purpose. The project will also explore methods for
investigating the patterns of missingness in the data, that is whether the
missing data mechanism is MAR, MCAR or NMAR, and propose strategies to deal
with missingness in the analysis.
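The distinction between MCAR and MAR can be illustrated with a small simulation. In the sketch below, a follow-up measurement is deleted either with the same probability for everyone (MCAR) or with a probability that depends on the always-observed baseline (MAR), and the naive mean of the observed follow-up values is compared with the mean of the full data. All distributions and probabilities are hypothetical choices made for the example.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate two repeated measurements with two missingness mechanisms."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)        # baseline, always observed
        y2 = y1 + rng.gauss(0.0, 0.5)   # follow-up, true mean 0
        # MCAR: the chance of being missing is the same for everyone.
        seen_mcar = rng.random() >= 0.4
        # MAR: the chance of being missing depends only on observed y1.
        p_missing = 0.8 if y1 > 0 else 0.1
        seen_mar = rng.random() >= p_missing
        rows.append((y1, y2, seen_mcar, seen_mar))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate()
full_mean = mean(y2 for _, y2, _, _ in rows)
mcar_mean = mean(y2 for _, y2, seen, _ in rows if seen)
mar_mean = mean(y2 for _, y2, _, seen in rows if seen)

print(f"full data mean of y2:       {full_mean:.3f}")
print(f"observed mean under MCAR:   {mcar_mean:.3f}")
print(f"observed mean under MAR:    {mar_mean:.3f}")
```

Under MCAR the observed cases remain a random subsample, so the naive mean stays close to the full-data mean. Under MAR the naive mean is biased, although methods that condition on the baseline can in principle correct it, since the missingness is explained by observed data.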
10. Issues in the construction and use of fetal size and dating charts in
obstetrics, with special reference to inter-centre differences (Professor
Royston) [MSE] (phone: 0181 393 3255, email: [log in to unmask])
Description of project: In recent years, ultrasound scanning of the fetus
has become standard practice in the management of pregnancy in the West.
Scanning has several purposes, including estimation of gestational age,
estimation of fetal size relative to gestational age and the antenatal
detection of anomalies such as Down's syndrome. The aims include predicting
the expected date of delivery (EDD) for clinical reasons and to inform the
mother, adjusting certain other measurements such as serum alphafetoprotein
concentration (used in screening for Down's syndrome and neural tube defect)
for gestational age, and assessing fetal growth. For clinical application,
two kinds of graph are produced: `dating charts' which predict gestational
age from a fetal dimension such as femur length or biparietal diameter, and
`growth charts' (properly: `size charts') which give reference centiles of
fetal dimensions according to gestational age. Many such charts have been
published.
In practice, obstetric units tend to use one or other of a small number of
published charts, though some units generate and use their own. It is known,
however, that fetal size and hence the relevant charts are influenced by a
number of intrinsic (biological) covariates, including ethnicity, parental
size, parity, sex of fetus, and practical factors such as measurement
techniques, experience and skill of the operator, clinical indications for
the scan, etc. The result is variation between centres and in the inferences
that are drawn from the charts. The first aim of the project is to quantify
this variation and where possible to model it using some of the covariates
just listed. The purpose is to determine whether it is sensible to use one
or a very small number of fetal size charts (with recognised limitations),
or whether it is necessary to produce separate charts, for example for
different ethnic groups. Existing regression methods based on fractional
polynomials will be used to estimate the relevant models.
The second aim is to investigate statistical methods for constructing dating
charts. For example, one approach which has been used is to regress
gestational age on fetal size, but since gestational age may not be a random
variable (for example, measurements may be collected for predefined
gestational ages), it is unclear if the approach is valid or efficient. The
project will investigate several approaches with different study designs and
recommend those with the best performance.
The project will use several fairly large databases on fetal growth from
obstetric units in the UK and Europe and from research studies. The
approach will be to construct models for fetal size given gestational age
and to include relevant covariates where possible. Multilevel modelling
techniques will be used to investigate between-centre heterogeneity. It may
be possible to construct charts based on very large samples which
accommodate unexplained heterogeneity. If appropriate, such charts will be
published in subject-matter journals.
Dr Lyn Chitty (University College Hospital, London), a clinician with
considerable practical and research experience in fetal monitoring and a
co-author of some of the `standard' fetal charts, has kindly agreed to
facilitate, advise and collaborate in the project.
Supervisor's comments: this project offers the student an opportunity to
work on an important and relevant area of applied statistical modelling
within an expanding research environment affording collaborative and
methodological research experience. Statistical modelling is a major
research interest of the department. The student will contribute to the
continuing development of an important topic which has the potential to
affect clinical practice. The techniques required
will include methodological understanding of regression and multilevel
modelling, development of computational solutions, analysis and presentation
of practical datasets and a focus on the practical issues and needs of users
of fetal charts. It will be necessary to consider longitudinal aspects where
several measurements are available for the same fetus. The approach will
mainly be by detailed investigation of several large datasets with the aim
to synthesise and understand between-centre differences where possible.
11. Evaluation methods of genome mapping (Dr Berthold Lausen) [MSE] (phone:
0181 393 3255, email: [log in to unmask])
The construction of a genetic or a physical map is a basic goal of all
genome projects. The underlying mathematical/computational challenge can be
seen as a variant of the travelling salesman problem. The aim is to find the
order (of visits) of the genetic marker loci which minimises the sum of the
distances. Because the number of possible orders grows factorially with the
number of loci, optimal solutions are not feasible for relatively large
numbers of marker loci.
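The ordering problem can be made concrete with a brute-force sketch. It assumes a hypothetical set of seven loci with known pairwise distances and searches every order for the one with the smallest sum of adjacent distances. Real radiation hybrid data would supply estimated distances instead, and the names used here are for illustration only.

```python
import itertools
import math


def path_length(order, dist):
    """Sum of distances between consecutive loci in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def best_order(dist):
    """Exhaustively search all orders for the shortest path through all loci."""
    n = len(dist)
    best, best_len = None, math.inf
    for order in itertools.permutations(range(n)):
        if order[0] > order[-1]:
            continue  # an order and its mirror image have the same length
        length = path_length(order, dist)
        if length < best_len:
            best, best_len = order, length
    return best, best_len


# Hypothetical positions of seven marker loci along a chromosome.
positions = [12.0, 3.0, 27.0, 8.0, 19.0, 0.0, 31.0]
dist = [[abs(p - q) for q in positions] for p in positions]

order, total = best_order(dist)
print("best order:", order, "total distance:", total)

# Number of distinct orders (ignoring mirror images) for 20 loci.
n_orders_20 = math.factorial(20) // 2
print("orders to check for 20 loci:", n_orders_20)
```

Seven loci require only a few thousand evaluations, but twenty loci would require more than 10^18, which is why heuristic search and measures of the stability of the resulting map are needed in practice.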
The PhD-project aims to improve the statistical theory and to address
several important data analytic issues. The project will be collaborative
with Tim Aitman (Molecular Medicine Group, MRC Clinical Sciences
Centre, Hammersmith campus). It will focus on radiation hybrid (RH) mapping.
One RH example is the data set of the Insulin Resistance Team (headed by Dr
Tim Aitman) (Al-Majali, K.M., et al., 1999, A high-resolution radiation
hybrid map of the proximal region of rat chromosome 4, Mammalian Genome 10,
471-476).
For example, important data analytic issues in ongoing research projects are
to model the measurement error caused by possible PCR failure or weak
positive results, and to estimate and model jointly the linkage groups of the
RH data and information from genetic maps. The development of methods for
evaluating constructed and published genome maps is an important issue
for genome and post-genome projects. Bootstrap methods have recently been
suggested to provide a measure of stability. The PhD project will address the
data analytic issues and will develop and analyse bootstrap evaluation by
means of Monte Carlo simulation and mathematical considerations.
The project is an excellent opportunity to collaborate in an important and
fast developing research area. Moreover, the project provides research
experience in highly important application fields of statistical genetics
and bioinformatics.