PhD STUDENTSHIPS, MRC BIOSTATISTICS UNIT, CAMBRIDGE
The BSU is an internationally recognised research unit specializing in
statistical modelling with application to medical, biological or public
health sciences. Details of the work carried out in the Unit appear on
our website.
The Unit usually offers 4 MRC Studentships per academic year. Full
awards cover Cambridge University fees and a stipend of at least 13,500
pounds for a period of 3 years only. Applicants must have or expect to
get a first or high 2.1 honours degree in mathematics, statistics or a
related discipline. A master's degree is desirable but not essential.
Successful candidates who do not have, or expect to have, a master's
degree may be enrolled on a relevant master's course. Examples of some of
the projects available for PhD study are given below. We will also
consider suitable projects as suggested by prospective PhD candidates.
Funding
• Full research council funding can be awarded to UK citizens and
applicants who satisfy UK residency criteria as specified by the Medical
Research Council.
• Fees only are provided for citizens of EU countries. Any potential EU
applicants who are unsure as to their eligibility for an MRC Studentship
should contact the Postgraduate Administrator in the first instance
(details below).
• Non-EU applicants who do not satisfy residency criteria are encouraged
to apply but will not be eligible for any MRC funds from the Unit and
are requested to provide their own funding to meet the full costs of a
University of Cambridge PhD.
• Scholarships are available from the University of Cambridge and
applicants who are not eligible for MRC funding are encouraged to apply
for these (please note that this is now closed for the current academic
year). The university course code you should use is NUBI22.
• Applicants for these should also apply to the Unit directly as below.
However, please note that applicants not meeting the MRC eligibility
criteria for full funding will be asked to provide evidence that they
have either:
o applied for alternative funding, or
o will be able to meet the full costs of a University of Cambridge PhD.
• Studentships are for full time study only and are available for
commencement in October 2012 only at this time.
How to apply
All applicants should send a CV, a covering letter, a detailed list of all
statistics courses taken (including grades if available) and contact
details of 2 academic referees to the Postgraduate Administrator at the
Biostatistics Unit.
E-mail applications are acceptable to [log in to unmask]
Closing date for applications is 31st January 2012.
Interviews are expected to be held on 27th February 2012.
*********************************
*PROJECT LIST*
Statistical issues in the design and analysis of multi-arm multi-stage
(MAMS) clinical trials
Supervisors: Jack Bowden and James Wason
The drug development process is extremely long and costly. Techniques
that allow improvement in efficiency are of keen interest to
pharmaceutical companies and public research institutions. When multiple
treatments are available for testing that treat the same condition, the
traditional approach is to test them one-by-one in a series of trials,
each one with a separate control group. A multi-arm trial simultaneously
tests a set of new treatments against a shared control group, thus
requiring fewer patients. In addition, interim analyses can be included
that allow early stopping of treatments if they are not effective, or
early stopping of the trial if an effective treatment is found. A trial
with multiple arms and interim analyses is called a multi-arm
multi-stage (MAMS) trial. Such trials are a recent innovation, and many
statistical issues remain in both their design and analysis.
Firstly, accurate estimation of a treatment's effect is vital for
expressing its true worth. The standard maximum likelihood estimate
ignores the sequential nature of a multi-stage trial, and can exhibit
severe bias. Several alternative classes of estimation procedures have
been proposed to address this issue. However, their application to MAMS
designs is not immediate; most of the methods to date have only focused
on two-stage designs and none consider multiple treatments at the end of
the study.
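
The bias described above can be illustrated with a small simulation. The
sketch below is not one of the estimation procedures referred to in the
text; it is a minimal two-stage example under simplifying assumptions of
our own: normally distributed effect estimates with known variance,
independent estimates across arms (the correlation induced by a shared
control group is ignored), and a rule that carries only the best-looking
arm into stage 2. The function name and all parameter values are
hypothetical.

```python
import numpy as np

def simulate_selection_bias(true_effects, n1, n2, sigma=1.0,
                            n_sims=20000, seed=0):
    """Bias of two estimators for the arm selected at an interim analysis.

    Assumptions (illustrative only): each arm's estimated effect versus
    control is normal with standard error sigma * sqrt(2 / n), and the
    estimates are independent across arms.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(true_effects, dtype=float)
    k = theta.size
    se1 = sigma * np.sqrt(2.0 / n1)  # stage-1 standard error per arm
    se2 = sigma * np.sqrt(2.0 / n2)  # stage-2 standard error

    # Stage 1: estimate every arm, then select the apparently best one.
    stage1 = rng.normal(theta, se1, size=(n_sims, k))
    best = stage1.argmax(axis=1)
    rows = np.arange(n_sims)

    # Stage 2: only the selected arm continues.
    stage2 = rng.normal(theta[best], se2)

    # Naive MLE pools both stages, weighting by sample size.
    mle = (n1 * stage1[rows, best] + n2 * stage2) / (n1 + n2)

    bias_mle = float(np.mean(mle - theta[best]))
    bias_stage2 = float(np.mean(stage2 - theta[best]))
    return bias_mle, bias_stage2

# Four arms with no true effect: any apparent benefit is selection bias.
bias_mle, bias_stage2 = simulate_selection_bias([0.0, 0.0, 0.0, 0.0],
                                                n1=50, n2=50)
print(f"bias of pooled MLE:         {bias_mle:.3f}")
print(f"bias of stage-2-only mean:  {bias_stage2:.3f}")
```

In this setup the pooled estimate is biased upwards because the stage-1
data were used to pick the winner, whereas the estimate based on stage-2
data alone is unbiased but discards information.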
Secondly, it is often necessary to collect information on multiple
outcomes (or 'endpoints') in a clinical trial. For example, in certain
disease areas such as cancer, toxicity is common and it is desirable to
ensure that a new treatment is both more effective and not significantly
more toxic than an existing treatment. In other areas, such as mental
health, there may be several possible available endpoints, and no
obvious choice for a primary endpoint. A limited literature exists on
multiple endpoints in traditional group-sequential studies, but none
covers the case where there are multiple treatment arms as well.
This project will therefore focus on the development of statistical
methodology in these two key areas: firstly, bias-adjusted estimation
of treatment effects in MAMS trials and, secondly, the analysis of
MAMS trials with multiple endpoints.
We are looking for a student with a keen interest in developing
statistical methodology useful for and in the context of real-life
applications. This project will involve collaborating with clinical
experts and clinical trialists with the results being of interest to
clinical trials units currently designing and conducting MAMS trials
(for example in London and Leeds).
*****************************************
Choosing timescales in survival analysis of observational cohort studies
Supervisors: Ian White and Ruth Keogh
In an observational cohort study, individuals, usually of various ages,
are observed from a baseline time until they experience a clinical event
or are censored. Such studies are commonly analysed using a survival
model, but there is debate over whether time since baseline or age is
the more appropriate time scale to yield parsimonious models for the
data. This project aims to develop and implement methods to choose
between the two time scales, depending both on the aim of the analysis
and the nature of the data.
We have access to a large resource, the Emerging Risk Factors
Collaboration (ERFC) database of over 2 million individuals in over 100
cohort studies, which we aim to use for this research. Methods will be
developed theoretically and will be evaluated both by simulation and by
application in the ERFC. The student will therefore develop their
statistical and computing skills and will learn to apply the findings of
their research in an epidemiological context through our close working
links with the ERFC co-ordinating centre. Ideally, general guidelines
would emerge about how to select a suitable timescale for analysis of
such studies.
*******************************************
Misspecification in multi-state models and its impact in the
longitudinal analysis of quality of life measures
Supervisors: Brian Tom and Vern Farewell
Multi-state models have proved very useful for the analysis of
longitudinal data that arise in medical studies. In this project we
would like to assess broadly the potential effect of misspecification of
such models, initially restricted to Markov models. The work is
motivated by use of the models in various rheumatological applications,
and will be applicable quite generally, but the practical application of
the results will focus on the longitudinal analysis of quality of life
measures in rheumatology, particularly in patients with psoriatic arthritis.
This project is envisaged to encompass, in broadly equal measure, both
technical methodological investigations and substantive medical applications.
******************************************
Choosing sensitivity analyses to the missing at random assumption in
epidemiology
Supervisor: Ian White
Missing data are common in statistics, and analyses are commonly
performed under the assumption that the data are missing at random.
Because this is an untestable assumption, it is important to perform
sensitivity analyses to departures from this assumption.
This project will focus on identifying which sensitivity analyses are
most important. In epidemiological studies, we typically explore the
association between an exposure and an outcome adjusting for
confounders. The project will separately consider missing values in
exposures, outcomes and confounders. For example, we already know that
for a missing outcome, if the data are missing not at random so that the
probability of the outcome being missing depends only on the outcome
itself, then bias is generally small; but if the probability of the
outcome being missing also depends on the exposure, then bias can be
larger.
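
The contrast in the example above can be checked numerically. The sketch
below is a minimal simulation under assumptions that are ours, not the
project's: a binary exposure, a binary outcome, a complete-case odds
ratio as the estimate, and arbitrary missingness probabilities. In the
second scenario the missingness depends on the outcome and the exposure
jointly (an interaction), which is one way the dependence on exposure
can matter.

```python
import numpy as np

def complete_case_log_or(x, y, observed):
    """Log odds ratio from the 2x2 table of records with observed outcome."""
    x, y = x[observed], y[observed]
    n11 = float(np.sum((x == 1) & (y == 1)))
    n10 = float(np.sum((x == 1) & (y == 0)))
    n01 = float(np.sum((x == 0) & (y == 1)))
    n00 = float(np.sum((x == 0) & (y == 0)))
    return np.log(n11 * n00 / (n10 * n01))

rng = np.random.default_rng(1)
n = 200_000
true_log_or = np.log(2.0)  # assumed true exposure-outcome association

x = rng.binomial(1, 0.5, size=n)
p_y = 1.0 / (1.0 + np.exp(-(-1.0 + true_log_or * x)))
y = rng.binomial(1, p_y)

# Scenario A: probability of observing the outcome depends on the
# outcome only (missing not at random).
p_obs_a = np.where(y == 1, 0.5, 0.9)
obs_a = rng.random(n) < p_obs_a

# Scenario B: probability of observing the outcome depends on the
# outcome and the exposure jointly.
p_obs_b = np.where((y == 1) & (x == 1), 0.4, 0.9)
obs_b = rng.random(n) < p_obs_b

log_or_a = complete_case_log_or(x, y, obs_a)
log_or_b = complete_case_log_or(x, y, obs_b)

print(f"true log OR:           {true_log_or:.3f}")
print(f"scenario A (Y only):   {log_or_a:.3f}")
print(f"scenario B (Y and X):  {log_or_b:.3f}")
```

In this particular setup the complete-case odds ratio stays close to the
truth in scenario A and is clearly distorted in scenario B; other
outcome types or effect measures may behave differently.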
The main aims of the project will be:
• To explore what sort of missing not at random mechanisms lead to
important bias, considering separately the cases when missing data are
in the outcomes, exposures and confounders, and hence to make
recommendations about how sensitivity analyses could be conducted.
• To consider ways to perform the sensitivity analyses, probably in the
context of multiple imputation.
• To explore how the methods work in several data sets, which may include
the EPIC study of risk factors for cancer, the PLAO study in community
mental health, and studies in cardiovascular disease.
We will start by considering missing data in only one variable, but
where possible we will move on to allowing for missing data in all
variables, and also in a repeatedly measured outcome. If time permits, a
further research direction would be to explore ways for working with
clinical experts to determine the plausible degree of departure from the
missing at random assumption, possibly leading to a Bayesian approach
with informative priors.
The student will join a national group of statisticians interested in
missing data problems, which meets regularly in London for informal
discussions.
****************************************
Statistical methods for investigating the genetic regulation of gene
expression
Supervisor: Sylvia Richardson (Collaboration with Prof John Todd
(Cambridge Institute for Medical Research, CIMR))
Whole genome association studies have opened up the possibility of
identifying previously unsuspected genes that are involved in the
aetiology of diseases such as cancer and diabetes. To better understand
the function of genes and to investigate the genetic regulation of
transcription, an important study design, referred to as eQTL
(expression Quantitative Trait Locus) studies, combines the information
from two large data sets: genetic markers and gene expression profiles.
Statistically, we are seeking to uncover patterns of associations
between a large number q of “responses” (the gene expression profiles)
and a very large number p of “predictors” (the genetic polymorphisms),
measured simultaneously on a number n of individuals. To go beyond simple
univariate tests of associations, which are hampered by problems of
multiple testing and difficulty of interpretability due to the
correlated structure of both predictors and responses, a number of
multivariate methods specifically tailored to eQTL studies have been
proposed. The broad statistical framework for such approaches is that of
the so-called large p small n paradigm, where the number of observations
n is far smaller than the number of predictors p, precluding the use of
standard regression techniques. This area of statistical research has
been growing rapidly, in particular to answer challenging statistical
questions created by the need to analyse efficiently the vast data sets
arising from more and more sophisticated biotechniques.
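
To see why standard regression breaks down when n is far smaller than p,
consider the following minimal sketch. It uses simulated data with
arbitrary dimensions (n = 20, p = 100) chosen by us for illustration; it
is not an eQTL analysis. Because the design matrix has rank at most n,
infinitely many coefficient vectors fit the data exactly, so least
squares cannot single one out.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 100  # far fewer observations than predictors (illustrative)

X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]  # only three predictors truly matter
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# The rank of X cannot exceed n, so X'X (p x p) is singular.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm solution among all exact fits.
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any vector in the null space of X can be added without changing the fit.
# With full_matrices=True, the last p - n rows of vt span that null space.
_, _, vt = np.linalg.svd(X, full_matrices=True)
null_vec = vt[-1]
beta_other = beta_min_norm + 5.0 * null_vec

resid_min_norm = np.linalg.norm(y - X @ beta_min_norm)
resid_other = np.linalg.norm(y - X @ beta_other)

print(f"rank of X: {rank} (p = {p})")
print(f"residual norm, minimum-norm fit: {resid_min_norm:.2e}")
print(f"residual norm, alternative fit:  {resid_other:.2e}")
print(f"distance between the two fits:   "
      f"{np.linalg.norm(beta_other - beta_min_norm):.2f}")
```

Both coefficient vectors reproduce the responses essentially perfectly
yet differ substantially, which is why additional structure, such as
sparsity assumptions or prior distributions, is needed in this setting.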
The aim of the PhD is to build on recent developments in the statistical
analysis of large data sets, and in particular on the Bayesian framework
for the analysis of eQTL studies (Bottolo et al, 2011), to develop, make
operational and test a number of approaches suitable for the joint
analysis of two large genomics data sets. This is a commonly encountered
task in integrative genomics studies, which focus on biological
questions related to the combined analysis of two or more types of
genomics data sets. The main statistical developments will be carried
out within a Bayesian hierarchical framework, and inference will be
performed by means of stochastic algorithms. The models and methods
developed will first be evaluated on publicly available benchmark data
sets. They will then be applied to improve the understanding of
molecular processes involved in the development of diabetes and
inflammation, in collaboration with the group of John Todd (JDRF/WT
Diabetes and Inflammation Laboratory, CIMR). The data for this project
consist of (i) genotype data obtained using Immunochip, an Illumina Infinium
genotyping chip containing over 150,000 SNPs from all the confirmed
immune disease regions, and (ii) Affymetrix gene expression arrays on
the same 500 individuals. An eQTL analysis of this data set is planned,
which will involve the incorporation of prior information and
appropriate dependence structure into the models.