
The MRC Biostatistics Unit (MRC BSU
<http://www.mrc-bsu.cam.ac.uk/news-and-events/bsuseminars/> ) has the
following exciting PhD opportunities, starting October 2017.



Please circulate this to anyone who may be interested.



Full information, including details of how to apply, is available at:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/



The deadline for applications is 8 January 2017.




Developing stratified approaches from randomised trials, with application to
recommended intervals between blood donations


Supervisors - Brian Tom
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/t-to-z/brian-tom/
>  (MRC BSU) and Simon Thompson
<http://www.phpc.cam.ac.uk/people/ceu-group/ceu-senior-research-staff/simon-
thompson/>  (University of Cambridge)

Large randomised trials offer the potential not only to estimate the
overall effectiveness of alternative treatments or policies, but also to
explore which types of subject may benefit most. However, the statistical
methods for addressing the latter issue are not well developed, especially
when there is a wealth of information on patient characteristics that could
be used.

The project will be based on the very large INTERVAL trial
(www.intervalstudy.org.uk), in which 50,000 male and female blood donors
have been randomised to giving blood at, or more frequently than, the
standard intervals (8 and 10 weeks vs the standard 12 weeks for men, and 12
and 14 weeks vs the standard 16 weeks for women). The outcomes in the trial
are the amount of blood collected over the two years of the trial, the
number of deferrals (temporary rejection of a donor due to low haemoglobin),
and quality of life (in particular the physical subscale of the SF-36
questionnaire). In addition to the overall comparison of randomised groups,
interest centres on whether different inter-donation intervals should be
recommended for people with different characteristics (e.g. by age, weight,
blood biomarkers, or genetic characteristics). The 50,000 trial participants
are well characterised at baseline (demographics, previous donation history,
haematology, iron measures, genetics, quality of life), and at two years,
with interim 6-month questionnaires on quality of life and health symptoms.
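As a toy illustration of the kind of subgroup question involved (whether the effect of a shorter inter-donation interval differs between types of donor), the following Python sketch computes the arm effect within two subgroups and the difference between them, i.e. the interaction. All numbers are invented and bear no relation to the actual INTERVAL data.

```python
# Toy difference-of-differences calculation for a treatment-by-subgroup
# interaction. All records are invented; the outcome plays the role of
# blood collected (litres) over two years.

records = [
    # (arm, subgroup, outcome); arm 1 = shorter inter-donation interval
    (0, "younger", 3.6), (0, "younger", 3.8),
    (1, "younger", 4.6), (1, "younger", 4.8),
    (0, "older", 3.5), (0, "older", 3.7),
    (1, "older", 3.9), (1, "older", 4.1),
]

def mean(xs):
    return sum(xs) / len(xs)

def arm_effect(subgroup):
    """Difference in mean outcome between arms, within one subgroup."""
    treated = [y for arm, g, y in records if g == subgroup and arm == 1]
    control = [y for arm, g, y in records if g == subgroup and arm == 0]
    return mean(treated) - mean(control)

effect_young = arm_effect("younger")     # ~1.0
effect_old = arm_effect("older")         # ~0.4
interaction = effect_young - effect_old  # ~0.6: the effect varies by subgroup
```

In the project itself this would be a regression with treatment-covariate interactions over many baseline characteristics, with appropriate control for multiplicity, rather than a single two-group contrast.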

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Bayesian dose adaptive trials using non-myopic response-adaptive methods


Supervisors - Sofia Villar
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/t-to-z/sofia-vill
ar/>  (MRC BSU) and Adrian Mander
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/h-to-m/adrian-man
der/>  (MRC BSU)

In dose finding studies the aim is to find the maximum tolerated dose of an
agent or to find a dose which is closest to a target dose.  In dose-ranging
studies different doses of an agent are tested against each other to
establish which dose works best and/or is least harmful by estimating a
response-dose relationship. However, achieving either of these goals with
high precision can mean exposing a large number of patients to highly toxic
doses, imposing a learning-earning trade-off. Although extensive recent work
has used decision theory to address such trade-offs in the design of
clinical trials [1], little has been done to extend this framework to
dose-finding and dose-ranging studies. A decision-theoretic approach makes
it possible to take into account the interests of patients both within and
outside the trial, and to derive a patient allocation rule that acknowledges
the conflict between the interests of each individual patient and those of
subsequent patients. This idea was proposed earlier in the literature (e.g.
a framework for dose-finding trials using the theory of bandit problems was
proposed by Leung and Wang [2]), yet because finding the optimal strategy
for this type of bandit problem with dependent arms is computationally
infeasible in most relevant cases, the approach has not been developed
further.
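As a small, purely hypothetical sketch of the Bayesian side of dose finding, the code below maintains a discrete prior over a handful of candidate dose-toxicity curves, updates it with Bernoulli toxicity outcomes, and recommends the dose whose posterior-mean toxicity is closest to a target rate, in the spirit of continual-reassessment designs. The curves, target and data are all invented.

```python
# Hypothetical Bayesian dose-finding sketch: discrete prior over candidate
# dose-toxicity curves, Bernoulli likelihood updates, and a rule that picks
# the dose with posterior-mean toxicity closest to a 25% target.

doses = [0, 1, 2, 3]
# three candidate toxicity curves: P(toxicity) at each dose
curves = [
    [0.05, 0.10, 0.20, 0.35],
    [0.10, 0.25, 0.45, 0.60],
    [0.20, 0.40, 0.60, 0.80],
]
prior = [1 / 3, 1 / 3, 1 / 3]
target = 0.25

def posterior(data, prior):
    """data: list of (dose_index, toxic) observations; Bernoulli likelihood."""
    post = list(prior)
    for d, toxic in data:
        for k, curve in enumerate(curves):
            p = curve[d]
            post[k] *= p if toxic else (1 - p)
    total = sum(post)
    return [x / total for x in post]

def next_dose(data):
    post = posterior(data, prior)
    # posterior-mean toxicity probability at each dose
    ptox = [sum(post[k] * curves[k][d] for k in range(len(curves)))
            for d in doses]
    return min(doses, key=lambda d: abs(ptox[d] - target))

recommended = next_dose([(1, False), (1, False)])  # stays at dose 1
```

A myopic rule like this one only optimises the next allocation; the non-myopic designs the project targets would also weigh what each allocation teaches us for the patients still to come.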

This PhD project will develop decision-theoretic, non-myopic
response-adaptive methodology for dose-ranging and dose-finding studies. The
project will make use of recent advances in bandit theory to reduce the
computational complexity of finding the optimal (or nearly optimal) solution
to a set of relevant optimisation problems.

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Developing Bayesian non-myopic response-adaptive randomisation for the case
of delayed endpoint observation


Supervisors - Sofia Villar
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/t-to-z/sofia-vill
ar/>  (MRC BSU) and Adrian Mander
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/h-to-m/adrian-man
der/>  (MRC BSU)

Before a novel treatment is made available to the wider public, clinical
trials are undertaken to provide unbiased and reliable information that the
treatment is safe and efficacious. The standard approach for such
confirmatory clinical trials is to compare only two treatment options and
requires a large number of patients to be recruited. This approach does not
fit well with the development of treatments for conditions in which there
are many potential treatments to explore and relatively few affected
patients who could be enrolled in a trial. This is the case for drug
development for rare types of cancer.

A promising alternative to the standard approach in this context is
response-adaptive randomisation, i.e. changing the allocation probabilities
as outcome data are collected so as to favour promising treatments. By
designing a trial around a response-adaptive randomisation rule, promising
treatments can be identified quickly while more patients are allocated to
them. The response-adaptive randomisation rules that exhibit the best
performance in terms of patient benefit are the so-called non-myopic rules,
which unfortunately suffer from a prohibitive computational burden.
Developing computationally feasible and practical methods to bring these
ideas into trial design, as a way of improving the success rate of Phase III
clinical trials, is therefore of great current interest. At the
Biostatistics Unit we have made a start on this by developing a non-myopic
group response-adaptive randomisation method, the 'forward-looking Gittins
index' rule (1,2), for the case of dichotomous endpoints.
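The forward-looking Gittins index rule itself is too involved for a short sketch, but the basic mechanics of response-adaptive randomisation with binary outcomes can be illustrated with a simpler, myopic Bayesian rule (Thompson sampling), in which each patient is allocated by sampling from the current Beta posteriors of the arms' success probabilities. Everything below is illustrative.

```python
import random

random.seed(1)

def thompson_trial(true_success, n_patients):
    """Simulate a two-armed (or k-armed) trial with Thompson sampling:
    each patient is allocated to the arm whose Beta(1+s, 1+f) posterior
    draw is largest, then the binary outcome updates that arm."""
    n_arms = len(true_success)
    successes = [0] * n_arms
    failures = [0] * n_arms
    allocations = [0] * n_arms
    for _ in range(n_patients):
        # one posterior draw per arm
        draws = [random.betavariate(1 + successes[k], 1 + failures[k])
                 for k in range(n_arms)]
        arm = max(range(n_arms), key=lambda k: draws[k])
        allocations[arm] += 1
        if random.random() < true_success[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return allocations

alloc = thompson_trial([0.3, 0.7], n_patients=100)  # invented success rates
```

A non-myopic rule would instead look ahead at how today's allocation affects what can be learnt about, and offered to, future patients, which is exactly where the computational burden arises.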

This PhD project will look at extending existing non-myopic
response-adaptive randomisation methodology to cover the case of delayed
outcomes. This is particularly relevant for trials in which the endpoint is
survival.  The project will investigate novel optimal adaptive designs that
can use both observed response and partial information (derived from the
delayed response). These methods will therefore be closer to the real-world
situations handled by trials in which the endpoint is not necessarily best
modelled as binary and immediately observable.

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Integrative methods for identifying non-coding rare variants responsible for
rare diseases


Supervisors - Ernest Turro
<http://platelets.group.cam.ac.uk/people/ernest-turro>  (Department of
Haematology, University of Cambridge) and Sylvia Richardson
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/n-to-s/sylvia-ric
hardson/> (MRC BSU)

Only half of the approximately 7,000 known rare heritable disorders of
humans have an established molecular basis. These genetic determinants have
been identified through linkage studies and, more recently, by uncovering
associations between genetic variants identified through genomic DNA
sequencing and disease phenotypes encoded as simple variables (e.g.
case/control label). Recently, we have developed a regression method for
identifying associations between rare variants in genes and Human Phenotype
Ontology (HPO)-coded patient phenotypes (Greene et al., Am. J. Hum. Genet.,
2016). This method allows modeling of phenotype abnormalities that encompass
all organ systems and which are encoded with a variable degree of clinical
detail - a common feature of the phenotypes of patients with rare diseases.
Currently, we are developing Bayesian methodology for modeling candidate
rare variants (e.g. within a region) as mixtures of pathogenic and
non-pathogenic rare variants in the context of the typical modes of
Mendelian inheritance.

The vast majority of variants identified so far alter the protein products
of genes, which comprise around 2% of the genome. This is partly because the
effects of variants in protein-coding genes are more easily predicted than
those outside of coding regions and partly because it has not been possible,
until now, to sequence entire genomes cheaply and with high accuracy. As a
high proportion of cases remain unexplained, it is commonly postulated that
variants affecting gene regulation but residing outside genes themselves may
underlie such disorders. Identifying these variants will require careful
integration of relevant cell-specific and population genetic data to inform
probabilities of pathogenicity of non-coding variants.

The aim of the proposed project is to develop innovative statistical methods
for uncovering associations between rare variants and rare Mendelian
diseases that make use of relevant epigenetic, chromosomal conformation,
protein-protein interaction, eQTL and GWAS data, and apply them to a rich
database of blood-related disorders. It is only through appropriate modeling
of various layers of genomic and genetic information that elusive causes of
inherited disorders are likely to be found. Methods for integration of
multi-omics data are at an early stage of development and this project will
build on the experience of both teams in the domain of rare disease analysis
and statistical genomics, notably using Bayesian modelling strategies. The
successful candidate will have access to extensive computing facilities at
the University's high performance computing cluster and be engaged in the
largest rare disease research programme in Europe
(https://bioresource.nihr.ac.uk/rare-diseases/welcome/). Initial focus will
be on diseases of the blood stem cell and its progeny. Several thousand
cases with a blood-related disorder have been sequenced and phenotyped and
we have access to deep epigenetic and chromosomal conformation data from all
the major mature and progenitor cells in blood, as well as the results of
blood-trait GWAS and blood cell eQTL studies. These data will assist in the
development and assessment of emerging methodological ideas. In
collaboration with colleagues in other institutions and within the
Department of Haematology, potential findings will be amenable to rapid
follow-up in the laboratory.

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Bayesian methods for "weighted" biomedical data


Supervisor - Robert Goudie
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/a-to-g/robert-gou
die/>  (MRC BSU)

The recent availability in biomedical studies of vast quantities of data,
such as omics (genomic, transcriptomic, proteomic etc) data, is starting to
enable data-driven "precision medicine". This approach to medicine aims to
use these new data to allow tailoring of treatments to patients, rather than
the traditional "one size fits all" approach.

However, it is often not feasible to collect data on all relevant
individuals due to, for instance, time and cost constraints. Instead, in
many studies, data is collected on only a subgroup of the relevant
population. In precision medicine studies, the subgroup is often
deliberately chosen to over-represent particularly interesting cases (e.g.
extreme cases) to increase the chances that differences between patients
that require different treatment strategies can be identified. Such a
subgroup is not representative of the overall population, and the results of
a statistical analysis will be distorted unless this is accounted for in the
analysis. To do this, we must account for the "weight" associated with each
observed individual, i.e. how many people each observed individual
represents in the full population.
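As a minimal illustration of why the weights matter, the toy calculation below estimates a population proportion from a subsample that deliberately over-represents cases. The weighted estimate uses each individual's weight (the number of people they represent), and the same weights define a simple weighted pseudo-log-likelihood. All numbers are invented.

```python
import math

# Toy illustration: cases are deliberately over-sampled, so the raw sample
# proportion is distorted; weighting each person by how many population
# members they represent corrects this. All numbers are invented.

y = [1, 1, 1, 0, 0]      # binary outcome; the three cases are over-sampled
w = [2, 2, 2, 10, 10]    # weight = population count each person represents

unweighted = sum(y) / len(y)                              # 0.6, distorted
weighted = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # 6/26, about 0.23

def weighted_loglik(p):
    """Weighted Bernoulli pseudo-log-likelihood: each person's log-likelihood
    contribution is multiplied by their weight."""
    return sum(wi * (yi * math.log(p) + (1 - yi) * math.log(1 - p))
               for wi, yi in zip(w, y))

# the pseudo-likelihood is maximised at the weighted proportion
grid = [i / 100 for i in range(1, 100)]
mle = max(grid, key=weighted_loglik)
```

A Bayesian version of this pseudo-likelihood is one natural starting point, but how best to propagate the design information through a fully Bayesian analysis is exactly the open question the project addresses.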

We at the Biostatistics Unit are involved in a number of precision medicine
collaborations that involve weighted data, including studies of Alzheimer's
disease and other dementias. Many promising approaches in precision medicine
take a Bayesian approach to make it straightforward to account for all
sources of uncertainty within large, complex models. However, Bayesian
approaches for weighted data are in their infancy. This PhD project will
develop these methods, with the aim of enabling Bayesian approaches in
precision medicine with weighted data. The methods developed are likely to
be applicable more widely to the many other sources of weighted data in
biostatistics.

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Methods for integrating and splitting complex/big models


Supervisors - Robert Goudie
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/a-to-g/robert-gou
die/>  (MRC BSU) and Lorenz Wernisch
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/t-to-z/lorenz-wer
nisch/>  (MRC BSU)

Synthesis of evidence from multiple sources (data and expert opinion) and
from different study designs is increasingly common in all areas of science,
including in infectious disease epidemiology, health technology assessment
and omics (genomics, proteomics etc). Combining information sources often
results in more precise and useful inferences, especially when some data are
incomplete or biased.

However, using joint "big models" of several sources of evidence, including
data and expert opinion, is inferentially and computationally challenging.
It is often sensible to take a modular approach in which separate sub-models
are considered for smaller, more manageable parts of the available
data/evidence. Each sub-model is simpler (lower-dimensional) than the "big
model" and so will be easier to construct and use.

In a Bayesian framework, the sub-models should be integrated into a joint
model, so that all data and uncertainty are fully accounted for. This can be
challenging to do, but at the Biostatistics Unit we have recently proposed a
novel approach to this problem called Markov melding [1], building on ideas
from the graphical models literature. This promises to enable fully Bayesian
inference in settings where this was not previously possible, and to allow
splitting the computation required for large models into smaller pieces
(which may be computationally advantageous). However, it remains an open
problem how best to join together these pieces into inference for the joint
model.
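The details of Markov melding are in [1], but the flavour of joining sub-models can be conveyed by a toy discrete example: two sub-posteriors for a shared parameter are multiplied together, and the prior, which both sub-models have counted once, is divided out once. This is only a caricature of the pooling step, with invented numbers, not the melding algorithm itself.

```python
# Toy illustration of joining two sub-models that share one discrete
# parameter phi: multiply the two sub-posteriors and divide out the
# duplicated prior, then renormalise. All numbers are invented.

phi_grid = [0, 1, 2]          # support of the shared parameter
prior = [1 / 3, 1 / 3, 1 / 3]
# sub-posteriors p(phi | data_1) and p(phi | data_2) on the same grid
post1 = [0.2, 0.5, 0.3]
post2 = [0.1, 0.6, 0.3]

joint_unnorm = [p1 * p2 / p0 for p1, p2, p0 in zip(post1, post2, prior)]
z = sum(joint_unnorm)
joint = [u / z for u in joint_unnorm]   # combined posterior over phi_grid
```

In realistic continuous, high-dimensional settings this pooling has to be done with Monte Carlo rather than on a grid, which is where the open computational questions mentioned above arise.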

This PhD project would particularly suit a student interested in
computational and methodological statistics, since there is considerable
scope for new methodology and algorithms in this area. The PhD will involve
working towards developing, implementing and assessing promising approaches.
There is the potential to draw upon and extend ideas in the connected
literatures that are developing in this area including
divide-and-conquer/parallel computation methods for "big data" (such as
large n "tall data"); newly-developed approximate methods for estimating the
ratio of two densities; pseudo-marginal MCMC; and connections to sequential
Monte Carlo. There is also scope to study the application of these methods
in substantive application areas, including network meta-analysis.

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Hybrid probabilistic model integration for -omics data


Supervisor - Lorenz Wernisch
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/t-to-z/lorenz-wer
nisch/>  (MRC BSU)

A goal of biomedical research is to understand the cellular processes
underlying normal cell development and the factors that can disturb these
processes, leading to disease. Increasingly comprehensive experimental data
sets are available that aim to provide a multi-dimensional view of such
processes from different angles, such as genetics, genomics, epigenomics,
transcriptomics or metabolomics (for example the Blueprint project,
http://www.blueprint-epigenome.eu).

Traditionally, analyses of such multi-dimensional data are based on a series
of individual analyses for each data level: genetic association studies to
identify genetic variants, which are then fed into an analysis of the
genomic and epigenomic structure, which in turn is fed into further
downstream analyses of gene regulation and protein activities. However,
information is potentially lost at each stage of such multi-step analysis
since there is often little opportunity for feedback from later stages of
the analysis to earlier ones. A probabilistic model comprising all different
stages at once, which would allow information to flow freely between
components, would therefore be desirable.

Traditional Bayesian approaches to such a comprehensive model, based on the
(hierarchical) combination of standard distributions, however, struggle
with the size, complexity and heterogeneity of the data. A potential
solution lies in combining a traditional modelling approach with
modelling ideas from Bayesian nonparametrics or machine learning. For
example, some components of the model might be best modelled by
nonparametric density estimation obtained via kernel methods or deep neural
networks, while other components might be understood well enough to be
modelled by traditional probabilistic methods using standard distributions
and modelling techniques. Inference for such hybrid models poses an extra
challenge, since traditional inference methods, such as Monte Carlo
simulation, need to be combined with training methods from machine learning.

This is a multi-disciplinary project which requires a deep interest in
Bayesian as well as machine learning methods and the willingness to
understand the biological questions and structure of the experimental data
driving the modelling. Some familiarity with Bayesian modelling and
statistical computing is required.

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Experimental design for inference of gene networks from single cell data


Supervisors - John Reid
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/n-to-s/john-reid/
> (MRC BSU) and Steven Hill
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/h-to-m/steven-hil
l/> (MRC BSU)

Gene regulatory networks control almost all cellular functions. The ability
to accurately reconstruct these networks would greatly further our
understanding of many diseases, genetic conditions and developmental
biology. However, only limited progress has been made in reverse-engineering
these networks using the data available from modern high-throughput
biological experiments.

The space of all possible undirected network structures grows exponentially
in the number of genes, and thus network inference is underdetermined for
networks of any reasonable size. However, the network inference problem is
well-posed in the abstract, and this makes it an attractive problem to
study. This low barrier to entry, together with its biological importance,
means that network inference has been studied extensively over the last two
decades. Many inference methods have been developed that work with various
types of experimental design [1].

Perturbation experiments measure a system's characteristics in conditions
other than its natural state. For example, in gene knockdown experiments,
one or more genes are artificially silenced. Data from perturbation
experiments are among the most informative for network inference, as the
effect of a small change to the network can be accurately assessed. However,
they are expensive and time-consuming to perform, and typically biologists
can only perform a handful of perturbations. Usually the perturbed genes are
chosen
by the experimenter in an ad hoc fashion. This project will develop methods
for experimental design (that is how to choose which gene(s) to perturb) in
order to maximise the value of information from each experiment. Some work
exists on experimental design in this context [2-7] but in general this
field has not been studied nearly as extensively as the network inference
problem.

Recently, techniques have been developed to assay gene expression levels in
individual cells. Previously genome-wide expression levels could only be
measured as averages across populations of thousands of cells. The newly
available single cell data allow us to inspect the correlations and
relationships between genes in fine detail. In particular the between-cell
variation in a population of cells can be characterised. This project will
focus on experimental design for single cell experiments.

Most network inference techniques provide point estimates of the network
structure. This is a reasonable strategy given the difficulty of exploring
the entire space of networks. However, to reliably gauge the likely amount of
information gained from any particular experimental perturbation, methods to
estimate correlations and uncertainty in the posterior will need to be
developed.

Given these correlations and uncertainties, methods to choose which genes to
perturb will be explored. It is anticipated that the methods developed will
be Bayesian, as Bayesian methods naturally quantify uncertainty.
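A toy version of the design problem can be sketched as follows: maintain a posterior over a few candidate networks, and score each candidate perturbation by the expected entropy of the posterior after observing its (binary) outcome. The networks, outcome model and numbers below are all invented.

```python
import math

# Toy Bayesian experimental design: choose which gene to perturb so as to
# minimise the expected posterior entropy over candidate networks.
# Networks, outcome probabilities and prior are all invented.

networks = ["A->B", "B->A", "independent"]
posterior = [0.5, 0.3, 0.2]

# P(gene B responds | network, perturbed gene)
response_prob = {
    "A": {"A->B": 0.9, "B->A": 0.1, "independent": 0.1},
    "B": {"A->B": 0.1, "B->A": 0.1, "independent": 0.1},
}

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expected_posterior_entropy(gene):
    """Average over the two possible outcomes of perturbing `gene`,
    weighting the entropy of each updated posterior by the outcome's
    marginal probability."""
    total = 0.0
    for outcome in (True, False):
        likes = [response_prob[gene][n] if outcome
                 else 1 - response_prob[gene][n] for n in networks]
        marg = sum(w * l for w, l in zip(posterior, likes))
        updated = [w * l / marg for w, l in zip(posterior, likes)]
        total += marg * entropy(updated)
    return total

best = min(("A", "B"), key=expected_posterior_entropy)  # perturbing A is informative
```

Here perturbing B tells us nothing (all networks predict the same behaviour), so the rule prefers perturbing A; scaling this idea to realistic network spaces is where posterior uncertainty estimates become essential.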

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/


Model-free network inference from single cell data


Supervisor - John Reid
<http://www.mrc-bsu.cam.ac.uk/people/in-alphabetical-order/n-to-s/john-reid/
> (MRC BSU)

Gene regulatory networks control almost all cellular functions. The ability
to accurately reconstruct these networks would greatly further our
understanding of many diseases, genetic conditions and developmental
biology. However, only limited progress has been made reverse engineering
these networks using data available from modern high-throughput biological
experiments [1].

Recently, techniques have been developed to assay gene expression levels in
individual cells. Previously genome-wide expression levels could only be
measured as averages across populations of thousands of cells. The newly
available single cell data allow us to inspect the correlations and
relationships between genes in fine detail. In particular the between-cell
variation in a population of cells can be characterised. It is anticipated
that single cell data will greatly aid the reconstruction of gene regulatory
networks. To date only a few inference methods have been developed
specifically for single cell data [2, 3].

Classical network inference is posed as a network edge prediction task
given a gene-by-sample data matrix of gene expression levels. In this
formulation, when the true network is known, the predictions can be
validated using precision and recall or other similar statistics [4]. This
project will take an alternative approach and focus on model-free approaches
to modelling such data. By model-free we mean that we will use methods that
do not explicitly represent the structure and parameters of the network.

Model-free approaches are the state-of-the-art for modelling certain
physical systems [5]. They are able to accurately learn the dynamics of
complicated systems with no prior knowledge of the physical relationships
between the variables [6]. This project will investigate how to translate
their success learning the dynamics of physical systems to the problem of
learning the dynamics of gene expression. One model-free approach for
regulatory network inference could be to learn the dynamics of the system
using a deep neural network [7] or a Bayesian nonparametric model such as a
Gaussian process dynamical model [8]. In this approach single cell data from
a time series experiment would be placed along a pseudotime dimension [9].
The dynamics of gene expression relative to this pseudotime would be learnt
by the model.
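A Gaussian process dynamical model is beyond a short sketch, but the simplest model-free stand-in, smoothing one gene's expression along pseudotime with a kernel estimator, conveys the idea of learning expression dynamics without an explicit network model. The cells, pseudotimes and expression values below are invented.

```python
import math

# Hypothetical sketch: Nadaraya-Watson kernel smoothing of one gene's
# expression against pseudotime, as a simple model-free stand-in for
# learning expression dynamics. All values are invented.

pseudotime = [0.0, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85, 1.0]  # ordered cells
expression = [0.2, 0.3, 0.8, 1.5, 1.8, 1.6, 1.0, 0.5]     # one gene

def smooth(t, bandwidth=0.15):
    """Gaussian-kernel-weighted average of expression around pseudotime t."""
    weights = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2)
               for ti in pseudotime]
    return sum(w * e for w, e in zip(weights, expression)) / sum(weights)

# smoothed trajectory on a regular grid over [0, 1]
curve = [smooth(t / 20) for t in range(21)]
```

A GP dynamical model or deep network would replace this fixed smoother with a learnt model of the dynamics, which is what makes predicting behaviour under perturbation conceivable.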

Perturbation experiments measure a system's characteristics in conditions
other than its natural state. For example, in gene knockout experiments, one
or more genes are artificially inactivated. Data from perturbation
experiments are among the most informative for network inference, as the
effect of a small change to the network can be accurately assessed. We will
be
interested in developing model-free methods that can recapitulate the
behaviour of a system under perturbations. Only in this case will we be able
to interrogate the model and confidently infer which regulatory
relationships are present.

More info and to apply:
http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/





Alison Quenault

Communications Officer

MRC Biostatistics Unit

Cambridge Institute of Public Health

Forvie Site

Robinson Way

Cambridge Biomedical Campus

Cambridge CB2 0SR



Tel: +44-(0)1223-768263

Email: [log in to unmask]

Website: www.mrc-bsu.cam.ac.uk

Follow us on Twitter: @MRC_BSU




You may leave the list at any time by sending the command

SIGNOFF allstat

to [log in to unmask], leaving the subject line blank.