The MRC Biostatistics Unit (MRC BSU) has the following exciting PhD opportunities, starting October 2017.

Please circulate this to anyone who may be interested.

Full information, including details on how to apply, is at: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Deadline for applications is the 8th January 2017.

Developing stratified approaches from randomised trials, with application to recommended intervals between blood donations

Supervisors - Brian Tom (MRC BSU) and Simon Thompson (University of Cambridge)

Larger randomised trials offer the potential not only to estimate the overall effectiveness of alternative treatments or policies, but also to explore which types of subject may benefit most. However, the statistical methods for addressing the latter issue are not well developed, especially when there is a wealth of information on patient characteristics that could be used.

The project will be based on the very large INTERVAL trial (www.intervalstudy.org.uk), in which 50,000 male and female blood donors have been randomised to giving blood at, or more frequently than, the standard intervals (8 and 10 weeks vs the standard 12 weeks for men, and 12 and 14 weeks vs the standard 16 weeks for women). The outcomes in the trial are the amount of blood collected over the two years of the trial, the number of deferrals (temporary rejection of a donor due to low haemoglobin), and quality of life (in particular the physical subscale of the SF-36 questionnaire). In addition to the overall comparison of randomised groups, interest centres on whether different inter-donation intervals should be recommended for people with different characteristics (e.g. by age, weight, blood biomarkers, or genetic characteristics). The 50,000 trial participants are well characterised at baseline (demographics, previous donation history, haematology, iron measures, genetics, quality of life), and at two years, with interim 6-month questionnaires on quality of life and health symptoms.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Bayesian dose adaptive trials using non-myopic response-adaptive methods

Supervisors - Sofia Villar (MRC BSU) and Adrian Mander (MRC BSU)

In dose-finding studies the aim is to find the maximum tolerated dose of an agent or to find a dose which is closest to a target dose. In dose-ranging studies different doses of an agent are tested against each other to establish which dose works best and/or is least harmful by estimating a dose-response relationship. However, achieving either of these goals with high precision can imply exposing a large number of patients to highly toxic doses, imposing a learning-earning trade-off. Although extensive recent work has been done on using decision theory to address such a trade-off in the design of clinical trials [1], little work has been done to extend such a framework to dose-finding/dose-ranging studies. A decision-theoretic approach makes it possible to take into account the interests of patients both within and outside the trial, and so to derive a patient allocation rule that acknowledges the conflict between the interests of each individual patient and those of the patients who follow. This idea was proposed earlier in the literature (e.g. a framework for dose-finding trials using the theory of bandit problems was proposed by Leung and Wang [2]), yet because finding the optimal strategy for this type of bandit with dependent arms is, in most relevant cases, not computationally feasible, the approach has not been further developed.

This PhD project will look at developing decision-theoretic non-myopic response-adaptive methodology for dose-ranging and dose-finding studies. The project will make use of recent advances in bandit theory to try to reduce the computational complexity of finding the optimal (or nearly optimal) solution derived from a set of relevant optimisation problems.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Developing Bayesian non-myopic response-adaptive randomisation for the case of delayed endpoint observation

Supervisors - Sofia Villar (MRC BSU) and Adrian Mander (MRC BSU)

Before a novel treatment is made available to the wider public, clinical trials are undertaken to provide unbiased and reliable information that the treatment is safe and efficacious. The standard approach for such confirmatory clinical trials is to compare only two treatment options, and it requires a large number of patients to be recruited to a trial. This approach does not fit well with the development of treatments for many conditions in which there is a large number of potential treatments to explore and relatively few patients affected by the disease who could be enrolled in a trial. This is the case for drug development for rare types of cancer.

A promising alternative to the standard approach in this context is the use of response-adaptive randomisation (i.e. changing the allocation probabilities as outcome data are collected to favour promising treatments). By designing a trial that incorporates a response-adaptive randomisation rule for patient allocation, promising treatments can be quickly identified while more patients are allocated to them. The response-adaptive randomisation rules that perform best in terms of patient benefit are the so-called non-myopic rules, which unfortunately suffer from a heavy computational burden. Developing computationally feasible and practical methods for applying these ideas in trial design, as a way of improving the success rate of Phase III clinical trials, is therefore of great current interest. At the Biostatistics Unit we have made a start on this by developing a non-myopic group response-adaptive randomisation method called the ‘forward looking Gittins index’ rule (1,2) for the case of dichotomous endpoints.
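As a rough illustration of how allocation probabilities can follow accumulating outcome data, the sketch below estimates, for a trial with a binary endpoint, the posterior probability that each arm is best and uses it as the allocation probability for the next patient. This is a simple myopic Bayesian rule (Thompson-type sampling), not the forward looking Gittins index rule mentioned above; the uniform Beta(1, 1) priors, the function name and the example counts are all assumptions made for illustration.

```python
import random


def allocation_probabilities(successes, failures, draws=10000, seed=0):
    """Monte Carlo estimate of P(arm is best) for each arm.

    Assumes independent Beta(1 + successes, 1 + failures) posteriors,
    i.e. a uniform prior on each arm's success probability.
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # Draw one plausible success rate per arm from its posterior.
        samples = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        # Credit the arm whose sampled success rate is highest.
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical interim data: arm 0 has 8 successes and 2 failures,
# arm 1 has 2 successes and 8 failures.
probs = allocation_probabilities([8, 2], [2, 8])
print(probs)
```

With these hypothetical counts almost all of the allocation probability moves to arm 0. A rule of this kind looks only at the current posterior; non-myopic rules also account for the value of what will be learnt from future patients, which is the source of their computational cost.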

This PhD project will look at extending existing non-myopic response-adaptive randomisation methodology to cover the case of delayed outcomes. This is particularly relevant for trials in which the endpoint is survival. The project will investigate novel optimal adaptive designs that can use both observed response and partial information (derived from the delayed response). These methods will therefore be closer to the real-world situations handled by trials in which the endpoint is not necessarily best modelled as binary and immediately observable.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Integrative methods for identifying non-coding rare variants responsible for rare diseases

Supervisors - Ernest Turro (Department of Haematology, University of Cambridge) and Sylvia Richardson (MRC BSU)

Only half of the approximately 7,000 known rare heritable disorders of humans have an established molecular basis. These genetic determinants have been identified through linkage studies and, more recently, by uncovering associations between genetic variants identified through genomic DNA sequencing and disease phenotypes encoded as simple variables (e.g. case/control label). Recently, we have developed a regression method for identifying associations between rare variants in genes and Human Phenotype Ontology (HPO)-coded patient phenotypes (Greene et al, Am. J. Hum. Genet., 2016). This method allows modeling of phenotype abnormalities that encompass all organ systems and which are encoded with a variable degree of clinical detail, a common feature of the phenotypes of patients with rare diseases. Currently, we are developing Bayesian methodology for modeling candidate rare variants (e.g. within a region) as mixtures of pathogenic and non-pathogenic rare variants in the context of the typical modes of Mendelian inheritance.

The vast majority of variants identified so far alter the protein products of genes, which comprise around 2% of the genome. This is partly because the effects of variants in protein-coding genes are more easily predicted than those outside of coding regions and partly because it has not been possible, until now, to sequence entire genomes cheaply and with high accuracy. As a high proportion of cases remain unexplained, it is commonly postulated that variants affecting gene regulation but residing outside genes themselves may underlie such disorders. Identifying these variants will require careful integration of relevant cell-specific and population genetic data to inform probabilities of pathogenicity of non-coding variants.

The aim of the proposed project is to develop innovative statistical methods for uncovering associations between rare variants and rare Mendelian diseases that make use of relevant epigenetic, chromosomal conformation, protein-protein interaction, eQTL and GWAS data, and apply them to a rich database of blood-related disorders. It is only through appropriate modeling of various layers of genomic and genetic information that elusive causes of inherited disorders are likely to be found. Methods for integration of multi-omics data are at an early stage of development and this project will build on the experience of both teams in the domain of rare disease analysis and statistical genomics, notably using Bayesian modelling strategies. The successful candidate will have access to extensive computing facilities at the University's high performance computing cluster and be engaged in the largest rare disease research programme in Europe (https://bioresource.nihr.ac.uk/rare-diseases/welcome/). Initial focus will be on diseases of the blood stem cell and its progeny. Several thousand cases with a blood-related disorder have been sequenced and phenotyped and we have access to deep epigenetic and chromosomal conformation data from all the major mature and progenitor cells in blood, as well as the results of blood-trait GWAS and blood cell eQTL studies. These data will assist in the development and assessment of emerging methodological ideas. In collaboration with colleagues in other institutions and within the Department of Haematology, potential findings will be amenable to rapid follow-up in the laboratory.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Bayesian methods for “weighted” biomedical data

Supervisor - Robert Goudie (MRC BSU)

The recent availability in biomedical studies of vast quantities of data, such as omics (genomic, transcriptomic, proteomic etc) data, is starting to enable data-driven “precision medicine”. This approach to medicine aims to use these new data to allow tailoring of treatments to patients, rather than the traditional “one size fits all” approach.

However, it is often not feasible to collect data on all relevant individuals due to, for instance, time and cost constraints. Instead, in many studies, data are collected on only a subgroup of the relevant population. In precision medicine studies, the subgroup is often deliberately chosen to over-represent particularly interesting cases (e.g. extreme cases) to increase the chances that differences between patients that require different treatment strategies can be identified. Such a subgroup is not representative of the overall population, and the results of a statistical analysis will be distorted unless this is accounted for in the analysis. To do this, we must account for the “weight” associated with each observed individual, i.e. how many people each observed individual represents in the full population.
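A small worked example may make the role of the weights concrete. The sketch below uses a simple design-weighted mean, which is a classical (non-Bayesian) estimator and not the methodology this project will develop; the population sizes, outcome values and function name are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of `values`, with each observation counted `weights[i]` times."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical population of 1,000 people: 100 extreme cases and 900 typical.
# The study samples 50 of each, so each sampled extreme case represents
# 2 people and each sampled typical case represents 18 people.
values = [10.0] * 50 + [2.0] * 50   # outcome: 10 for extreme, 2 for typical
weights = [2.0] * 50 + [18.0] * 50

unweighted = sum(values) / len(values)     # 6.0, distorted by over-sampling
weighted = weighted_mean(values, weights)  # 2.8, the population mean
print(unweighted, weighted)
```

Ignoring the weights more than doubles the estimated mean here, because extreme cases make up half of the sample but only a tenth of the population.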

We at the Biostatistics Unit are involved in a number of precision medicine collaborations that involve weighted data, including studies of Alzheimer’s disease and other dementias. Many promising methods in precision medicine take a Bayesian approach to make it straightforward to account for all sources of uncertainty within large, complex models. However, Bayesian approaches for weighted data are in their infancy. This PhD project will develop these methods, with the aim of enabling Bayesian approaches in precision medicine with weighted data. The methods developed are also likely to be applicable more widely to the many other sources of weighted data in biostatistics.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Methods for integrating and splitting complex/big models

Supervisors - Robert Goudie (MRC BSU) and Lorenz Wernisch (MRC BSU)

Synthesis of evidence from multiple sources (data and expert opinion) and from different study designs is increasingly common in all areas of science, including in infectious disease epidemiology, health technology assessment and omics (genomics, proteomics etc). Combining information sources often results in more precise and useful inferences, especially when some data are incomplete or biased.

However, using joint "big models" of several sources of evidence, including data and expert opinion, is inferentially and computationally challenging. It is often sensible to take a modular approach in which separate sub-models are considered for smaller, more manageable parts of the available data/evidence. Each sub-model is simpler (lower-dimensional) than the "big model" and so will be easier to construct and use.

In a Bayesian framework, the sub-models should be integrated into a joint model, so that all data and uncertainty are fully accounted for. This can be challenging to do, but at the Biostatistics Unit we have recently proposed a novel approach to this problem called Markov melding [1], building on ideas from the graphical models literature. This promises to enable fully Bayesian inference in settings where this was not previously possible, and to allow splitting the computation required for large models into smaller pieces (which may be computationally advantageous). However, it remains an open problem how best to join together these pieces into inference for the joint model.

This PhD project would particularly suit a student interested in computational and methodological statistics, since there is considerable scope for new methodology and algorithms in this area. The PhD will involve developing, implementing and assessing promising approaches. There is the potential to draw upon and extend ideas in the connected literatures that are developing in this area, including divide-and-conquer/parallel computation methods for "big data" (such as large n "tall data"); newly-developed approximate methods for estimating the ratio of two densities; pseudo-marginal MCMC; and connections to sequential Monte Carlo. There is also scope to study the application of these methods in substantive application areas, including network meta-analysis.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Hybrid probabilistic model integration for -omics data

Supervisor - Lorenz Wernisch (MRC BSU)

A goal of biomedical research is to understand cellular processes underlying regular cell development and factors that can disturb these processes leading to disease. Increasingly comprehensive experimental data sets are available that aim at providing a multi-dimensional view on such processes from different angles such as genetics, genomics, epigenomics, transcriptomics, or metabolomics (for example the Blueprint project, http://www.blueprint-epigenome.eu).

Traditionally, analyses of such multi-dimensional data are based on a series of individual analyses for each data level: genetic association studies to identify genetic variants, which are then fed into an analysis of the genomic and epigenomic structure, which in turn is fed into further downstream analysis of gene regulation and protein activities. However, information is potentially lost at each stage of such multi-step analysis since there is often little opportunity for feedback from later stages of the analysis to earlier ones. A probabilistic model comprising all different stages at once, which would allow information to flow freely between components, would therefore be desirable.

Traditional Bayesian approaches to a comprehensive model, which are based on the (hierarchical) combination of standard distributions, struggle however with the size, complexity and heterogeneity of the data. A potential solution lies in combining a traditional modelling approach with modelling ideas from Bayesian nonparametrics or machine learning. For example, some components of the model might be best modelled by nonparametric density estimation obtained via kernel methods or deep neural networks, while other components might be understood well enough to be modelled by traditional probabilistic methods using standard distributions and modelling techniques. Inference for such hybrid models poses an extra challenge since traditional inference methods, such as Monte Carlo simulation, need to be combined with training methods from machine learning.

This is a multi-disciplinary project which requires a deep interest in Bayesian as well as machine learning methods and the willingness to understand the biological questions and structure of the experimental data driving the modelling. Some familiarity with Bayesian modelling and statistical computing is required.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Experimental design for inference of gene networks from single cell data

Supervisors - John Reid (MRC BSU) and Steven Hill (MRC BSU)

Gene regulatory networks control almost all cellular functions. The ability to accurately reconstruct these networks would greatly further our understanding of many diseases, genetic conditions and developmental biology. However, only limited progress has been made in reverse engineering these networks using the data available from modern high-throughput biological experiments.

The space of all possible undirected network structures grows exponentially in the number of genes and thus network inference is underdetermined for networks of any reasonable size. However, the network inference problem is well posed in abstraction and this makes it an attractive problem to study. This low barrier to entry, together with its biological importance, means that network inference has been extensively studied over the last two decades. Many inference methods have been developed that work with various types of experimental designs [1].
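
To give a sense of the size of this search space: assuming labelled genes, no self-loops and at most one undirected edge per pair of genes, each of the n(n-1)/2 gene pairs is either connected or not, so there are 2^(n(n-1)/2) possible networks. The short sketch below evaluates this count; the function name is hypothetical.

```python
def count_undirected_networks(n_genes):
    """Number of undirected networks on `n_genes` labelled genes.

    Assumes no self-loops and at most one edge per pair of genes.
    """
    n_pairs = n_genes * (n_genes - 1) // 2
    return 2 ** n_pairs


for n in (3, 5, 10):
    print(n, count_undirected_networks(n))
```

Under these assumptions, 10 genes already give 2^45 candidate networks, which is more than 35 trillion.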

Perturbation experiments measure a system’s characteristics in conditions other than its natural state. For example, in gene knockdown experiments, one or more genes are artificially silenced. Data from perturbation experiments are some of the most informative for network inference as the effect of a small change to the network can be accurately assessed. However, they are expensive and time-consuming to perform and typically biologists can only perform a handful of perturbations. Usually the perturbed genes are chosen by the experimenter in an ad hoc fashion. This project will develop methods for experimental design (that is, how to choose which gene(s) to perturb) in order to maximise the value of information from each experiment. Some work exists on experimental design in this context [2–7] but in general this field has not been studied nearly as extensively as the network inference problem.

Recently techniques have been developed to assay gene expression levels in individual cells. Previously genome-wide expression levels could only be measured as averages across populations of thousands of cells. The newly available single cell data allow us to inspect the correlations and relationships between genes in fine detail. In particular the between-cell variation in a population of cells can be characterised. This project will focus on experimental design for single cell experiments.

Most network inference techniques provide point estimates of the network structure. This is a reasonable strategy given the difficulty of exploring the entire space of networks. However, to reliably gauge the likely amount of information gained from any particular experimental perturbation, methods to estimate correlations and uncertainty in the posterior will need to be developed.

Given these correlations and uncertainties, methods to choose which genes to perturb will be explored. It is anticipated that the methods developed will be Bayesian, as Bayesian methods naturally quantify uncertainty.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Model-free network inference from single cell data

Supervisor - John Reid (MRC BSU)

Gene regulatory networks control almost all cellular functions. The ability to accurately reconstruct these networks would greatly further our understanding of many diseases, genetic conditions and developmental biology. However, only limited progress has been made reverse engineering these networks using data available from modern high-throughput biological experiments [1].

Recently techniques have been developed to assay gene expression levels in individual cells. Previously genome-wide expression levels could only be measured as averages across populations of thousands of cells. The newly available single cell data allow us to inspect the correlations and relationships between genes in fine detail. In particular the between-cell variation in a population of cells can be characterised. It is anticipated that single cell data will greatly aid the reconstruction of gene regulatory networks. To date only a few inference methods have been developed specifically for single cell data [2, 3].

Classical network inference is posed as a network edge prediction task given a gene-by-sample data matrix of gene expression levels. In this formulation, when the true network is known, the predictions can be validated using precision and recall or other similar statistics [4]. This project will take an alternative approach and focus on model-free approaches to modelling such data. By model-free we mean we will use methods that do not explicitly represent the structure and parameters of the network.
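For the classical formulation, validation against a known network can be sketched as follows. The example treats edges as undirected pairs of gene names, which is an assumption made for simplicity (regulatory edges are often directed); the gene names, edge lists and function name are hypothetical.

```python
def precision_recall(predicted_edges, true_edges):
    """Precision and recall of predicted edges against a known network.

    Edges are treated as undirected, so ("A", "B") equals ("B", "A").
    """
    predicted = {frozenset(edge) for edge in predicted_edges}
    truth = {frozenset(edge) for edge in true_edges}
    true_positives = len(predicted & truth)
    # Precision: fraction of predicted edges that are really present.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real edges that were predicted.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
predicted_edges = [("B", "A"), ("C", "D"), ("A", "D"), ("A", "C")]
print(precision_recall(predicted_edges, true_edges))
```

Here two of the four predicted edges are correct (precision 0.5) and two of the three true edges are recovered (recall 2/3).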

Model-free approaches are the state-of-the-art for modelling certain physical systems [5]. They are able to accurately learn the dynamics of complicated systems with no prior knowledge of the physical relationships between the variables [6]. This project will investigate how to translate their success learning the dynamics of physical systems to the problem of learning the dynamics of gene expression. One model-free approach for regulatory network inference could be to learn the dynamics of the system using a deep neural network [7] or a Bayesian nonparametric model such as a Gaussian process dynamical model [8]. In this approach single cell data from a time series experiment would be placed along a pseudotime dimension [9]. The dynamics of gene expression relative to this pseudotime would be learnt by the model.

Perturbation experiments measure a system’s characteristics in conditions other than its natural state. For example, in gene knockout experiments, one or more genes are artificially silenced. Data from perturbation experiments are some of the most informative for network inference as the effect of a small change to the network can be accurately assessed. We will be interested in developing model-free methods that can recapitulate the behaviour of a system under perturbations. Only in this case will we be able to interrogate the model and confidently infer which regulatory relationships are present.

More info and to apply: http://www.mrc-bsu.cam.ac.uk/training/phd/phd-opportunities/

Alison Quenault

Communications Officer

MRC Biostatistics Unit

Cambridge Institute of Public Health

Forvie Site

Robinson Way

Cambridge Biomedical Campus

Cambridge CB2 0SR

Tel: +44-(0)1223-768263

Website: www.mrc-bsu.cam.ac.uk