Experimental Design and Big Data
Where: The University of Warwick, Zeeman building, Room MS.03
When: May 8, 2015, 09:45 - 15:50
Full program & Travel directions:
www2.warwick.ac.uk/fac/sci/wdsi/events/yobd/design/
Organizer: David Rossell ([log in to unmask])
Registration: free, but pre-registration is mandatory

The workshop aims to discuss challenges and recent advances in strategies
for designing experiments and data acquisition involving Big Data. Although
the information that can be extracted from data is strongly determined by
how the data were collected, careful design of Big Data collection has been
largely overlooked. We will discuss approaches in a variety of fields,
including efforts to combine clinical trials with personalized medicine,
bioinformatics, signal acquisition, astronomy and online data collection.


PROGRAM

Matthias Seeger (Amazon, Switzerland)

Large scale variational Bayesian inference and sequential experimental
design for signal acquisition optimization

Abstract: I will give a brief introduction to sequential Bayesian
experimental design (BED), in the sense of greedy maximization of
information gain. I will motivate the challenges this program places on
approximate Bayesian inference if it is to be used for high-dimensional
signal acquisition optimization. I will outline a framework for variational
Bayesian inference in large sparse linear models, with which BED can be
implemented in such scenarios.
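
As a rough illustration of the greedy information-gain criterion, here is a
minimal Python sketch of sequential BED for a Gaussian linear model. The
dimensions, candidate pool and noise level are illustrative assumptions, not
details from the talk; the closed-form gain below holds only in the Gaussian
case, whereas the talk concerns variational approximations for large sparse
models.

    # Greedy sequential BED sketch for a Gaussian linear model (illustrative
    # assumptions throughout; not the variational framework from the talk).
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_candidates, sigma2 = 20, 500, 0.1

    X = rng.standard_normal((n_candidates, d))  # candidate measurement vectors
    Sigma = np.eye(d)                           # prior covariance of the weights

    selected = []
    for step in range(10):
        # Information gain of measuring y = x'w + noise under a Gaussian prior:
        # I(w; y) = 0.5 * log(1 + x' Sigma x / sigma2)
        var = np.einsum('ij,jk,ik->i', X, Sigma, X)
        gain = 0.5 * np.log1p(var / sigma2)
        best = int(np.argmax(gain))
        selected.append(best)
        # Rank-one posterior covariance update (Sherman-Morrison)
        Sx = Sigma @ X[best]
        Sigma -= np.outer(Sx, Sx) / (sigma2 + X[best] @ Sx)

    print("greedily selected measurements:", selected)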


Tristan Henderson (University of St Andrews, UK)

Reliable, reproducible and responsible data collection from online social
networks

Abstract: The use of online social networks (OSNs) such as Facebook and
Twitter for research has exploded in recent years, as researchers take
advantage of access to the hundreds of millions of users of these sites to
understand social dynamics, health, mobility, psychology and more. But
there are myriad challenges in collecting the appropriate data from OSNs
for an experiment.

In this talk we will discuss three of these challenges. First, we will look
at differences between passive collection of OSN data (e.g., crawling
Facebook) and actively requesting information from OSN users. Second, we
will examine the state of the art in reproducible OSN research; that is,
appropriate documentation of OSN experiments to enable replication and,
indeed, understanding of an experiment. Finally, we will look at
responsible data collection; in particular, collecting data in an ethical
fashion that respects the desires of the OSN users themselves.
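
As a small, hypothetical illustration of the reproducibility point, the
Python sketch below records provenance metadata alongside each collection
request; every field name here is an assumption for illustration, not a
scheme proposed in the talk.

    # Hypothetical provenance record for a reproducible OSN data collection:
    # what was requested, when, from which API version, under what consent
    # terms, and a fingerprint of the raw data received.
    import datetime
    import hashlib
    import json

    def provenance_record(query, api_version, consent_policy, payload):
        return {
            "collected_at": datetime.datetime.now(
                datetime.timezone.utc).isoformat(),
            "query": query,                    # the exact request issued
            "api_version": api_version,        # OSN API version at collection
            "consent_policy": consent_policy,  # how consent was obtained
            "payload_sha256": hashlib.sha256(payload).hexdigest(),
        }

    record = provenance_record(
        query={"endpoint": "/statuses", "user": "example", "count": 100},
        api_version="1.1",
        consent_policy="opt-in, per-user request",
        payload=b"...raw response bytes...",
    )
    print(json.dumps(record, indent=2))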


Jason McEwen (University College London, UK)

Optimising radio interferometric imaging with compressive sensing

Abstract: We are about to enter a new era of radio astronomy with new radio
interferometric telescopes under design and construction, such as the
Square Kilometre Array (SKA). While such telescopes will provide many
scientific opportunities, they will also present considerable modelling and
data processing challenges. Novel modelling and imaging techniques will be
required to overcome these challenges. The theory of compressive sensing is
a recent, revolutionary development in the field of information theory,
which goes beyond the standard Nyquist-Shannon sampling theorem by
exploiting the sparsity of natural images. Compressive sensing provides a
powerful framework for solving linear inverse problems through sparse
regularisation, such as recovering images from the incomplete Fourier
measurements taken by radio interferometric telescopes. I will present
recent developments in compressive sensing techniques for radio
interferometric imaging, which have shown a great deal of promise.
Furthermore, by appealing to the theoretical foundations of compressive
sensing, I will discuss how telescope configurations can be optimised to
further enhance imaging fidelity via the spread spectrum effect that arises
in non-coplanar baseline and wide field-of-view settings.
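
As a self-contained illustration of sparse regularisation for a linear
inverse problem, the Python sketch below recovers a sparse signal from
incomplete random measurements with the iterative soft-thresholding
algorithm (ISTA). The random sensing matrix and problem sizes are
illustrative assumptions; real interferometric imaging involves Fourier
measurement operators and far more sophisticated solvers.

    # Compressive-sensing toy example: recover sparse x from y = Phi x + noise
    # by minimising 0.5*||y - Phi x||_2^2 + lam*||x||_1 with ISTA.
    import numpy as np

    rng = np.random.default_rng(1)
    n, m, k = 256, 80, 8                  # signal length, measurements, sparsity

    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
    y = Phi @ x_true + 0.01 * rng.standard_normal(m)

    lam = 0.05
    L = np.linalg.norm(Phi, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(n)
    for _ in range(500):
        z = x - (Phi.T @ (Phi @ x - y)) / L                     # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold

    print("relative error:",
          np.linalg.norm(x - x_true) / np.linalg.norm(x_true))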


Yuan Ji (University of Chicago, USA)

Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials

Abstract: Targeted therapies based on biomarker profiling are becoming a
mainstream direction of cancer research and treatment. Depending on the
expression of specific prognostic biomarkers, targeted therapies assign
different cancer drugs to subgroups of patients even if they are diagnosed
with the same type of cancer by traditional means, such as tumor location.
For example, Herceptin is only indicated for the subgroup of patients with
HER2+ breast cancer, but not other types of breast cancer. However,
subgroups like HER2+ breast cancer with effective targeted therapies are
rare and most cancer drugs are still being applied to large patient
populations that include many patients who might not respond or benefit.
Also, the response to targeted agents in humans is usually unpredictable.
To address these issues, we propose SUBA, subgroup-based adaptive designs
that simultaneously search for prognostic subgroups and allocate patients
adaptively to the best subgroup-specific treatments throughout the course
of the trial. The main features of SUBA include the continuous
reclassification of patient subgroups based on a random partition model and
the adaptive allocation of patients to the best treatment arm based on
posterior predictive probabilities. We compare the SUBA design with three
alternative designs including equal randomization, outcome-adaptive
randomization and a design based on a probit regression. In simulation
studies we find that SUBA compares favorably against the alternatives.
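
As a heavily simplified illustration of adaptive allocation driven by
posterior probabilities, the Python sketch below assumes the patient
subgroups are fixed and known, whereas SUBA learns them with a random
partition model; the response rates and trial size are likewise
illustrative assumptions.

    # Simplified subgroup-wise adaptive allocation with Beta-Bernoulli
    # posteriors (fixed, known subgroups; NOT the random partition model
    # of SUBA).
    import numpy as np

    rng = np.random.default_rng(2)
    n_subgroups, n_arms, n_patients = 3, 2, 300

    # Unknown true response probability of each (subgroup, arm) pair
    p_true = np.array([[0.2, 0.6],
                       [0.5, 0.3],
                       [0.4, 0.4]])

    # Beta(1,1) priors: success/failure counts per (subgroup, arm)
    succ = np.ones((n_subgroups, n_arms))
    fail = np.ones((n_subgroups, n_arms))

    for _ in range(n_patients):
        g = rng.integers(n_subgroups)       # incoming patient's subgroup
        # Posterior predictive P(response) under the Beta-Bernoulli model
        pred = succ[g] / (succ[g] + fail[g])
        arm = int(np.argmax(pred))          # allocate the best-looking arm
        response = rng.random() < p_true[g, arm]
        succ[g, arm] += response
        fail[g, arm] += 1 - response

    print("posterior mean response rates:\n", succ / (succ + fail))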


Camille Stephan-Otto Attolini (IRB Barcelona, Spain)

A Bayesian framework for personalized design in alternative splicing
RNA-seq studies

Abstract: I will present a very useful (and nice) application of Bayesian
predictive simulation to the problem of sample size calculation in the
context of expression estimation from RNA sequencing data. New technologies
have made it possible to scrutinize gene expression at unprecedented levels,
and the analysis of these data has generated a large number of models and
tools. Despite this, little effort has been made to address the problem of
sample size calculation in setups where even the simplest experiment costs
thousands of euros. We use a Bayesian probabilistic model to simulate reads
from pilot data in order to compute optimality measures for different
combinations of experimental parameters. We focus on choosing the coverage
that minimises estimation error in the single-sample problem, while in
multi-sample experiments we optimise the number of differentially expressed
isoforms detected. Our results show that optimal parameters depend on
characteristics such as the species, tissue and conditions under study,
making personalized designs necessary. We find that large savings can arise
from a well-planned experiment, and suggest sequential acquisition of data
in order to optimise resources.
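
As a minimal illustration of sample size calculation by predictive
simulation, the Python sketch below simulates read counts from a
pilot-based expression profile at several candidate sequencing depths and
measures how estimation error shrinks. The Dirichlet profile, gene count
and depth grid are illustrative assumptions, not the model from the talk.

    # Choose sequencing depth by simulating reads from pilot-based expression
    # estimates and tracking the resulting estimation error (toy model).
    import numpy as np

    rng = np.random.default_rng(3)
    n_genes = 2000
    theta = rng.dirichlet(np.full(n_genes, 0.5))   # pilot expression profile

    def expected_error(depth, n_sim=50):
        errs = []
        for _ in range(n_sim):
            counts = rng.multinomial(depth, theta)             # simulated reads
            errs.append(np.abs(counts / depth - theta).sum())  # L1 error
        return np.mean(errs)

    for depth in (10_000, 100_000, 1_000_000):
        print(f"depth {depth:>9,d}: expected L1 error "
              f"{expected_error(depth):.4f}")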



-- 
David Rossell, PhD
Assistant Professor
Dept. of Statistics, University of Warwick
+44 (0)2476523062
