ROYAL STATISTICAL SOCIETY - GENERAL APPLICATIONS SECTION and
INTERNATIONAL BIOMETRICS SOCIETY, BRITISH REGION
joint half-day meeting on
BIOINFORMATICS
Tuesday 22 February 2000
Royal Statistical Society, 12 Errol Street, London EC1Y 8LX, ph 0171 638 8998
Tubes: Barbican, Moorgate, Old Street.
Bioinformatics is a rapidly developing field at the interface of
molecular biology, computer science, and statistics. Several research
councils and other agencies have identified it as a priority area for
research and training, but while there is widespread agreement about
its importance, there seems to be less agreement over what exactly
"bioinformatics" is.
One definition is that bioinformatics is about "the acquisition,
archiving, analysis, and interpretation of molecular biology
information". The important role of statistics should be evident from
this definition, but is not always appreciated by non-statisticians.
This meeting will introduce statisticians to some problems and ongoing
research in bioinformatics, and aims to encourage them to play a
bigger role in the development of this important new field. There will
be a small number of posters on display during the tea break
illustrating current statistics-related bioinformatics research.
PROGRAMME
2:00 - 2:45 Ewan Birney (EMBL-EBI, Hinxton)
"The Computational Annotation of the Human Genome"
2:45 - 3:30 Eddie Holmes (Zoology, Oxford)
"Reconstructing demographic histories from gene sequences"
3:30 - 4:00 Tea/Posters
4:00 - 4:25 Martyn Byng (Applied Statistics, Reading)
"Detecting gene regulatory sequences"
4:25 - 5:10 Nick Goldman (Genetics, Cambridge)
"Inference of pressures of natural selection on the evolution of gene
sequences"
5:10 Close
For further details of the meeting please contact:
David Balding Ph 0118 9318021
Mike Denham Ph 0118 9318914
ABSTRACTS:
The Computational Annotation of the Human Genome
------------------------------------------------
Ewan Birney
http://www.sanger.ac.uk/HGP
http://ensembl.ebi.ac.uk/
http://www.sanger.ac.uk/Software/Wise2
The human genome project is due to provide 90% of the human data by
Spring 2000. This ambitious plan will provide a data resource
applicable to nearly every part of human life science research. At the
Hinxton campus, we are developing a stable software system to provide
value-added information on top of the DNA sequence.
Statistical analysis of DNA sequence is a core aspect of finding
interesting features in DNA sequence. I will present an overview of
how hidden Markov models are used in the field, and one example of a
specific hidden Markov model which can find genes in the DNA sequence
with high accuracy.
Reconstructing Demographic Histories from Gene Sequences
--------------------------------------------------------
Eddie Holmes
Department of Zoology, University of Oxford,
South Parks Road, Oxford OX1 3PS. UK.
Reconstructing the rates at which populations grow or decline is of
fundamental importance for both evolutionary biology and infectious
disease epidemiology. In the latter this information is used to
predict how many individuals are likely to be infected by a pathogen
during an epidemic, as well as the extent of vaccine coverage needed
to control its spread. Advances in population genetic theory now make
it possible to infer rates of population growth in pathogens directly
from an analysis of gene sequence data, by analysing the distribution
of coalescent events on gene genealogies taken from a sample of
sequences representing a single point in time, rather than relying on
longitudinal serological information. Here I shall present an
overview of these methods and illustrate their use in reconstructing
the population dynamics of two important human pathogens: the human
immunodeficiency virus (HIV) and hepatitis C virus (HCV). The results
of these analyses reveal that different genotypes of these viruses
have spread at very different rates, perhaps because they are
associated with different routes of transmission.
Detecting gene regulatory sequences
-----------------------------------
Martyn Byng
Department of Applied Statistics
University of Reading
http://www.rdg.ac.uk/~sns98gm/sghome.html
A key goal of current genomics research is to identify regions in DNA
sequences which are involved in regulating the level of activity of a
protein-encoding gene. Such regulatory regions often consist of
clusters of binding sites for proteins which play a role in gene
expression. Regulatory regions can be classified into two types:
promoters and enhancers. The latter are harder to detect as they can
be remote from the gene that they regulate, are relatively short (a
few hundred base pairs), and have no universal identifying
characteristics. We implement a statistical algorithm to detect
potential regulatory regions in long DNA sequences, by looking for
local excesses of short motifs which may correspond to protein binding
sites. The algorithm flexibly adjusts weights for different motifs in
different sequences, to allow for redundancy in the catalogue of
candidate binding motifs.
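
As a rough illustration of the idea of "local excesses of short motifs",
the sketch below slides a window along a sequence, sums weighted motif
hits in each window, and flags windows whose score is well above the
average. It is not the algorithm presented in the talk: the function
names, the fixed weights, and the simple "factor times the mean"
threshold are all assumptions made for this example.

```python
def windowed_motif_scores(seq, motif_weights, window, step):
    """Score each window by the summed weights of motifs starting inside it.

    motif_weights is a hypothetical {motif: weight} catalogue; the weights
    here are fixed, not adapted per sequence as the abstract describes.
    """
    hits = []  # (start position of a motif occurrence, its weight)
    for motif, weight in motif_weights.items():
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, weight))
            pos = seq.find(motif, pos + 1)  # allow overlapping occurrences

    scores = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        score = sum(w for p, w in hits if start <= p < start + window)
        scores.append((start, score))
    return scores


def flag_excess(scores, factor):
    """Return windows scoring more than `factor` times the mean window score."""
    mean = sum(s for _, s in scores) / len(scores)
    return [(start, s) for start, s in scores if s > factor * mean]
```

A realistic method would replace the mean-based threshold with a proper
null model for motif counts and would estimate the weights from the data.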
Inference of Pressures of Natural Selection on the Evolution of Gene Sequences
------------------------------------------------------------------------------
Nick Goldman
Department of Genetics, University of Cambridge,
Downing Street, Cambridge CB2 3EH, UK.
http://ng-dec1.gen.cam.ac.uk
Ever since Charles Darwin, natural selection has been understood to be
one of the main forces acting to affect organisms' evolution. Since
the time of Crick and Watson, with the understanding of the encoding
of information in genetic sequences, and particularly now that
large-scale genome sequencing projects are commonplace, it has been
natural to ask how we can relate natural selection and gene sequences.
Sequence analysis methodology is only now beginning to be able to
investigate hypotheses regarding natural selection, via the
evolutionary comparison of related sequences and the fitting of models
of sequence evolution that utilise our knowledge of the genetic code.
I will describe some new methods for these analyses, and illustrate
their application to sequences including genes from influenza and
HIV-1 viruses. Results indicate that instances of positive selection,
i.e. the favouring of evolutionary changes that alter the encoded
amino acid sequence of a gene, may be more widespread than was
previously thought.
-----------------------------------------------------------------------
Department of Applied Statistics ph 0118 931 8021
University of Reading fx 0118 975 3169
PO Box 240
Reading RG6 6FN www.rdg.ac.uk/~snsbalng/
-----------------------------------------------------------------------