UNIVERSITY OF GLASGOW
STATISTICS SEMINAR PROGRAMME
Wednesday, 17 April, 3 pm
Optimal adaptation to the margin in nonparametric
classification
Alexander TSYBAKOV (University Paris VI)
Wednesday, 1 May, 3 pm
Johnson--Mehl tessellations: asymptotics and inference
Sung Nok CHIU (Hong Kong Baptist University)
Bayesian Statistics Day
Wednesday, 29 May
2-3 pm
Bayesian analysis of heterogeneity using mixtures and related models
Peter GREEN (University of Bristol)
3.45-4.45 pm
Spatial Bayesian variable selection with application
to human brain mapping
Ludwig FAHRMEIER (University of Munich)
Wednesday, 12 June
Statistical modelling of extreme wind speeds in the context
of risk analysis for high speed trains
Helmut KUECHENHOFF (University of Munich)
Seminars take place in Room 203, Mathematics Building,
University of Glasgow
For further information please contact the seminar organiser:
Ilya Molchanov
University of Glasgow : e-mail: [log in to unmask]
Department of Statistics : Ph.: + 44 141 330 5141
Glasgow G12 8QW : Fax: + 44 141 330 4814
Scotland, U.K. : http://www.stats.gla.ac.uk/~ilya/
ABSTRACTS
OPTIMAL ADAPTATION TO THE MARGIN IN NONPARAMETRIC CLASSIFICATION
We consider the problem of classification into two classes based on
i.i.d. data (X_1,Y_1),..., (X_n,Y_n), with Y_i in {0,1} and the
predictors X_i in R^d. Devroye, Gy\"orfi and Lugosi (1996) proved that
the Bayes regret of any classification rule cannot decrease faster
than root-n in the minimax sense, as soon as there is no restriction
on the class of joint distributions of (X,Y). The literature suggests
a large variety of classification rules that converge with the rate
root-n (up to a log-factor) or slower. These results take into account
only the complexity of the underlying "candidate" boundaries between
classes and neglect the structure of the margin (i.e. the behaviour of
the regression function v(x) = P(Y=1|X=x) near the boundary curve {x:
v(x) = 1/2}). The first result of this talk shows that the structure
of the margin plays a crucial role in the convergence of the Bayes
regret. In particular, fast rates, up to 1/n, can be attained for
"good" margins by using a simple empirical risk minimisation rule. The
second result suggests a classifier which is adaptive both to the
complexity of the boundary and to the margin. It is shown that this
classifier:
1) shares the properties of usual classifiers, i.e. attains the
optimal rates up to root-n when there is no restriction on the joint
distribution of (X,Y),
2) attains the "fast" optimal rates up to 1/n (to within a log-factor)
for "good" margins, whatever the complexity of the boundary within a
given range.
Theoretically, the suggested method outperforms the common
penalization or boosting techniques: these techniques do not adapt to
the margin and cannot converge faster than the root-n rate, even for
"good" margins (Koltchinskii (2001)). Practically, the
suggested method is explicit but computationally difficult. It is
based on multiple pre-testing schemes.
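To make the empirical risk minimisation rule mentioned above concrete,
here is a minimal sketch in Python. It is only an illustration under
strong assumptions: the predictors are one-dimensional, the candidate
class is a small finite set of threshold rules, and the toy data are
invented. The names empirical_risk and erm_threshold are hypothetical,
and this is not the adaptive classifier of the talk.

```python
# Minimal sketch of empirical risk minimisation (ERM) over a finite class
# of one-dimensional threshold classifiers. The class of rules and the toy
# data are illustrative assumptions, not the setting of the talk.

def empirical_risk(threshold, xs, ys):
    """Fraction of points misclassified by the rule x -> 1 if x > threshold else 0."""
    errors = sum(1 for x, y in zip(xs, ys) if (1 if x > threshold else 0) != y)
    return errors / len(xs)


def erm_threshold(xs, ys, candidates):
    """Return the candidate threshold with the smallest empirical risk.

    Ties are broken in favour of the candidate listed first.
    """
    return min(candidates, key=lambda t: empirical_risk(t, xs, ys))


# Invented toy sample: labels switch from 0 to 1 between x = 0.35 and x = 0.6.
xs = [0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]

best = erm_threshold(xs, ys, candidates)
print("selected threshold:", best)
print("empirical risk:", empirical_risk(best, xs, ys))
```

In this toy sample the two classes are cleanly separated, which is the
kind of situation in which the regression function stays away from 1/2
near the boundary.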
JOHNSON--MEHL TESSELLATIONS: ASYMPTOTICS AND INFERENCE
Consider a set of distinct, isolated points, called seeds, in a
continuous space. Seeds will be stimulated after random times. A
seed, once stimulated, immediately tries to germinate and at the
same time to prohibit other seeds from germination by generating a
spherical inhibited region the radius of which grows at a positive
speed. A seed stimulated at time t fails to germinate if and only
if its location has been inhibited on or before t. The set of
locations first inhibited by the inhibited region originating
from a germinated seed x is called the cell of x. The space will be
partitioned into cells and this space-filling structure is called
a Johnson--Mehl tessellation.
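The construction above can be sketched in a few lines of Python for
seeds on a line. This is a minimal sketch under assumptions that are
not in the abstract: the growth speed is constant and equal for all
seeds, and the seed locations and stimulation times are invented. The
function names germinated_seeds and cell_of are hypothetical.

```python
# Minimal sketch of the Johnson--Mehl germination rule on a line.
# Assumptions (not from the abstract): one constant growth speed for all
# inhibited regions, and an invented list of seeds.

def germinated_seeds(seeds, speed):
    """Return the seeds that germinate, in order of stimulation time.

    seeds is a list of (location, stimulation_time) pairs. A seed fails to
    germinate if and only if its location has been reached, on or before
    its stimulation time, by the inhibited region of an earlier germinated
    seed.
    """
    germinated = []
    for x, t in sorted(seeds, key=lambda s: s[1]):
        inhibited = any(abs(x - gx) <= speed * (t - gt) for gx, gt in germinated)
        if not inhibited:
            germinated.append((x, t))
    return germinated


def cell_of(y, germinated, speed):
    """Return the germinated seed whose inhibited region reaches location y first."""
    return min(germinated, key=lambda s: s[1] + abs(y - s[0]) / speed)


seeds = [(0.0, 0.0), (1.0, 0.5), (3.0, 1.0), (2.5, 4.0)]
grown = germinated_seeds(seeds, speed=1.0)
print("germinated:", grown)
print("cell containing y = 2.0:", cell_of(2.0, grown, speed=1.0))
```

Processing the seeds in order of stimulation time is enough here,
because only seeds that have already germinated can inhibit a later
one.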
In this talk we consider the distribution of the time until a
large cube is totally inhibited. It has an extreme value
distribution. In particular, for seeds located only on a line, we
explain how to obtain the exact distribution of this time by
transforming the original process to a Markov process. Moreover,
we discuss the number of germinations. A central limit theorem for
this number is shown for the case that seed locations and
stimulation times form a Poisson process. The result is then
extended to the case that the seed locations are m-dependent.
The second part of the talk is devoted to the estimation of the
growth speed of inhibited regions and the intensity measure of the
Poisson process. Maximum likelihood estimation of the speed,
nonparametric estimation of the intensity measure and of its
density, and maximum likelihood estimation of the parameters
of an intensity with known analytical form are proposed and
applied to real neurobiological data.
Joint work with I. Molchanov and M.P. Quine.
BAYESIAN ANALYSIS OF HETEROGENEITY USING MIXTURES AND RELATED MODELS
Peter J Green, University of Bristol, UK
The problem of finite mixture analysis dates back many years,
and the frequentist methodology presents many difficulties,
but only recently has a really satisfactory Bayesian
analysis become available, enabled, of course, by the use of
novel MCMC methods. This talk will focus mainly on adaptations of
the basic mixture model to various situations of more structured
data, where the interest is in identifying and quantifying
heterogeneity, and will include an approach to Bayesian ANOVA
for factorial experiments, and a new analysis of geographical
epidemiology data.
SPATIAL BAYESIAN VARIABLE SELECTION WITH APPLICATION TO HUMAN BRAIN
MAPPING
A basic approach for human brain mapping using functional magnetic
resonance imaging (fMRI) is as follows: At each pixel i, the
activation effect is modelled through a pixelwise regression model
with a predictor incorporating baseline effects and the effect of an
external stimulus on the fMRI signal time series measured at this
pixel. An important issue is to take into account spatial
correlation with neighbouring pixels. We consider this problem within
a more general context of spatial Bayesian variable selection, where
binary indicators define non-zero regression coefficients of usual
linear models for each location i. Spatial variable selection is then
achieved through Ising priors for these indicator variables and
posterior analysis via MCMC. This approach is illustrated by
application to a visual fMRI experiment, and the results are compared
to previous approaches where spatial smoothness priors are directly
introduced for the activation intensities.
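As an illustration of an Ising prior on binary indicators, the sketch
below computes an unnormalised log prior on a small rectangular grid.
Several things here are assumptions and not taken from the abstract:
the four-nearest-neighbour structure, the parametrisation (a single
interaction parameter beta rewarding agreeing neighbours, with no
external field), and the function names. The likelihood part of the
posterior and the MCMC sampler are not shown.

```python
import math

# Minimal sketch of an Ising prior on a grid of binary indicators.
# Assumptions (not from the abstract): four nearest neighbours, a single
# interaction parameter beta, and no external field term.

def neighbour_pairs(rows, cols):
    """Yield each horizontally or vertically adjacent pair of grid sites once."""
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                yield (r, c), (r, c + 1)
            if r + 1 < rows:
                yield (r, c), (r + 1, c)


def ising_log_prior(gamma, beta):
    """Unnormalised log prior: beta times the number of agreeing neighbour pairs."""
    rows, cols = len(gamma), len(gamma[0])
    agree = sum(
        1
        for (r1, c1), (r2, c2) in neighbour_pairs(rows, cols)
        if gamma[r1][c1] == gamma[r2][c2]
    )
    return beta * agree


def prior_prob_active(n_active, n_inactive, beta):
    """Prior conditional probability that an indicator equals 1,
    given how many of its neighbours equal 1 and how many equal 0."""
    w1 = math.exp(beta * n_active)
    w0 = math.exp(beta * n_inactive)
    return w1 / (w1 + w0)


gamma = [[1, 1],
         [1, 0]]
print("log prior (unnormalised):", ising_log_prior(gamma, beta=0.5))
print("P(indicator = 1 | 3 active, 1 inactive):", prior_prob_active(3, 1, beta=1.0))
```

Under this parametrisation a larger beta favours spatially coherent
patterns of indicators, and beta = 0 makes the indicators independent
a priori.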
STATISTICAL MODELLING OF EXTREME WIND SPEEDS IN THE CONTEXT OF RISK
ANALYSIS FOR HIGH SPEED TRAINS
The risk of derailment due to extreme wind has become an important
issue in connection with modern high speed trains. Fortunately, there
are no data on derailment accidents caused by wind, and the
related technical aspects are difficult. The work is a consulting
project with German Rail (DB) for the risk assessment of new high
speed tracks in Germany. The results are essential for planning
wind-protection measures and possible speed reduction at critical
points. The presentation has two parts.
In the first part we consider a directional model for the extreme wind
speeds proposed by Coles and Walshaw (1994). This model, based on the
largest order statistics, is a generalisation of the classical extreme
value model and its parameters vary with direction according to
harmonic terms. A procedure was developed in R, including profile
likelihood confidence intervals for the extreme quantiles. The model
was estimated for several relevant German weather stations alongside
the track.
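The idea of parameters that vary with direction according to harmonic
terms can be sketched as follows. This is only an illustration in
Python, not the R procedure described above: it uses the classical
generalised extreme value (GEV) quantile formula with a location
parameter that depends on direction, and all coefficient values and
function names are invented for the example. The model based on the
largest order statistics and the profile likelihood intervals are not
reproduced.

```python
import math

# Minimal sketch: a harmonic, direction-dependent parameter plugged into
# the classical GEV quantile formula. All numerical values below are
# invented for illustration.

def harmonic(theta, a0, coeffs):
    """a0 + sum_k (a_k cos(k theta) + b_k sin(k theta)), coeffs = [(a_1, b_1), ...]."""
    return a0 + sum(
        a * math.cos(k * theta) + b * math.sin(k * theta)
        for k, (a, b) in enumerate(coeffs, start=1)
    )


def gev_quantile(p, mu, sigma, xi):
    """Level exceeded with probability p under a GEV(mu, sigma, xi) distribution."""
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV family (shape parameter equal to zero).
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical direction-dependent location parameter (theta in radians).
theta = math.pi / 3
mu_theta = harmonic(theta, a0=25.0, coeffs=[(3.0, -1.0), (0.5, 0.2)])
level = gev_quantile(0.01, mu=mu_theta, sigma=4.0, xi=-0.1)
print("location parameter at theta:", mu_theta)
print("level exceeded with probability 0.01:", level)
```

The harmonic form guarantees that each parameter is periodic in the
direction, so directions 0 and 2*pi give the same value.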
Secondly, the risk analysis is discussed in detail. The risk is
quantified using the probability that the directional critical wind
speed is exceeded when the train passes. The critical wind speed is
determined by technical considerations: It depends on the speed of the
train and the curvature of the rail. The results of the extreme value
analysis are an important input for the risk assessment.