(Apologies for cross-posting)
Hi Everyone,
A workshop on Calibration and Validation of Computer Models is being held as a satellite event to ISBA08, on the 27th and 28th of July at
Macquarie University, Sydney. Abstracts for each of the workshop speakers are available below and on the "Complex Computer Models" link in the ISBA
2008 webpage: http://www.isba2008.sci.qut.edu.au/workshops2008.shtml#sydney
Please register before Friday 18th July (this Friday!).
Titles and abstracts are provided below:
Susie Bayarri: Assessing the risk of catastrophic events by combining statistical and computer models
Abstract:
Risk assessment of rare natural hazards, such as large volcanic pyroclastic flows, is addressed. Assessment is approached through a combination of computer modeling, statistical modeling, and extreme-event probability computation. A computer model of the natural hazard is utilized to provide the needed extrapolation to unseen parts of the hazard space. Statistical modeling of the available data is needed to determine the initializing distribution for exercise of the computer model. In dealing with rare events, direct simulations involving the computer model are prohibitively expensive. Solution instead requires a combination of adaptive design of computer model approximations (emulators) and rare event simulation. The techniques that are developed for risk assessment are illustrated on a testbed example involving volcanic flow.
Tiangang Cui: Statistical inversion and Markov Chain Monte Carlo methods in geothermal model calibration
Colin Fox: TBA
James Gattiker: On design for parameter inference in emulators
Abstract:
In the study of computer models, statistical approximations of simulation responses over a parameter space allow analytical approaches that are otherwise out of reach when simulations are expensive and data is sparse. Design for constructing accurate emulators has several open questions; we examine the interplay of the choice of correlation function, the inference of correlation function parameters, and the effect of predictive accuracy, on Gaussian process emulators. Our approach to design is to examine a hybrid method of pseudorandom sequences and optimal design based on optimizing Fisher Information for parameter inference. We present the results of simulation studies of parameter inference and design, and discuss the implications with respect to the problem of climate modeling.
Dave Higdon: Bayesian approaches for combining experimental data and computer models
Abstract:
By augmenting experiments with detailed simulation-based physical models one can greatly leverage the amount of information that even a limited set of experiments can provide. This tutorial describes Bayesian modeling and estimation techniques that may be used to combine these two sources of information. These methods include designing simulation campaigns, modeling simulation output, estimation (or calibration) of key simulation model parameters, and accounting for major sources of uncertainty. Various response surface models will be discussed, as will model formulations for combining the various sources of information.
Leanna House: Second order exchangeable emulators to assess initial condition uncertainty
Abstract:
We address the uncertainty of deterministic computer models that rely on both input parameters and initial conditions. We refer to such models as semi-deterministic. Purely deterministic computer models either do not have an initial condition or fix (without error bounds) the value for the initial condition so that the same output will result from one set of input parameter values, even when the model is implemented multiple times. Semi-deterministic models, however, allow the condition to vary, and thus have the potential to produce more than one result per input. When multiple outcomes per input are present, current approaches rely primarily on summary statistics (e.g., mean and variance per input), and apply standard deterministic model uncertainty analysis approaches. However, inferences based solely on such statistics require implicitly strong assumptions which we are unwilling to make. Thus, we introduce the notion of latent computer model outcomes which correspond to the results of the semi-deterministic model when using the appropriate, but unknown, initial condition for the physical system of interest. The goal for this paper is to make inferences about the latent model given a sequence of realized semi-deterministic model evaluations. We consider the sequence elements to be second order exchangeable and use Bayes linear methods to assess the posterior expectation and variance of the latent model given the realized evaluations. We demonstrate our methods using semi-deterministic results from a galaxy formation model called Galform that relies on initial specifications of dark matter.
Jason Loeppky: Choosing the sample size of a computer experiment
Abstract:
In recent years virtual experiments implemented by a complex computer code or mathematical model are supplementing or even replacing physical experiments. The computer code mathematically describes the relationship between several input variables and one or more output variables. Often the computer models in question can be computationally demanding. Thus, direct evaluation of the code for optimization or validation is not possible in general. The general strategy is to build a statistical model to act as a surrogate or an emulator of the true code. A long-used rule of thumb for sample size takes a run size that is 10 times the number of active dimensions. In this talk we investigate this rule of thumb for a variety of problems encountered in practice. In some cases we will show that increasing the sample size has a large effect on the prediction quality and in other cases increasing the sample size has little to no effect. These issues will be demonstrated using a model for polar ice caps and a model for the ligand activation of a G-protein in yeast.
Jeremy Oakley: Decision-theoretic sensitivity analysis for complex computer models
Abstract:
We consider the use of computer models in decision-making, and use decision-theoretic arguments to conduct a sensitivity analysis based on the expected value of perfect information for quantifying the 'importance' of each uncertain input parameter in a model. Standard Gaussian process emulators are used for efficient computation, and we address the problem of quantifying uncertainty in the sensitivity analysis results due to the use of an emulator with limited model runs.
Jonty Rougier: Bayes linear prediction with multiple treatments: application to avalanche modeling
Abstract:
We have steady-state snow velocity profiles from ten large-chute experiments, where each experiment takes place under different environmental conditions. Based on these we would like to predict the velocity profile across the full range of environmental conditions. This large number of observations and predictands poses challenges for fully probabilistic methods, but can be easily handled within a Bayes linear approach. We show how multiple treatments can be incorporated into the 'standard' model-based inference, and illustrate a detailed elicitation for such an inference. This is joint work with Martin Kern at the Swiss Federal Institute for Snow and Avalanche Research, Davos.
Leonardo Soares Bastos: Diagnostics for Gaussian process emulators
Abstract:
Mathematical models, usually implemented in computer programs known as simulators, are widely used in all areas of science and technology to represent complex real-world phenomena. Simulators are often sufficiently complex that they take appreciable amounts of computer time or other resources to run. In this context, a methodology has been developed based on building a statistical representation of the simulator, known as an emulator. The principal approach to building emulators uses Gaussian processes. This work presents some diagnostics to validate and assess the adequacy of a Gaussian process emulator as a surrogate for the simulator. These diagnostics are based on comparisons between simulator outputs and Gaussian process emulator outputs for some test data, known as validation data, defined by a sample of simulator runs not used to build the emulator. Our diagnostics take care to account for correlation between the validation data.
David van Dyk: Statistical Analysis of stellar evolution
Abstract:
Color Magnitude Diagrams (CMDs) are plots that compare the magnitudes (luminosities) of stars in different wavelengths of light (colors). High nonlinear correlations among the mass, color and surface temperature of newly formed stars induce a long narrow curved point cloud in a CMD known as the main sequence. Aging stars form new CMD groups of red giants and white dwarfs. The physical processes that govern this evolution can be described with mathematical models and explored using complex computer models. These calculations are designed to predict the plotted magnitudes as a function of the parameters of scientific interest such as stellar age, mass, and metallicity. Here, we describe how we use the computer models as complex likelihood functions in a Bayesian analysis that requires sophisticated computing, corrects for contamination of the data by field stars, accounts for complications caused by binary stars, and aims to compare competing physics-based computer models of stellar evolution.
Richard Wilkinson: Calibrating computer models with high dimensional output
Abstract:
I will consider the calibration of complex computer models which produce highly multivariate output, typically time series or spatiotemporal fields. Directly emulating these models is a computationally demanding task, and may not be possible for models with very high dimensionality. An alternative approach is to reduce the number of dimensions using a basis representation, for example the principal components, and emulate the computer model output using this reduced latent space representation. However, the data reduction will not typically produce an accurate representation of the field data, and so it is necessary to perform any calibration on the data space rather than the latent space so that reconstruction error is accounted for in the model parameters. I will illustrate these ideas on the UVic Earth system climate model.
The workshop also includes a panel discussion moderated by Jim Berger.
If you have any questions about the workshop please email Petra Graham
[log in to unmask] or [log in to unmask]
Best wishes and hope to see you there,
Petra.
Dr Petra Graham
Department of Statistics
Division of Economic and Financial Studies
Macquarie University
Sydney NSW 2109
Australia
Ph: +61 2 9850 6138
Fax: +61 2 9850 7669
