Hello the UK,
Jeremy informs me that in order to get a VO discussed for GridPP
Approval, I should probably post some background stuff about it to the
list to enable informed discussion.
So, since the next meeting is a DTeam and Sites one, I thought it
would be most reasonable to post this to the TB-SUPPORT list.
The NeISS VO is the VO for the National e-Infrastructure for Social
Simulation project, which is basically exactly what it says on the
tin. They're a JISC funded project, succeeding earlier work by NCeSS.
They're currently in the process of working out how their infrastructure
would work, but they're rather interested in the model we have in WLCG
of catalog hierarchies built on LFC/SRM layering. We support them at
Glasgow, but we'd like them to get some use of the T1 LFC, which
requires a bit of discussion.
I attach several paragraphs of details about their aims and goals from
one of their Project Managers, June Finch:
In the longer term we are aiming for NeISS to support two types of
GENESIS Social Simulation Models developed by Andy Turner. Both of
these models focus on detailing the location and attributes of
simulated people. One of them steps on the resolution of seconds and
is like a transport or traffic simulation. The other steps on the
resolution of days and is like a model of demographic change
(incorporating birth, death and migration).
The models are implemented in Java requiring Java 1.6, but not
requiring any special installation. Several third party libraries are
used. These and the GENESIS source code are open source. Andy is
exploring the possibilities for parallelising the models using MPJ
Express.
We are currently focusing on supporting a simple demographic
change model. For this, an initial simulation run inputs some
population, mortality and fertility statistics (less than 1MB in
size). It then generates (in collections) entities representing
individual people. On a daily time step this initial population is
then simulated for a pre-specified number of steps. The initialised
population can be large and stored only partly in the fast access
memory of the computer. Data swapping is managed internally in an
attempt to minimise problems that can result from the throwing of
OutOfMemoryErrors.
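As a rough illustration of the daily stepping described above, here is a
minimal sketch. It is not the GENESIS code: the Person class, the method
names and the per-day probabilities are all hypothetical, migration is
left out, any surviving person may give birth, and the whole population
is kept in memory rather than swapped to disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class DemographicSketch {

    /** Hypothetical stand-in for a simulated person; only age is tracked. */
    static final class Person {
        int ageInDays;
        Person(int ageInDays) { this.ageInDays = ageInDays; }
    }

    /** Advance the population by one daily step: deaths, ageing, births. */
    static void stepOneDay(List<Person> population, double dailyDeathProb,
                           double dailyBirthProb, Random rng) {
        List<Person> newborns = new ArrayList<Person>();
        Iterator<Person> it = population.iterator();
        while (it.hasNext()) {
            Person p = it.next();
            if (rng.nextDouble() < dailyDeathProb) {
                it.remove();          // this person dies today
                continue;
            }
            p.ageInDays++;
            if (rng.nextDouble() < dailyBirthProb) {
                newborns.add(new Person(0));
            }
        }
        // Newborns join after the loop so they are not stepped on their birth day.
        population.addAll(newborns);
    }

    /** Initialise a population and run it for a pre-specified number of steps. */
    static List<Person> simulate(int initialSize, int steps, double dailyDeathProb,
                                 double dailyBirthProb, long seed) {
        Random rng = new Random(seed);
        List<Person> population = new ArrayList<Person>();
        for (int i = 0; i < initialSize; i++) {
            population.add(new Person(0));
        }
        for (int s = 0; s < steps; s++) {
            stepOneDay(population, dailyDeathProb, dailyBirthProb, rng);
        }
        return population;
    }

    public static void main(String[] args) {
        // The rates below are made-up placeholders, not real statistics.
        List<Person> result = simulate(1000, 365, 0.00003, 0.00004, 42L);
        System.out.println("Population after 365 steps: " + result.size());
    }
}
```

In a real run the two probabilities would presumably come from the input
mortality and fertility statistics and depend on age and sex.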
At present Tom Doherty is developing user interface portlets and job
staging code to run the simulation models at NeSC Glasgow. There are
essentially three types of data we have identified:
1. Metadata that allow for a simulation result to be produced.
2. Result metadata that are passed back to the user and which can be
analysed by the user to assess the simulation results.
3. Simulated populations and checkpoint metadata.
Type 1 and 2 data are small, especially in comparison with Type 3
data, which for a city of 1 million people run over several
generations could amount to terabytes. The Type 3 data is too big
to be passed back to the portal and to the user, and we want to store
it somewhere near where the computation has been done. For this
ScotGrid have offered us use of 2TB of storage. We envisage that
during NeISS, simulations may run on different resources and that Type
3 data might be pulled by simulations run at NGS core sites. Given
this, there is a need for an archive of simulation results and we have
been advised that an LFC offers an appropriate solution for allowing us
access. There are two main ways we are considering doing IO. One is
from within the Java programs; the other is before and after a Java
program runs (to move the data from the LFC for local input, and to push
the result to the LFC as output). So rather than having one IO mechanism
to use the 2TB of data now and another later when we have an archive
(LFC), we are pushing for getting an LFC archive set up.
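The second IO approach above (stage in, run, stage out) and the wish for
a single IO mechanism could be sketched roughly as follows. Everything
here is an assumption for illustration: ArchiveStore, LocalDirStore,
Simulation and runStaged are hypothetical names, and the local directory
store merely stands in for an LFC-backed implementation, which is not
shown. The sketch uses java.nio.file, so it needs Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StagingSketch {

    /** Hypothetical single IO interface; an LFC-backed class could implement it later. */
    interface ArchiveStore {
        void fetch(String logicalName, Path localTarget) throws IOException;
        void store(Path localSource, String logicalName) throws IOException;
    }

    /** Stand-in store that keeps files under a local directory. */
    static final class LocalDirStore implements ArchiveStore {
        private final Path root;
        LocalDirStore(Path root) { this.root = root; }

        public void fetch(String logicalName, Path localTarget) throws IOException {
            Files.copy(root.resolve(logicalName), localTarget,
                       StandardCopyOption.REPLACE_EXISTING);
        }

        public void store(Path localSource, String logicalName) throws IOException {
            Path dest = root.resolve(logicalName);
            Files.createDirectories(dest.getParent());
            Files.copy(localSource, dest, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Hypothetical hook for the model run: read local input, write local output. */
    interface Simulation {
        void run(Path input, Path output) throws IOException;
    }

    /** Pull the input before the run and push the result after it. */
    static void runStaged(ArchiveStore store, String inputName, String outputName,
                          Simulation sim) throws IOException {
        Path work = Files.createTempDirectory("neiss-sketch");
        Path input = work.resolve("input.dat");
        Path output = work.resolve("output.dat");
        store.fetch(inputName, input);    // stage in
        sim.run(input, output);           // compute on local files only
        store.store(output, outputName);  // stage out
    }
}
```

The point of the interface is that the simulation code only ever sees
local paths, so swapping the local store for an archive-backed one
should not require changes to the model itself.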
We are hoping to run demographic change models for the UK, but
estimates suggest that this would require over 2TB of data for a run
of ten years, which is what we need in order to compare against the UK
Population Census outputs for consecutive censuses. The larger the
number of time steps that are simulated, the larger the data storage
requirements are. For this model, there are no call outs for data to
be fetched in from the web.
The transport or traffic simulation model is more complicated. In its
current state it does try to pull data from web servers as it needs
(e.g. Open Street Map data is pulled as it is needed). In theory, all
the data for this model could be made available locally prior to
running the model.
At present there are no data confidentiality issues with the data we
are using. Further down the line, we may use more restricted
population data for the simulations.
In terms of the NeISS project, the support for Andy's GENESIS Social
Simulation Models is only one component, but it is an interesting and
testing one for the e-Infrastructure that is evolving due to the
potential computational demands of those models.