Dear All,
Please find attached the latest GridPP Project Management Board
F2F Meeting minutes. The latest minutes can be found each week in:
http://www.gridpp.ac.uk/php/pmb/minutes.php?latest
as well as being listed with other minutes at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE GridPP Project Leader
Rm 478, Kelvin Building Telephone: +44-141-330 5899
Dept of Physics and Astronomy Telefax: +44-141-330 5881
University of Glasgow EMail: [log in to unmask]
G12 8QQ, UK Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________
GridPP PMB Minutes 295 - 10th March 2008
========================================
Face-to-face meeting at GridPP20 - Dublin
Present: Tony Doyle, Sarah Pearce, Roger Jones, David Britton, David Kelsey,
Steve Lloyd, Robin Middleton, John Gordon, Jeremy Coles, Peter Clarke,
Glenn Patrick, Andrew Sansum, Neil Geddes, Dave Colling
(Suzanne Scott - Minutes)
Apologies: Stephen Burke, Tony Cass (TC present at PMB contd 11.03.08)
1. CCRC Status
===============
RJ noted concern about storage and server availability. DC noted that the
network to Fermilab was an issue but it did pass the testing generally.
For CMS the CCRC exercise was largely a success. It was noted that CSA'08
was coming up and between now and May CMS are doing network testing. RJ
noted for ATLAS that there had been nothing from this particular exercise
that was too worrying apart from the storage issue. He was more concerned
with the FDR. There was a discussion of the Tier-2s, physics groups, and
mapping.
2. Cuts to the GridPP3 Budget
==============================
DB reported that he had spoken to STFC and could confirm that GridPP were
in the medium-high category, which corresponds to cuts of the order of
5-15%. He understood (subsequently confirmed) that GridPP was likely to
have to return 5% of the GridPP3 money which worked out at circa 1.2
million. TD noted that the project was over three years from April and
that this was a straightforward calculation. DB reported that a letter
would be received from STFC detailing the 5% - three points were noted in
relation to this:
1. the formal letter will require GridPP to have a plan of action as to
how the cut would be implemented;
2. the letter will provide a steer in relation to saving;
3. GridPP was being cut because the user base is being cut, i.e. LHCb and
ALICE (BaBar was not mentioned).
The project itself is not being cut, however it would not be possible to
simply make savings pro rata, linked to the experiments. Trish Mullins
had noted that the STFC Committee regarded the Tier-1 as the cornerstone
of the project. It was noted that ALICE is likely to be cut back subject
to consultation, and LHCb reduced. DB noted that we are led by our
users, and we would need to re-map the hardware to reflect changes to this
community.
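
As a rough check on the figures above, the sketch below works through the
arithmetic. It assumes, which the minutes do not state, that the 5% applies
to the whole GridPP3 award, so the implied total is an inference rather than
a quoted figure. The variable names are illustrative only.

```python
# Figures quoted in the minutes (the source gives no currency symbol).
RETURNED = 1_200_000      # "circa 1.2 million" to be returned
CUT_PERCENT = 5           # the cut GridPP expects to be asked for
BAND_PERCENT = (5, 15)    # "medium-high" category: cuts of the order of 5-15%

# Assumption: the 5% applies to the whole GridPP3 award.
implied_total = RETURNED * 100 // CUT_PERCENT

# Size of the cut at each end of the medium-high band.
band_cuts = tuple(implied_total * p // 100 for p in BAND_PERCENT)

print(f"implied award: {implied_total:,}")
print(f"cut at {BAND_PERCENT[0]}%: {band_cuts[0]:,}")
print(f"cut at {BAND_PERCENT[1]}%: {band_cuts[1]:,}")
```

On these assumptions the expected 5% sits at the bottom of the band, and
the top of the band would be three times larger.
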
DB advised that he was looking at ways to save this money through delayed
starts to some unfilled posts and a re-profiling of the hardware spend. It
would take some time to establish all the details but it was hoped that
the premature termination of posts could be avoided.
DB advised that if things change then the plan changes - we are not
setting the UK physics agenda/strategy, rather, we need to reflect the
community, therefore the model will be adjusted accordingly. DB noted
that Mike Green had requested that all STFC letters to PIs be made
public. DB noted that he had prepared statements for PMB approval which
would be attached to the STFC letter - these were presented on screen.
It was understood that these statements put the new cuts in the context
of the cuts previously imposed. The PMB agreed the strategy, but noted
a need to amend the wording of the statement. DB would do a new draft and
circulate this for comments; work on the finances was ongoing.
3. Tier-1 Hardware Spend
=========================
DB noted that the next hardware spend may need to be slightly delayed.
AS reported that this had not yet commenced; a pre-qualification
questionnaire (PQQ) was required. DB advised that we would need to wait until
the consultation period was over so that we could see the whole picture.
AS advised that there was nothing critical at present but it would be a
good idea to start the PQQ and assess vendors. AS noted that the
procurement schedule should be planned to coincide with the new building
schedule. AS would think through the implications in the meantime. DB
noted that there would also be consequences on tape planning (from the
cuts to our user-base) and there would need to be a realistic revision of
the plan once the final figures were known.
4. GridPP3 Project Map
=======================
SP presented the Project Map, noting that it was not yet finalised but
most metrics and milestones were now included. SP noted little area input
for security, network, data and storage, and middleware support, and she
asked for inputs for these sections.
Network
-------
PC noted that something meaningful for the network area was difficult, as
the Tier-2s were connected via SuperJANET and GridPP did not monitor that.
DB asked about network downtime. PC noted that for the Tier-2 this was a
waste of effort. TD commented that whether a site was up or down was
worth monitoring; the backbone network was not so important. RJ countered
that moving files was important and there was a need to identify network
issues. TD suggested that an automated system was needed. DB noted that
we should take the users' view. PC suggested the measure should be
transfers to the Tier-2 sites that fail, regardless of cause. DC advised
that there existed a transfer quality page that measures transfers that
succeed. AS noted that the SAM tests only measure availability of a site.
DK agreed that we need to track the network to ensure that it meets users'
needs. DB asked what should go red on the project map - it was a
diagnostic issue, therefore the primary notification was not the project
map. TD noted that the SAM tests show user jobs failing. PC asked if
there could be a 'data' user box if data management fails due to the
network. SL asked what was possible to measure via the Monboxes? AS
advised that the Monboxes measure end-to-end achievable connection,
therefore connectivity or round-trip times/throughput. PC asked if the
mesh was defined? AS advised that UKERNA do that. SL commented that a
lack of connectivity needs to show up though, rather than one incident
that causes downtime. PC noted that a useful measure to do with data
transfer would be the persistent limitation of a Tier-2 site in relation
to data transfer - could Greig Cowan deal with that? TD noted that most
of this is done within the experiments now; they test the SRM endpoints as
well as data and storage. DB noted that performance was the experiments'
view - the Tier-2 review should be measured here and the Tier-2 plan
adjusted to meet requirements - one metric could be to revisit the plans
and carry out a review. It was agreed to insert 'network plans' to ensure
they were up-to-date at each site - this would ensure 'suitable network
planning provision'. SP to see the wiki sent by TD. PC suggested that to
be proactive there should be a formal look at this next year involving PC,
RJ, DK and RM - PC to organise.
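
One possible form of the measure PC suggested (transfers to a Tier-2 site
that fail, regardless of cause) is sketched below. This is a minimal
illustration and not an existing GridPP tool: the function name, the record
format and the site names are all hypothetical.

```python
from collections import defaultdict

def failure_fraction(transfers):
    """Return {site: fraction of attempted transfers that failed}.

    `transfers` is an iterable of (destination_site, succeeded) pairs.
    The cause of a failure is deliberately not recorded, so network,
    storage and site problems all count the same.
    """
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for site, succeeded in transfers:
        attempts[site] += 1
        if not succeeded:
            failures[site] += 1
    return {site: failures[site] / attempts[site] for site in attempts}

# Hypothetical records; the site names are placeholders, not real data.
sample = [
    ("tier2-a", True), ("tier2-a", False), ("tier2-a", True), ("tier2-a", True),
    ("tier2-b", False), ("tier2-b", False),
]
fractions = failure_fraction(sample)
print(fractions)
```

Tracked over time, a fraction that stays high for one site would point to
the persistent limitation discussed above rather than a single incident.
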
Data & Storage
--------------
TD is to address this as Technical Director and get back to SP with inputs.
Middleware Support
------------------
SP was awaiting information on R-GMA from Steve Fisher.
EGEE
----
The EGEE box was currently blank, RM to get back to SP with inputs.
SP noted that she was awaiting a VOMS report from AS and a Grid
Vulnerability report from DK - these were almost in the nature of two
Quarterly Reports. AS and DK to provide inputs.
Experiment Metrics
------------------
SP presented figures and metrics for ATLAS, LHCb and CMS. DB asked if it
were possible to streamline? RJ noted that each item does get asked about
and it is a lot of work. RJ suggested monitoring the high-level work:
ATLAS and CMS should look the same.
Other experiments
-----------------
It was noted that for Minos, BaBar, ZEUS and PhenoGrid, the numbers were
not based on data - the percentage for PhenoGrid should be negotiated.
GP noted that there was no update on the portal and documentation posts.
NG asked how we could highlight the other experiments? TD suggested that
they be included as a generic category. DB suggested that this should be
about our users rather than the experiments - was it possible to measure
the performance of users of GridPP? DB noted that we have to be realistic
about 'other users' for whom we haven't designed this and where the
metrics are less challenging. GP noted that the metric needs to be
meaningful for the portal and development posts, 50% FTE needs to be
measured and the experiments should be asked. SP asked about the number
of user groups using the portal? Should the figure be set at 1, or 2?
SuperNemo and MICE are examples. It was agreed that the figure should be
set for the number of user groups using the portal. GP advised that some
sort of efficiency was to be measured. DC noted that small experiments
have bursts of data which are not consistent. DB suggested that use of
CPU time, tracking, would not be a waste of effort.
GOC Post
--------
GP was unsure of detail: flexible and short-term was preferred, and the
turnover might be useful. TD noted that a questionnaire would be helpful
- the post-holder to name recipients. DB noted that a metric, listing the
number of problems dealt with plus a questionnaire would be more useful -
the metric could be an annual questionnaire.
Deployment
----------
It was agreed to change '>75% of GridPP posts appointed' to 90%. The
hardware should be as per pledge, whether it is met or not.
F2F meetings - there should be three (with a cut in travel) per year.
CB meetings - there should be at least one per year.
UB meetings - there should be at least three.
DB meetings - ditto.
5. Tier-1 Progress from the Tier-1 Review
==========================================
AS had sent a summary email to the PMB.
CASTOR
------
AS reported that CASTOR issues still need to be resolved but the situation
was much better than previously - nothing was in 'crisis' now and they
were using the 2.1.4 release, with the position improving month-on-month. It
was noted that the OC had expressed concern - the Tier-1 Review had
highlighted things but AS was reasonably happy with the current situation.
Out-of-hours response
---------------------
AS reported that the catalogued fault conditions and testing were close to
completion and the 'on-call' system was operating - these would be completed
this month. New posts were potentially three months away. DB asked DK to
include an estimate of post savings etc. NG recommended pushing ahead
with recruitment.
User Support
------------
AS reported that staff were still 'stuck' at the recruitment stage; the
CCRC'08/01 experience was being reviewed and experiment requirements were
being collated - it was hoped that this would be completed by 23rd March.
Work on the Tier-1 simulation was ongoing, as was benchmarking of
hardware.
Management
----------
AS noted that this was now formalised; the Minutes of the recent meeting
were available on the web - the meetings would be timed closer to
deployment needs, and networking links were an issue.
Service Delivery
----------------
AS reported on a bottom-up plan that was building the component parts of
service plus milestones & metrics - there was also a top-level plan
looking at drivers, including data-taking, MoU, CCRC etc - these plans
were three-quarters complete. DB explained the WLCG CCRC'08 official
services 'GridMap'.
Disaster Planning
-----------------
AS noted that this issue was on the horizon for the OC for re-examination.
AS noted no further progress since the last meeting, but a substantial
amount had already been done.
Networking
----------
RM had dealt with this issue at the Management Meeting and it was on
various committees as an issue, although it was stalled at RAL due to
changes in the IT re-encryption and a review of bandwidths and rates
currently being done.
Dashboard
---------
It was noted that an LCG plan exists.
6. Other Experiment Issues
=============================
It was noted that these include Tier-2 requirements of UKQCD for GridPP3.
GP advised that events had overtaken things somewhat - would UKQCD use
GridPP3 resources in 2009? GP asked what was on offer from GridPP for
future requirements and were any firm answers possible? GP reported that
UKQCD had resources to 2008, but if their bid to the HPC call fails, what
is the backup plan? DB advised that he suspected there was some
likelihood that the bid would fail. PC said this was premature and it was
not yet clear that the bid would definitely fail. DB noted that GridPP
had supported UKQCD in the past, and they had been responsive, but we
cannot commit resources to them due to the needs of the LHC experiments,
but the door should be kept open as far as possible regarding tape media
etc. DB noted that we may be able to help if there is no impact on LHC
experiments, but we can't commit to anything until we know what resources
are available to us. DB noted that the GridPP3 bid contained minimal
resources for UKQCD (who updated their requirements after the bid had been
submitted). PC commented that if STFC give us resources to support UKQCD
then that is fine, but it should be made clear that GridPP are not
responsible if funding decisions go against UKQCD. In conclusion, GridPP
cannot commit to providing resources to UKQCD but would try to accommodate
them where and when possible without impacting the LHC experiments.
7. Network Contingency Plans
=============================
PC had circulated a paper on network contingency, prepared
as a PMB document. It recommended that GridPP should not consider spending
100k to put in a fully diverse resilient optical link. DB asked what
would happen if the link went down? PC noted that the total annual cost
of a 1Gb alternative link would be 18,700. DB suggested that we would
only need a backup link for longer outages and that 'disaster planning'
was for large disasters. In that context, a 1Gb link would be too little
to solve the problem. PC advised that the majority of the cost was for
TVN. PC suggested that in the current financial situation it would not be
wise to devote 20k to a 1Gb link but that this should be kept under
review. This was agreed. It was agreed that PC would request
clarification from Robin Tasker on whether the cost quoted was per 1Gb link or
whether it was only incrementally more expensive to add to this bandwidth.
8. AOB
======
NG brought up the issue of the LCG CB Chair. TD and RJ left the room.
NG reported that nine people had been proposed, two had declined. NG
asked for views on the candidates proposed - it was noted that several
candidates were not known to the PMB and this was probably indicative.
JG asked if another UK person was advisable? DB suggested that it might
not be a good idea to have another UK person in terms of politics. There
was a discussion on the two UK candidates proposed. In conclusion, it was
felt that a non-UK candidate was preferable but that either of the two UK
candidates could be supported.
GridPP PMB Minutes 295 - 11th March 2008
========================================
Follow-up face-to-face meeting at GridPP20 - Dublin
9. NGI Planning
===============
RM reported that at the last PMB F2F in Glasgow, the workshop had just
taken place, with a potential EGI shopping list. This week the Rome
workshop was taking place which was to discuss decisions on what was in
and out of scope, with a date for blueprint in June '08. The question
was where GridPP fits in; milestones need to be included. It was
noted that Trish Mullins was not present for this discussion, but there
were two main issues:
1. NGI milestones & metrics
2. EGI organisation
RM noted that there would be no new information until after the Rome
meeting - Andy Richards, Dave Ferguson and John Gordon were attending.
NG noted that EGI has to deliver a production service. TD suggested that
the ultimate driver for GridPP was through LCG and it has to find its
place within EGI. PC commented that NGS EGI will be funded via JISC, and
that we need to support NGS. NG asked whether GridPP needed an NGI to
support universities, in order to support LHC in future? DB advised that
future money might go direct to an NGI if GridPP transfers reliance to
them. DC noted that if NGS/NGI are using University clusters then GridPP
use them too, it would not all be de-coupled. It was noted that these
were two separate operating structures as an ideal, but it was an unclear
route to get there. DK noted that GridPP would not be making decisions on
our own - it would be in conjunction with CERN etc. DB pointed out that
it was important for GridPP not to be in a position where we don't have
any influence. It was agreed that it was difficult to come to any
decision at present, until more information was known. NG commented that
it was important to have a document saying what our position is, however.
TD noted that JG should be the main person to produce this document due to
his MB perspective, and SP should assist with metrics. JG should liaise
with RM re EGEE inputs. This was agreed.
10. GridPP Travel spend
=======================
RM reported on travel costs. Taking Ambleside as an example, a 3-day
meeting costs around 13k - the total is around 25-30k per annum. For WLCG
meetings the cost is c6-10k. There are, additionally, workshops (dCache,
SRM etc) and experiment software weeks, also experiment training courses -
these are around 5k each. For EGEE conferences, there is a 50/50 funding
scenario and we can claim from the EU - the cost is generally 10k each.
Regarding CHEP for the next financial year there would be a ceiling of
25-30k. UK eScience All Hands is around 5k. RM advised that if all these
are added up they come to 100k, plus there are additional ad hoc meetings:
GDB, grid security etc. From EGEE, a 60k cost is typically reimbursed by
around 30k. If GridPP get 170k after the cuts, the experiments may look
for support. TD advised that this would be below-the-line, and looked on
as single trips. RM advised of other expenses (dissemination, training by
Steve Fisher); costs could therefore rise to around 170k.
TD advised that regarding individual trips, some were necessary but some
were shared with the experiments and we would need to have a revised
discussion. RM noted a 'safe limit' of 120k for travel + 50k for
everything else - e.g. a limit of 10 people at CHEP. SP noted a
dissemination budget requirement of around 10k per year. It was agreed
that DB, RM and SP would target categories for the travel budget for the
coming year. Targets were required for how much GridPP were spending and
in what categories of expenditure. Funding decisions in relation to
dissemination would be made per issue - with a sub-line that says 10k.
RM to provide categories and breakdown of travel + additionals to enable
monitoring and decision-making. PC noted that training can be expensive
and training events should have the eye of experiments' PIs.
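
The headline figures above can be checked with a short sketch. It uses only
numbers quoted in these minutes, all in thousands with no currency stated;
the variable names are illustrative, and the per-category breakdown is left
out because the minutes do not give the number of meetings in each category.

```python
# All figures in thousands, as quoted in the minutes.
TRAVEL_LIMIT = 120        # RM's 'safe limit' for travel
OTHER_LIMIT = 50          # "everything else"
ALLOCATION = 170          # what GridPP may get after the cuts

EGEE_COST = 60            # typical EGEE-related cost
EGEE_REIMBURSED = 30      # typical reimbursement (approximate)

planned_total = TRAVEL_LIMIT + OTHER_LIMIT
headroom = ALLOCATION - planned_total      # slack left within the allocation
egee_net = EGEE_COST - EGEE_REIMBURSED     # net cost to GridPP

print(f"planned: {planned_total}k, headroom: {headroom}k, EGEE net: {egee_net}k")
```

On these figures the two limits use the whole allocation, which is
consistent with the request for per-category targets and monitoring.
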
11. AOB
========
SP advised that she had been asked to produce information for Malcolm
Atkinson relating to:
- project application for Institutes
- industrial take-up and follow-on (GridSite)
- number of downloads of open-source software developed by the project
RM noted that Steve Fisher's R-GMA would be helpful and that SP should
also contact GridSite for numbers. RM suggested going to each technical
area and asking.
SP also advised that there was a specific category of prizes/awards, but
that GridPP does not gather this kind of information at present.
ACTIONS AS AT 10.03.08
======================
295.1 DB to re-draft the attachment to the GridPP letter to STFC (in
response to the latest cuts imposed) and recirculate to PMB for approval.
295.2 Re the Project Map, SP to insert 'network plans' to ensure they were
up-to-date at each site - this would ensure 'suitable network planning
provision'. [SP to see the wiki sent by TD].
295.3 It was agreed that there should be a formal look at Network Planning
for the Project Map next year involving PC, RJ, DK and RM - PC to
organise.
295.4 TD (as Technical Director) to address the issue of Data & Storage on
the Project Map and get back to SP with inputs.
295.5 RM to get back to SP with inputs regarding the EGEE box on the
Project Map.
295.6 SP noted that she was awaiting a VOMS report from AS and a Grid
Vulnerability report from DK - these were almost in the nature of two
Quarterly Reports. AS and DK to provide appropriate inputs.
295.7 Re network contingency, PC to request clarification from Robin
Tasker if the cost quoted was for 1Gig only.
295.8 Re NGI planning, JG to produce a document/statement on the GridPP
position (due to his MB perspective), and SP to assist with metrics. JG
to liaise with RM re EGEE inputs.
295.9 DB, RM and SP to target categories for the travel budget for the
coming year. Targets are required for how much GridPP might spend and in
what categories of expenditure.
295.10 RM to provide categories and breakdown of travel + additionals to
enable monitoring and decision-making.
The next PMB would take place on Thursday 20th March at 1:00 pm.