Dear All,
Please find attached the latest GridPP Project Management Board
Meeting minutes. The latest minutes can be found each week in:
http://www.gridpp.ac.uk/php/pmb/minutes.php?latest
as well as being listed with other minutes at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
Cheers, Dave
________________________________________________________________________
Prof. David Britton GridPP Project Leader
Rm 480, Kelvin Building Telephone: TBD
Dept of Physics and Astronomy Telefax: +44-141-330 5881
University of Glasgow EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________
GridPP PMB Minutes 298 - 7th April 2008
========================================
Present: Tony Doyle, Sarah Pearce, Roger Jones, David Britton, Tony Cass,
Robin Middleton, John Gordon, Pete Clarke, Glenn Patrick, Andrew Sansum,
Suzanne Scott (Minutes)
Apologies: David Kelsey, Steve Lloyd, Jeremy Coles, Neil Geddes, Dave Colling
5. Tier-2 Pledge
=================
JG had circulated the latest MOU pledge numbers compiled by CERN and
noted that London and NorthGrid showed a large reduction in their 2008 pledges
compared with 2007. JG noted that we seem to have moved from a position
where the sites told us what they will provide, to a position where GridPP says
what it wants to buy from them. JG asked whether sites have the freedom to
pledge additional resources to WLCG? DB asked why the figures had decreased
so much? It was noted that there was a large BaBar component at NorthGrid
and London which does not count towards the pledges. JG asked where we had
got the 2008 figures from? DB advised that the model was based on the
hardware grant and the minimum amount of resources expected. DB noted that
the 2008 numbers are the ones recalculated following funding confirmation -
they are correct at present but will need to be revised. JG noted that the
experiments have input and requirements, and that Janet Seed should be briefed
on the change. DB advised that the final GridPP3 scenario reduction of 70% and
the experiments' choice to declare only 75-80% of Tier-2 resources had both
had an impact. DB would brief Janet Seed. It was further agreed that DB and
JG would iterate regarding the understanding of the numbers involved before
next Tuesday's CCRB.
7. EGI Workshop
================
JG reported that this had been a disappointing workshop with no real
commitment to a large infrastructure. The larger countries were generally
supportive but overall planning was poor. At the WLCG OB, countries were
asked what they would do if there were to be no EGEE after 2010 - there were
funding issues and a general scaling-back. There would be a workshop at CERN
at the end of June, beginning of July, at which a blueprint might be proposed -
input from GridPP was required. JG noted that the EGI Workshop was a 'closed'
workshop but a Plan B for WLCG would be required in order to keep the Grid
operational.
1. IHEPCCC
===========
PC had circulated an email regarding his attendance at IHEPCCC, where he had
been 'placeholding' to see what was generally happening. One document and an
Annual Report, similar to CNAP, had been produced. The purpose of the
meeting was unclear, however. DB suggested that they needed to clarify their
Terms of Reference and define inputs and outputs - then we would be in a
position to give an opinion. The group could most usefully focus on non-Grid,
non-technical issues, but if they could not define their ToR then it would not be a
helpful body. It was agreed that PC should continue to attend meantime.
2. GRIDPP21
============
DB asked for input from the PMB on possible date and location. DB had
contacted Chris Allton and Swansea was a possibility but not yet confirmed. JG
noted that in September there were clashes with the AHM, OGF and EGEE. DB
noted that not many would attend OGF, August was not a good month and
October at Institutes was difficult. JG advised that it was mainly security and
storage people that attended OGF. DB proposed the week commencing 15th
September for a three-day meeting, with another meeting around March 2009.
It was agreed that DB would discuss this with Chris this week and bring it up
again at the next PMB.
3. ORACLE Licences
===================
AS had circulated an email regarding this issue. 14 more licences were required
due to increased specifications, but this led to funding issues. They would be
paid for over 5 years @ £10k pa. This could be paid from the maintenance
costs at the Tier-1. DB proposed that this be handled at the Tier-1 but agreed
that it was an unexpected cost. The PMB agreed the payments.
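
As a rough illustration of the commitment agreed above, the following sketch works out the total over the payment period. The annual payment, the number of years and the number of licences are taken from the minutes; the even split per licence is an assumption made here for illustration only and was not discussed at the meeting.

```python
# Figures from the minutes: 14 extra licences, paid over 5 years at £10k pa.
ANNUAL_PAYMENT_GBP = 10_000
PAYMENT_YEARS = 5
NEW_LICENCES = 14


def total_cost(annual_payment: int, years: int) -> int:
    """Total commitment over the whole payment period."""
    return annual_payment * years


def cost_per_licence(total: int, licences: int) -> float:
    """Even split across licences (an illustrative assumption only)."""
    return total / licences


total = total_cost(ANNUAL_PAYMENT_GBP, PAYMENT_YEARS)
per_licence = cost_per_licence(total, NEW_LICENCES)

print(f"Total commitment: £{total:,}")
print(f"Approximate cost per licence: £{per_licence:,.0f}")
```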
4. Tier-2 Grants
=================
It was reported that Peter Hobson had been in touch regarding the spending of
funds in advance. This related to CMS Tier-2 hardware. It was noted that DC
had raised this after Easter, and DB had contacted Janet Seed. DB reported that
the Tier-2 hardware grants will be issued but the spend was limited to 75% of
the total, and it would be reviewed in July - this represents half of the Tier-2
budget line. DB had emailed Trish Mullins but there was no response as yet. DB
advised that 75% leaves headroom to cut the grants if absolutely necessary.
The current plan was that these grants would be issued in full. This was agreed.
6. New GridPP Website
======================
SP advised that she had received some comments relating to the new GridPP
website. SP noted that the meeting page was not easily found as it was in a
different location now, and the sidebar had been removed. SP advised that they
had tried, within the re-design, to highlight the most important sections. DB
suggested that one needed to view the front page through the eyes of someone
who had never seen it before, who might be looking for help. For users, a first
point of contact was required, elevated to a higher level on the front page -
something like 'how do I get started' or similar. DB noted that there was also
no obvious link to GGUS. DB suggested that the front page 'information' should
say 'help' at top level, and should show experiment help pages for experiment-
specific problems. Phrases like: 'help', 'get started', 'experiment help' or 'grid
help' should be used, but not going directly to GGUS. SP would amend
accordingly and re-circulate. SP invited further comments before the end of the
week.
8. AOCB
========
- DB noted that the Quarterly Reports were due, particularly from AS - these
were now urgent and were required as soon as possible.
- SP asked about the LCG box on the Project Map - she had iterated with TC
regarding milestones and metrics. TC had suggested a variety of reports in April
and June each year, but it was noted that January would be preferable following
LHC operations - reports would probably have been produced by that time
anyway. AS asked whether the Tier-1 information was not being duplicated?
DB suggested that the milestone should be 'are we ready for
data-taking' rather than the writing of the report itself - it should
be re-worded to show we are substantially ready - it should show red if there is
a problem. It was noted that this was an LCG box, not a Tier-1 box. SP noted
that it related to what GridPP will be delivering to LCG; the Project Map overall is
to show to what extent GridPP was ready in its various categories. DB noted
that in relation to UK performance there should be an element of success to
report, or otherwise. SP asked who decided whether it was green or red? DB
advised that there were two scenarios: if the UK failed but other countries
succeeded then there was an easily identifiable problem; on the other hand, if all
countries failed due to a common problem or ill-defined metric, it was harder to
determine. It was agreed that this should be discussed further. SP asked if
metric 6.3.9 was reasonable (number of invited UK talks at WLCG meetings)?
RM noted that travel claims suggested this was nearer 4-6 at best - this
translated to one per quarter and was reasonably straightforward to meet. RJ
further suggested changing the wording to 'e-Science' rather than GridPP.
- RM asked about WLCG workshop travel. Previously he had said 1 per institute
should go, with sign-off on the exceptions; however, Chris Brew would like 3 to
attend from RAL PPD and the Tier-1 had a number who wanted to attend. JC
noted that he endorsed the requests, then it was envisaged that 3 more at
PMB-level would be added. This totalled around 30 (rather than 20) but not all
were going for the whole week. The requests matched the rooms available. DB
suggested that this was a critical time now, in the run-up to data-taking, and
that the costs might be higher but it was important to support technical
interactions. The PMB approved the travel requests. RM confirmed he would
advise JC.
STANDING ITEMS
==============
SI-1 ATLAS weekly review and plans
-----------------------------------
RJ reported that this week ATLAS was doing throughput and functional tests,
and that there were problems due to disk space. They were also doing the usual
production pre-FDR2. Outstanding issues: they were not running at 100% over
the past few weeks at RAL; RJ asked whether the 24/7 cover had started? AS
would report in his weekly overview.
SI-2 CMS weekly review and plans
---------------------------------
DC was not available this week.
SI-3 LHCb weekly review and plans
----------------------------------
GP reported a few issues: the Tier-1 sites ran out of disk space (not RAL);
scheduled CCRC'08 production was ongoing, although there were issues with
glexec and pilot jobs.
SI-4 Tier-1 Manager's report
-----------------------------
AS provided the following report:
1) Purchases
a) Disk tender - 126/182 disk servers have now passed acceptance.
b) CPU tender - Our 28-day load test has been running for 1 week on some of the
servers. It has yet to start on the other supplier servers owing to (minor)
problems getting the provisioning system to install correctly.
c) Oracle server hardware upgrade order has been received.
d) Some Xen capable hosts have been received for the PPS cluster. We
probably achieved all our planned spend for FY07 - although not all the financial
data is available to confirm that this is so.
2) Backplane work has nearly completed - there are twelve servers outstanding
on the ATLAS CASTOR instance. These will be replaced on Thursday.
3) We will be making some OPN routing changes in w/b 21st to allow more
servers to be added to the OPN network.
Service
-------
1) SAM availability for last week was 88% (SL extract) - the downtime was due
to the failure of two redundant PSUs on a critical file server.
2) We have been running a trial on-call system (just first-line callout) for 7 days.
We had one night-time callout but have yet to switch CASTOR alarms to the pager
system. We plan to have an initial service in place in time for CCRC08. DB asked
whether there would be a phone number available on the MB pages for 24/7
callout? AS advised that rather than a telephone call, there will be a bleeper
number. AS will liaise with JG regarding the MB contact number which is
currently on the page (at the moment it is Martin Bly), in order to get this
changed. DB will send the url to AS.
3) CASTOR
a) Upgrades to 2.1.6 are expected to be completed this week after some
problems were resolved and some went away.
b) With help from CERN we are making progress to understand and improve
tape writing performance. We ran one (unsuccessful) trial but will have another
attempt with CMS this week.
4) We have a problem with the published APEL data for March and are working
with the APEL team to get our accounts in a consistent state.
5) We will need to purchase additional Oracle licences via CERN (separate email).
6) JANET intend to use Tier-1 OPN traffic as a data source for the MASTS project
http://www.masts.uklight.ac.uk/ We are assured that data captured will be
treated in accordance with RIP and the Data Protection Act, but the site security
officer and I will be meeting with JANET to understand exactly what is planned
later this month. An intervention will be
required in order to install an optical splitter on the fibre.
7) Dante have installed a Perfsonar monitoring server to monitor OPN traffic
statistics. This is just an early test version to stay with us for a few weeks.
8) SL4 Migration: The SL4 UI continues to be held up owing to team priorities
being focused on hardware procurement, installation and acceptance.
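
As a rough cross-check of two figures quoted in the Tier-1 report above, the sketch below converts the disk acceptance count into a fraction and the weekly SAM availability into implied hours of downtime. The input numbers are from the report; treating the 88% figure as covering a full 7-day, 168-hour week is an assumption made here, so the downtime value is indicative only.

```python
HOURS_PER_WEEK = 7 * 24  # assumes availability is measured over a full week


def acceptance_fraction(passed: int, total: int) -> float:
    """Fraction of delivered servers that have passed acceptance."""
    return passed / total


def implied_downtime_hours(availability: float,
                           period_hours: float = HOURS_PER_WEEK) -> float:
    """Hours of unavailability implied by an availability fraction."""
    return (1.0 - availability) * period_hours


# Figures from the report: 126 of 182 disk servers accepted, 88% SAM availability.
disk_fraction = acceptance_fraction(126, 182)
downtime = implied_downtime_hours(0.88)

print(f"Disk servers accepted: {disk_fraction:.1%}")
print(f"Implied downtime last week: {downtime:.1f} hours")
```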
SI-5 Production Manager's report
--------------------------------
JC was not available this week.
SI-6 LCG Management Board Report
---------------------------------
DB reported from the meeting last Tuesday. There had been discussions re OSG
sites and SAM metrics and the CCRC Phase 2. There was also a report from the
Overview board noting that the machine would be cold by mid-June but that
there was a problem with some magnets. A decision was needed on whether to
re-train or to run at lower energies for now. The power limitations on CERN
computing facilities were also discussed. The current LCG project would be
extended by one year to the end of 2009 to cover the first year of data taking.
There was a long discussion on milestones and metrics including new reliability
targets for the Tier-1s and Tier-2s. There would be no MB this week.
SI-7 Dissemination Report
--------------------------
SP reported that TD and DB had been interviewed by the Sunday Times and this
had led to the story being picked up by other newspapers, including the Mirror,
the Telegraph, the Sunday Times in Australia, and Indian newspapers. DB noted that he
had received several email messages, and TD reported he had also had
interviews and messages. TD would email DB with the information given in order
to ensure consistency. SP reported that the actual press release was with STFC
for comment. Neasan O'Neill was at the STFC Advisory Group meeting today.
REVIEW OF ACTIONS
=================
280.7 JC to mention the issues (when approached by a VO with regard to
joining) of the 'standard' 6-month introduction period, following which the VO
must set-up something specific to them, if appropriate. This was discussed at
DTeam. JC to email GridPP VO members if possible - ongoing. This was a
standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to
VO members. JC to send email.
290.4 AS and JG to iterate regarding what could replace the Tier-1 Board.
Ongoing.
290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the
ones provided for Tier-2. AS to provide the final GridPP2 and 2+ Quarterly
Reports by end March. Ongoing.
290.8 AS/SP to iterate regarding the financial summary in the Quarterly
Reporting (eg: Outturn figures).
290.23 AS/JC to iterate on the Disaster Recovery template and remove
capturable items that were considered to be minor. Ongoing.
290.24 JC to progress his suggested template to use when a crisis occurs - to
be revisited subsequently at a PMB.
292.1 TC and JC to iterate regarding the CERN system that recorded service
interdependence and enabled them to recover from crisis events. Reply awaited,
to be followed up. TC to check with JC.
295.3 It was agreed that there should be a formal look at Network Planning for
the Project Map next year involving PC, RJ, DK and RM - PC to organise. Note
change of initials below.
295.4 TD (as Technical Director) to address the issue of Data & Storage on the
Project Map and get back to SP with inputs. Ongoing.
295.5 RM to get back to SP with inputs regarding the EGEE box on the Project
Map. Ongoing.
295.6 SP noted that she was awaiting a VOMS report from AS and a Grid
Vulnerability report from DK - these were almost in the nature of two Quarterly
Reports. AS and DK to provide appropriate inputs. These related to metrics and
milestones from the Project Map.
295.7 Re network contingency, PC to request clarification from Robin Tasker if
the cost quoted was for 1Gig only.
295.8 Re NGI planning, JG to produce a document/statement on the GridPP
position (due to his MB perspective), and SP to assist with metrics. JG to liaise
with RM re EGEE inputs.
295.9 DB, RM and SP to target categories for the travel budget for the coming
year. Targets are required for how much GridPP might spend and in what
categories of expenditure.
295.10 RM to provide categories and breakdown of travel + additionals to
enable monitoring and decision-making. Ongoing.
296.1 SP to approach STFC for feedback on a proposed press release relating
to GridPP3. Done, item closed.
297.1 DB to speak to SP regarding circulation and co-ordination of paper for
AHM. SP to collate final documents.
297.2 All: inputs to be sent to GP for inclusion in GridPP paper at AHM
(September, Edinburgh).
297.3 TD to get dTeam input relating to NGS minimal software stack; issue to
be addressed at next PMB.
ACTIONS AS AT 07.04.08
======================
280.7 JC to mention the issues (when approached by a VO with regard to
joining) of the 'standard' 6-month introduction period, following which the VO
must set-up something specific to them, if appropriate. This was discussed at
DTeam. JC to email GridPP VO members if possible - ongoing. This was a
standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to
VO members. JC to send email.
290.4 AS and JG to iterate regarding what could replace the Tier-1 Board.
290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the
ones provided for Tier-2. AS to provide the final GridPP2 and 2+ Quarterly
Reports by end March.
290.8 AS/SP to iterate regarding the financial summary in the Quarterly
Reporting (eg: Outturn figures).
290.23 AS/JC to iterate on the Disaster Recovery template and remove
capturable items that were considered to be minor.
290.24 JC to progress his suggested template to use when a crisis occurs - to
be revisited subsequently at a PMB.
292.1 TC and JC to iterate regarding the CERN system that recorded service
interdependence and enabled them to recover from crisis events. Reply awaited,
to be followed up. TC reported that an email was sent to JC by the person that
developed our Service Database system on 20th Feb. TC to check with JC.
295.3 It was agreed that there should be a formal look at Network Planning for
the Project Map next year involving PC, RJ, DC and Robin Tasker - PC to
organise.
295.4 TD (as Technical Director) to address the issue of Data & Storage on the
Project Map and get back to SP with inputs.
295.5 RM to get back to SP with inputs regarding the EGEE box on the Project
Map.
295.6 SP noted that she was awaiting a VOMS report from Andrew McNab and a
Grid Vulnerability report from DK - these were almost in the nature of two
Quarterly Reports. AS and DK to provide appropriate inputs. These related to
metrics and milestones from the Project Map.
295.7 Re network contingency, PC to request clarification from Robin Tasker if
the cost quoted was for 1Gig only.
295.8 Re NGI planning, JG to produce a document/statement on the GridPP
position (due to his MB perspective), and SP to assist with metrics. JG to liaise
with RM re EGEE inputs.
295.9 DB, RM and SP to target categories for the travel budget for the coming
year. Targets are required for how much GridPP might spend and in what
categories of expenditure. Initial proposal from RM required.
295.10 RM to provide categories and breakdown of travel + additionals to
enable monitoring and decision-making.
297.1 DB to speak to SP regarding circulation and co-ordination of paper for
AHM. SP to collate final documents.
297.2 All: inputs to be sent to GP for inclusion in GridPP paper at AHM
(September, Edinburgh).
297.3 TD to get dTeam input relating to NGS minimal software stack; issue to
be addressed at next PMB.
298.1 DB to contact Janet Seed about the reduction in the UK Tier-2 pledges
for NorthGrid and London.
298.2 DB and JG to iterate regarding the understanding of the Tier-2 pledge
numbers before next Tuesday's CCRB.
298.3 DB to discuss possible dates for GridPP21 with Chris at Swansea, and
raise the issue again at the PMB.
298.4 SP to put help links upfront on the website, relating to both new users,
and to experiment-specific assistance, and re-circulate.
298.5 AS to liaise with JG regarding the MB contact number for 24/7 callout
which is currently on the page (at the moment it is Martin Bly), in order to get
this changed.
INACTIVE CATEGORY
=================
271.1 PMB to examine the issue of fibre breakage and outages, CERN-RAL OPN
link, in one year's time, when actual data on breakages is available. Due date
would be September '08.
271.3 Re CERN-RAL OPN link breakage and backup generally, PC to oversee the
issue and collate info so that the PMB have something to revisit in one year's
time. Due date September '08. It was noted that PC would circulate a revised
document after discussion with ATLAS (RJ/PC/DN to iterate).
282.8 RM to monitor how R-GMA and networking issues impact on GridPP as
matters progress. RM advised that this item should be moved to the 'inactive'
category as it will develop over the coming months. RM discussed the issue with
Steve Fisher and advised that support of R-GMA is required whilst APEL is
dependent on it. RM reported that he has spoken to SF and there is currently no
change to the R-GMA situation - process ongoing.
290.19 DB/SP to progress the details of the Project Map over the next few
months, cross-checking that all elements are incorporated, including strategic
priorities and staffing. To be completed before the next Oversight Committee.
The meeting closed at 3:05 pm. The next PMB would take place on Monday 14
April at 1:00 pm.