Dear All,
Please find attached the GridPP Project Management Board
Meeting minutes for the 382nd meeting. The latest minutes can
be found each week in:
http://www.gridpp.ac.uk/php/pmb/minutes.php?latest
as well as being listed with other minutes at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
Cheers, Dave.
--
________________________________________________________________________
Prof. David Britton GridPP Project Leader
Rm 480, Kelvin Building Telephone: +44 141 330 5454
Dept of Physics and Astronomy Telefax: +44-141-330 5881
University of Glasgow
G12 8QQ, UK
________________________________________________________________________
GridPP PMB Minutes 382 (22.03.10)
=================================
Present: David Britton (Chair), Steve Lloyd, Andrew Sansum, Tony Doyle, Robin Middleton, Pete
Clarke, Roger Jones, Tony Cass, Jeremy Coles, Glenn Patrick, David Kelsey (Suzanne Scott, Minutes)
Apologies: Sarah Pearce, John Gordon, Dave Colling, Neil Geddes
1. F2F Agenda for RHUL
=======================
SP wasn't present, but she had sent the Agenda for RHUL:
- discussion re the CASTOR upgrade (AS)
- state of readiness for the PPRP - this would be important for those going to the meeting, and it would
give the PMB a chance to comment on DB's talk etc.
- update on EGI/NGI status
- possible review of experiments' data-taking
Were there any other items to include? If so, these should be emailed to SP. DB was still working
on the main Agenda. AS requested CASTOR discussion at end of day so that a couple of invited
people from RAL could attend.
2. GridPP4 Reviewers' Feedback
===============================
DB had drafted a response and circulated this. The reviewers had been helpful and there wasn't a
great deal to pick up on.
Reviewer 1: were there any comments? TD advised that it might be useful to add that NGI will be
formed from a combination of NGS and GridPP as per the proposal. AS noted that Security was
already integrated. DB agreed, noting that the key point was there were good reasons why
convergence had been slow to date. DK advised that there were already many areas in operation
where we were working with NGS. PC commented on style: he advised caution about the first part
of each paragraph, noting the comments could be interpreted as disagreeing - he suggested a
wording change, thanking the reviewer for their useful comment in bringing up x. DB noted he
didn't want to interpret what had been said, which is why he had quoted exactly. Re the 'no viable
alternative' comment by Reviewer 1, TD noted that 'therefore' there had been no attempt to
review other options. Re the third comment on cuts, TD noted that the statement from PPAN, which
stated the opposite view and would be raised at the PPRP, could be included. DB
noted he would try and balance these.
Re the comments from Referee 3, it was agreed to leave until we received any direct questions
from the panel. DB noted that it was difficult to explain the cut made. TD advised that re-emphasis
of the cut made, at this point, would be good. PC thought we should ignore it, as they
would ask anyway. It was agreed not to comment.
PC noted that overall we should thank all of the Reviewers for their supportive comments. DB
would amend the letter accordingly.
3. Tier-2 Investment Status
============================
SL reported that he had been trying to make sense of the replies received. He was looking at the
capital expenditure - he had divided the seven numbers by installed CPU. The Bristol and Edinburgh
figures were large - if he ignored them, then the others didn't agree, but they were all of the same
order of magnitude, which resulted in a number of about £10 million. SL reported that he had put
a call out for more information. SL suggested he could work out the cost of hardware the Universities
have provided and subtract what we've paid. He could also estimate the running cost and
unfunded effort. DB advised that this would be useful - he could use the info to prepare a slide for
the PPRP if asked about this. If any other info was available it should be sent to SL - particularly
from Imperial.
4. EGI/NGI Paper status
========================
DB noted that last week we had looked at effort numbers. RM advised that these had not changed.
JG had expressed concern about the preliminary nature of the numbers and advised against
circulating this. RM reported that he had worked on the tables and text. He had been in touch
with NG and had summarised the governance of NGI. DB noted that this was just our current
view, not a commitment. RM noted that the risk area needed tidying, but he would circulate the
updated paper tomorrow. DB would use it to prepare slides for the PPRP. RM could review the
summary prior to the F2F at RHUL.
ACTION
382.1 RM to circulate updated paper on EGI/NGI.
5. Week's Notes
================
a) Tier-2 Service re-start priority: AS advised that the context of this proposal had been circulated
at the experiments' liaison meeting, but nothing had been agreed. He was actioned to push this to
the PMB to discuss priority re-start order. AS favoured sorting out the UK ATLAS component first.
AS would ask for comments at the experiment meetings.
b) It was noted there would be no PMB meeting next week. The following week was Easter. It
was agreed to have a brief meeting at NOON on Tuesday 6th April (DB would be working at home
that day). The following meeting would be the 13th, F2F at RHUL.
c) STFC IPS Panel Nomination: Were there any nominations? Did GridPP need someone on this
panel? If so, was DC the right person? PC noted it ensured industrial relevance if we had someone
attending. DB would speak to DC about this, and agreed it would be useful to have someone
involved.
d) ATLAS/CMS speakers at RHUL? RJ would speak for ATLAS, DC for CMS.
e) Publications Update: It was noted that a message had been circulated by Neasan O'Neill - but
he had only received seven replies so far. This issue may come up at the PPRP and it was
important to show that we were research active.
ACTION
382.2 ALL: PMB members to circulate the message round their own groups - a complete list of
publications was required on the GridPP website.
6. AOCB
========
Travel: AS tabled a proposal from Ian Collier to host a Quattor workshop at RAL in the Autumn
(Sep/Oct) - it was our turn to do this, and there might be in the order of 30 participants. This might cost
£500-1000 for refreshments and lunch. DB considered this sounded reasonable - if the cost were
likely to go to £1500 it should be brought back to the PMB for final approval, but in principle it
was OK to go ahead. The PMB approved the cost of around £500 with an upper limit of £1000.
RM advised that the CHEP abstracts deadline was the end of April.
STANDING ITEMS
==============
SI-1 Tier-1 Manager's Report
-----------------------------
AS reported as follows:
Fabric:
1) FY09 procurements:
- Acceptance tests on one supplier lot of disk servers have been running since last Thursday with no
problems. Performance also appears to be good. We expect these to complete acceptance by 15th
April.
- Acceptance tests on one lot of CPU servers have been running since 16th March and have 3
weeks to go.
- Second lot of disk and CPU servers should be ready to start acceptance tests this week.
2) FY10 procurements
- We have started the process of updating the procurement documentation for FY10
procurements.
3) All other minor purchases have been completed and delivered.
Service:
A quiet week operationally.
1) The weekly operations summary is at:
http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2010-03-17
2) SAM test availability for the ops VO was 100%.
3) We are working on an upgrade strategy for CASTOR from 2.1.7 to 2.1.8 or 2.1.9.
4) glexec will be deployed today.
5) FTS was successfully upgraded last week. There was an 8Gb network spike to SJ5 on restart as it
caught up.
See: http://www.gridpp.rl.ac.uk/blog/2010/03/17/fts-upgrade/
SI-2 ATLAS weekly review & plans
---------------------------------
RJ reported all quiet at the moment - awaiting data.
SI-3 CMS weekly review & plans
-------------------------------
DC was absent.
SI-4 LHCb weekly review & plans
--------------------------------
GP reported all quiet at present - awaiting data.
SI-5 Production Manager's Report
---------------------------------
JC reported as follows:
Overall site availability was good last week and site stability is fine.
1) A few more issues have been identified with the Nagios-based regional monitoring. Core
among these is that the email alarm function used by SAM is no longer present – sites found this
subscription service useful. The best solution for sites is currently to install their own Nagios
monitoring. There are also still differences between what SAM and Nagios report – the
availability reports still use SAM. Differences are being tracked:
http://pps-sam.cern.ch/gridview/regions/UKI.html
2) Some shortcomings with the SL5 meta-rpm (now also known as the OS_HEP rpm) were
identified for ATLAS. Sites/experiments are requested to ticket the rpm maintainers to ensure all
relevant/needed libraries are included.
3) LHCb upload problems for Glasgow, Sheffield and Brunel are still being investigated. It looked
like a NAT issue but has been difficult to recreate and test (LHCb jobs stopped flowing).
4) ARGUS now appears in the gLite 3.2 production release! Given that ARGUS will replace SCAS
the request to deploy SCAS/glexec now becomes ARGUS/glexec (though SCAS is still a valid
choice).
5) A new RAL CE will point non-LHC VOs towards SL5 resources (currently SL4 is used). The VOs
have been asked to raise any migration issues as soon as possible.
SI-6 LCG Management Board Report
---------------------------------
Nothing to report.
SI-7 Dissemination
-------------------
SP was absent.
REVIEW OF ACTIONS
=================
354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for
escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There
were a few suggestions in the deployment team about how to progress in this area. It needs
writing up and an implementation plan. JC to progress. Ongoing.
366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015.
DB advised this related to long-term plans and power capacity. Physical footprint space?
Alternatives? AS had sent tech questions round the team and would forward inputs when
available. AS noted that alternative further costings were required. AS to progress. Ongoing.
379.3 Re GridPP4 proposal and forthcoming PPRP meeting: SP to add more detailed information
to the WBS. Ongoing.
380.1 SL to circulate an Agenda for the Deployment Board meeting at RHUL. Ongoing.
380.5 RM/SP to make changes to the EGI/NGI paper as discussed and bring back a revised
version to next week's PMB. Ongoing.
380.7 Re the OPN backup link: AS to find out: 1. When the link is supposed to be operational; 2.
More detail about how and when the link will be tested. If possible AS should delay Invoice
payment until more information was forthcoming.
AS reported that the date for deployment was 1st May. JANET would provision to GEANT on 22nd
March, and work at CERN would proceed to the same timeframe. They had agreed to pay the bill.
DB noted that the PMB should be kept informed. AS had circulated an email summary. DB asked
if this would be tested as part of the standard OPN test? He noted that due diligence was required
and AS should do a test failover after it is deployed. Done, action closed.
380.9 RJ/DC to send info to DB regarding resource estimates for the upcoming period, as this info
will be needed after the PPRP. Ongoing.
ACTIONS AS OF 22.03.10
======================
354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for
escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There
were a few suggestions in the deployment team about how to progress in this area. It needs
writing up and an implementation plan. JC to progress.
366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015.
DB advised this related to long-term plans and power capacity. Physical footprint space?
Alternatives? AS had sent tech questions round the team and would forward inputs when
available. AS noted that alternative further costings were required. AS to progress.
379.3 Re GridPP4 proposal and forthcoming PPRP meeting: SP to add more detailed information
to the WBS.
380.1 SL to circulate an Agenda for the Deployment Board meeting at RHUL.
380.5 RM/SP to make changes to the EGI/NGI paper as discussed and bring back a revised
version to next week's PMB.
380.9 RJ/DC to send info to DB regarding resource estimates for the upcoming period, as this info
will be needed after the PPRP.
382.1 RM to circulate updated paper on EGI/NGI.
382.2 ALL: PMB members to circulate Neasan O'Neill's message re publications round their own
groups - a complete list of publications was required on the GridPP website.
INACTIVE CATEGORY
=================
359.4 JC to follow up dTeam actions from the DB, as follows:
---------------------------
05.02 dTeam to try and sort out CPU shares and priority resources, at
Glasgow first (perhaps by raising the job priority in Panda).
---------------------------
JC would check the situation with Graeme Stewart (who was currently on annual leave).
JC followed up with Graeme and the other experiments. A test was
started but this area has been deemed low priority and further
progress is not expected for some time. ATLAS see no issues with
contention. LHCb are not intending to pursue anything in this area. A
CMS discussion has started but again it does not appear to be urgent.
If the experiments are not pushing this internally then there is
nothing for the deployment team to follow up!
It was noted there was no priority in ATLAS at present; this will be pending for a while. Move to
inactive as it is a long-term action.
---------------------
There would be NO PMB next Monday. The following week was Easter. It was agreed to have a
brief meeting at NOON on Tuesday 6th April. The following meeting would be the 13th, F2F at
RHUL.