Dear All,
Please find attached the weekly GridPP Project Management
Board Meeting minutes. The latest minutes can be found each week in:
http://www.gridpp.ac.uk/php/pmb/minutes.php?latest
as well as being listed with other minutes at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
John
GridPP PMB Minutes 246 - 12th February 2007
===========================================
Present: John Gordon, Sarah Pearce, Roger Jones, Stephen Burke, David
Britton, David Kelsey, Dave Newbold, Steve Lloyd, Tony Cass, Jeremy Coles,
Andrew Sansum
Apologies: Peter Clarke, Tony Doyle, Robin Middleton, Glenn Patrick
1. Site Readiness Review
=========================
Members reported on the preparation for the reviews:
ScotGrid - No dates have been fixed yet. The original plan to append the
review to GridPP18 has been abandoned because key people are unavailable
owing to DTeam/User Board meetings that week.
NorthGrid - still hoping to find dates in April.
SouthGrid - Team has agreed on March 9, 13, and 14 but has not yet got
the agreement of all sites.
London - Agreed dates in late April. Still checking with sites on the
order.
DB said he would have a draft of the review announcement and questionnaire
within a few days.
ACTION DB Produce draft of review documents.
2. Oversight Committee
=======================
The meeting planned for 8th February was cancelled on the day, after
several members withdrew because of the weather and related childcare
problems.
No date has yet been fixed for the replacement meeting.
STANDING ITEMS
==============
SI.1 Dissemination Officer's Report
-----------------------------------
There had been one news item in the last week, on the last CPU upgrade.
Several more were in preparation:
- Hannah Cumming on the Total VO
- RJ on ATLAS
- SB on gLite User Guide
- JET
- WLCG Workshop - JC coordinating input from several people.
- DN on CMS
A report on the EUGrid PMA meeting at RAL recently will be in ISGW next week.
Neasan O'Neill will be visiting the Science Museum this week to discuss
their LHC exhibit.
SI.2 Tier-1 Manager's Report
----------------------------
AS provided the following report:
Hardware:
1) Supplier One delivery - Integration into dCache is complete. The CASTOR
team have all 15 servers planned for deployment but have a problem
integrating them into CASTOR. The difficulty is that "garbage
collection" cannot be made to work (yet) despite the fact that it works
fine on identical hardware already deployed into CSA06.
2) Supplier Two Delivery (I) - Acceptance testing completed. Servers will
need to be deployed into capacity to meet the March UB allocations.
3) Supplier Two Delivery (II) - Acceptance tests have started and should
finish by the 9th March.
4) Tape purchase - 350TB of media has been ordered - delivery in 1 week's
time.
5) Tape drive purchase - 3 drives have been ordered - delivery in about 2
weeks' time.
6) Tape drive servers - Ordered - delivery in about 4 weeks' time
(estimate).
Service:
The top level BDII will be replicated onto two new servers by Wednesday.
The FTS was unavailable for much of the weekend following multiple core
dumps filling up a partition. It is scheduled to be down on Tuesday.
In the absence of Derek Ross (leave) we were unable to meet last week to
discuss the SL4 rollout. However I have informed CMS that we will
definitely not be able to provide SL4 within 1 month.
Job CPU efficiency for January fell to 64%. This appears to be dominated
by LHCb, who achieved only 38% efficiency while using a large share of
total resources. LHCb believe that this is caused by performance and
reliability problems in RAL's dCache - we are investigating. Testing of
the dCache 1.7 upgrade has been completed successfully and it is planned
to deploy this ASAP - it may help resolve this issue (although the issue
is not specifically addressed in the revision history).
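As a rough illustration of how one VO can dominate the overall figure, the
sketch below assumes that overall efficiency is the wall-clock-weighted
average of per-VO efficiencies, and solves for the share of wall-clock time
implied by the reported 64% and 38%. The 90% efficiency for all other VOs
and the function name implied_share are assumptions made purely for
illustration; neither comes from the Tier-1 report.

```python
def implied_share(overall, vo_eff, other_eff):
    """Fraction of wall-clock time used by one VO.

    Solves overall = s * vo_eff + (1 - s) * other_eff for s, i.e. it
    treats overall efficiency as a wall-clock-weighted average.
    """
    if other_eff == vo_eff:
        raise ValueError("efficiencies must differ to determine a share")
    return (other_eff - overall) / (other_eff - vo_eff)


# Figures from the report: 64% overall, 38% for LHCb.
# ASSUMPTION: all other VOs together ran at 90%; illustrative value only.
share = implied_share(overall=0.64, vo_eff=0.38, other_eff=0.90)
print(f"Implied LHCb share of wall-clock time: {share:.0%}")
```

A different assumed efficiency for the other VOs gives a different implied
share, so the result only indicates the scale of the effect.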
JC asked if the delay in deploying disk to CASTOR was affecting CMS. The
answer was yes, but ATLAS is affected more. JC asked when the second RB
would be deployed. This had already happened before Christmas.
SI.3 Production Manager's Report
--------------------------------
JC provided the following report:
1) The WLCG GDB was held last Wednesday
(http://indico.cern.ch/conferenceDisplay.py?confId=8469). John Gordon
was elected as the next GDB chairperson. Markus Schulz gave an update
on gLite on SL4 (we do not expect something in production until at
least April, so sites are going to need to use published workarounds
ahead of the experiment Full Dress Rehearsals (FDRs)).
AS raised worries that the inevitable partial deployment of SL4 would
fragment the CPU cluster and reduce efficiency still further.
2) Utilisation of CPU has been low across the UK (<40%) for about the last
week as LHCb have stopped running jobs while a bug in their production
code is fixed. Meanwhile enablement of camont.gridpp.ac.uk and
total.vo.gridpp.ac.uk on the GridPP RBs and CEs is taking longer than
expected.
3) We continue to see a lot of SE problems as they become more widely
tested and used (firewalls, gridftp doors in dCache, SE information
publishing, full disks and the implementation of ATLAS ACLs are some
recent issues). The continued instability in the top-level BDII is not
helping the situation.
4) A Tier-2 board (VRVS or phone) meeting is scheduled for this Friday
10:00-13:00. Discussion topics include: meeting MoU disk requirements,
cover at sites, experiment-site interaction and OS policies.
5) The Deployment Team are trialling an Operations Blog in the hope that it
will provide a consistent view on problems being resolved each day
(http://gridpp-ops.blogspot.com/). We already have Tier-2 Blogs and
many other information sources so there are concerns about whether this
new blog is worthwhile.
SI.4 LCG Management Board Report
--------------------------------
JG reported that SRM2.2 was not expected to be tested at Tier-1s until
March. This would impact experiments' plans to use it. Most of the rest of
last week's MB was discussion of the Harry Tables which document Tier-1
deployment plans in finer detail than the MoU and match them against
experiment requirements. They duplicate information also gathered through
Quarterly Progress Reports. A document clarifying who reports where and
which information is regarded as definitive would be prepared to focus
this discussion.
SI.5 Documentation Officer's Report
-----------------------------------
SB reported that the most significant recent issue was the name change of
the CERN wiki which had broken documentation links across the world.
(Note: CERN have since added a redirection link from the old wiki
address.)
REVIEW OF ACTIONS
=================
236.6 GP to summarise and circulate the LHCb model as a basis for
discussion. GP would now focus on this as two models were now available.
Ongoing.
245.1 DB to report on the user statistics. DB had prepared a transparency
for the OC. Item closed.
245.2 JC to forward information on WLCG meeting to SP. Ongoing.
245.3 JC to contact the Tier-2 co-ordinators regarding reports for the
Manchester EGEE User Forum. Done. Item closed.
245.4 SP to send out an All Hands and EGEE User Forum roundup to
UKHEPGRID. EGEE UF done. Ongoing.
ACTIONS AS AT 12.02.07
======================
236.6 GP to summarise and circulate the LHCb model as a basis for discussion.
245.2 JC to forward information on WLCG meeting to SP.
245.4 SP to send out an All Hands reminder to UKHEPGRID.
246.1 DB to produce draft of review documents.
The meeting closed at 13.55.