Dear All,
Please find attached the weekly GridPP Project Management
Board Meeting minutes. The latest minutes can be found each week in:
http://www.gridpp.ac.uk/php/pmb/minutes.php?latest
as well as being listed with other minutes at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE GridPP Project Leader
Rm 478, Kelvin Building Telephone: +44-141-330 5899
Dept of Physics and Astronomy Telefax: +44-141-330 5881
University of Glasgow
G12 8QQ, UK Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________
GridPP PMB Minutes 256 - 30th April 2007
========================================
Present: Tony Doyle, Sarah Pearce, Roger Jones, Stephen Burke, David Britton,
Dave Newbold, Steve Lloyd, Robin Middleton, John Gordon, Jeremy Coles,
Peter Clarke, Glenn Patrick, Andrew Sansum, Neil Geddes, Suzanne Scott (Minutes)
Apologies: David Kelsey, Tony Cass
Yingqin Zheng was continuing to observe for the Pegasus project.
1. RRB Meeting
===============
RJ reported that this had been a short meeting. There had been a
presentation by Les Robertson regarding LCG storage issues; also a
presentation on resources by Chris Eck. RJ noted his concern that the
information given by Chris Eck was incorrect, and this gave a false
impression of current status. RJ intended to get the correct figures, and
was concerned that incorrect ATLAS information had gone into the public
forum without being checked - and this could affect funding issues. TD
reported that he had sent an email to Janet Seed in relation to the
UK-side information, and this had been received; information on the
Experiments was embedded within that report. It was noted that Janet Seed
of STFC was present at the meeting. It was understood that UK input was
pre-GridPP3, and had been submitted last October. DN asked about Tier-1
funding at CERN. RJ reported that, as presented, it appeared to be fully
funded as per Chris Eck's slides.
2. Security Policy
===================
DK had circulated an email regarding this, which had gone to the Tier-2
Board. It was noted that this needs to be reviewed from the UK
perspective. NG agreed to look at this from NGS perspective, and SL from
Tier-2, and report-back.
3. AOCB
========
CHEP/WLCG Travel
----------------
The GridPP policy statement was agreed as: the people providing input
(discussed at the meeting) would receive 50% funding from the GridPP
travel budget. RM will contact the individuals concerned and confirm
this.
Horizon programme
-----------------
It was noted that there was a Horizon programme focussing on LHC on
Tuesday evening. GridPP was not involved in this. SP would refer to CERN
for a response to any negative feedback re black hole creation.
STANDING ITEMS
==============
SI-1 Dissemination Officer's Report
------------------------------------
SP reported that at Queen Mary this Thursday, Alan Sugar would be coming
to open the cluster. The supplier had publicised this extensively, and
the RealTime Monitor would be in the reception room and elsewhere. SP
reported that preparations had been made for OGF/EGEE and minimal freebies
would be provided, pens initially with possibly some high-tech items as
well. SP noted that the launch of LHC@home had a press release slot for
Tuesday, and test running of jobs would be carried out in advance of this.
A news item would be run regarding the schools work which
Neasan O'Neill is currently doing in relation to master classes and talks.
There would also be news items forthcoming on the e-Science cluster and
OGF/EGEE.
SI-2 Tier-1 Manager's Report
-----------------------------
AS reported that the tape drive servers had been received and were being
configured. This was not a high priority task and was expected to take
another couple of weeks to complete. At present borrowed equipment was
being used. The RAL networking group were in the process of obtaining a
public AS number in order that the Tier-1 can route Tier-1 -> Tier-1
traffic via the OPN. A request was outstanding and a response was expected
shortly.
Service
-------
AS reported that SAM availability was 100% for W/B 23rd April.
It was noted that air conditioning had failed in A5 upper (where the tape
robot/CASTOR/ADS services live) on 25 April at 11:58. This had been caused by
a (major) electrical failure in switching gear when air circulation was
routinely switched (manually) between two fans. Cooling was off for about 2
hours while building maintenance moved temporary chillers into place.
CASTOR was shut down to reduce heat load but the ADS service (tape backend
to dCache) continued. CASTOR service was restarted and 4 hours of
scheduled downtime was recorded in the weekly CIC report.
It was noted that the scheduled (2 day) upgrade to CASTOR had started
today. It was noted that a version number was required, and AS would
provide this [upgrade version 2.1.2-9, provided after the meeting]. AS
reported that the first "extended" CASTOR technical meeting had been held
on 25th April. It appeared to be a success, although the level of detailed
discussion varied between experiments. An issues and action list was being
developed on the GridPP wiki and would be published next week once the
next technical meeting agreed the format. The next CASTOR strategy
meeting would be held this afternoon.
Regarding jobs lost at the CE, AS reported that the rate was now very low
but monitoring was continuing and measures were being put in place to
detect a recurrence.
Regarding inaccessible LHCb tape data, it was discovered that three
(physical) LHCb tapes were causing problems when read because of
unexpected tape marks. This appeared to be caused by the same incident
that had caused the loss of ATLAS data several months ago. Work was
ongoing to generate a tape -> file mapping to provide to LHCb and ways were
being investigated of bypassing the double tape mark to access the data
beyond EOF. Work was ongoing on a means of querying the FLFSYS catalogue
to identify any more tapes in the same category.
Regarding disk data corruption issues (raised by CERN + others) AS
reported that they were in discussions with other sites and working on
getting the CERN tools working. Dipstick testing was expected to be
completed this week and more general tests would commence next week
depending on what the initial dipstick tests showed. It was noted that RAL
is now registered as a member of the PPS service.
GP reported that LHCb had encountered problems with tape where data access
slows down at the weekend - was AS aware of any specific problems or
issues that might be causing this? AS reported that there were no issues
that he knew about that could impact data access speeds. Backups were
done at weekends, but this should have no impact.
SI-3 Production Manager's Report
---------------------------------
JC reported that PPS-RAL had now joined the pre-production service as part
of the UK contribution. Birmingham and Imperial College already provided
effort in this area. Use of the PPS was still evolving and a meeting on
the 3rd May was expected to clarify the situation:
http://indico.cern.ch/conferenceDisplay.py?confId=15191. One item on the
agenda was "... T1s in PPS: Pre-view of production CEs".
As reported previously, Andrew Elwell was now running a continuation of
site transfer tests. Recently he had observed that we had been too
conservative with the glite-transfer-channel settings for files and
streams. With recent tests involving Glasgow, Edinburgh and RAL, rates of
around 600 Mb/s were being maintained.
On Thursday of last week there had been an EGEE SA1 review of UKI
partners. The feedback received was positive (in all but one area). GridPP
was delivering on the main EGEE tasks and additional contributions were
noted. There was interest in the different styles of working found within
the Tier-2s and a desire to leverage more of the work done in
Grid-Ireland. JC agreed to send the appropriate EGEE & Ireland links to
the PMB [done during the meeting].
JC reported that there was a GDB meeting this week at CERN:
http://indico.cern.ch/conferenceDisplay.py?confId=8472.
The agenda had two main topics: EGEE middleware status and the experiment
top 5 issues (as identified in the presentations to the MB over the last
month).
JC noted that with LHCb no longer running any wide challenges (only
stripping at the T1), CPU resources were somewhat idle and generally less
than 45% utilised. In the last month 44% of use had come from CMS and 31%
from ATLAS. LHCb was at 11%. Not much use had been seen from either
camont or totalep so JC would follow up with both. TD noted that there had
been the launch of a new Interdisciplinary Research Centre in Glasgow and
a meeting had been held in Oran Mor with interested groups wishing to set
up VOs (chemists, biologists etc) - it might be possible to get some
additional VOs from this route - at the moment they were being advised to
set up via GridPP VO.
JC reported that no new significant issues had been identified in the Q1
2007 Tier-2 quarterly reports which were now uploaded.
There would be an EGEE operations meeting (similar to that held at Culham
last year) in Stockholm in June
(http://indico.cern.ch/conferenceTimeTable.py?confId=12807)
which has not yet been well advertised. Although this is an EGEE event, JC
asked whether there was any GridPP funding for any sysadmins who may wish
to attend (bearing in mind that most will not attend the WLCG workshop in
September). It was agreed that because this was so close to the January
meeting which all had attended, a strong case would have to be made for
anyone to go to this.
SI-4 LCG Management Board Report
---------------------------------
See https://twiki.cern.ch/twiki/bin/view/LCG/MbMeetingsMinutes
TD reported that high-level milestones for the project now exist - see the
LCG Management Board page for 2007 milestones, of which there are 15. TD
suggested that as a project we should go through these to ensure the
GridPP Tier-1 is meeting the requirements. One of the current issues for
April was 24x7 support definition - of all Tier-1s only four have achieved
this so far. AS reported that up until now, this issue had not been a
priority, but would be over the next three months. JG noted the
limitations of a 'top-down' list. TD noted that out of the three items
due this month, only the 24x7 support definition was not completed - the
issues were not unreasonable to have as a subset list. JG noted that
these would involve monthly reporting. AS would be informing Alberto
Aimar on a monthly basis and is currently working on issues regarding VOs
etc. JG noted that the milestones were being used as a Project Management
tool.
JG reported that the dates for the LCG Quarterly Reporting had changed -
did GridPP wish to align their milestones with these? They were now
reported one month later than formerly. DB noted his reluctance to change
the GridPP reporting deadlines in relation to the financial quarters, as
this would cause problems overall. TD noted that the change of dates
might affect AS's project management, but apart from that it would cause
no difficulties for GridPP generally - GridPP quarter dates would not be
changed.
JG reported on the LHCb top five issues - the GDB will give feedback on
these, and on those of all Experiments, this week, and ten-minute presentations
were expected on status and possibilities. The meeting would take place on
Wednesday afternoon via VRVS. JG will be providing an introduction and
overview - he will report back.
JG reported that in the March accounting, only half of the sites had the
correct accounting in APEL for a variety of reasons.
SI-5 Documentation Officer's Report
------------------------------------
Regarding UIG, PC had suggested a phone conference, following an exchange
of emails. It was understood that the problem of the web page for EGEE
documentation lay at NeSC and various people had raised issues with David
Fergusson. The key issue was the website which had continuing timescale
slippage. It was asked who would be present at the meeting, what was the
meeting scope, and what was the GridPP view? TD reported that the meeting
should understand the problems involved and the issues regarding turning
around the documents at UIG. It was noted that Alistair Mills and John
White should be involved. JC, TD, and JG were happy to attend along with
SB. It was agreed that SB would confirm with PC and David Fergusson and
set-up a meeting with those named.
REVIEW OF ACTIONS
=================
247.2 RJ to get further information from ATLAS regarding use of Grid for
testing of PANDA, and report-back. Ongoing.
248.2 DB to take a preliminary look at 2008 onwards, planning to be based
on GridPP3 outcome - to be completed by end May. This was ongoing and was
briefly discussed in the context of the CB meeting and the Tier-1 Board
meeting, both in the 2nd week of June. The planning should be
firmed-up by the end of May. Ongoing.
250.4 RJ, DN, GP, TD and TC to meet to integrate experiment requirements
and work on Tier-2 networks - sites are aware of requirements but
discussion still has to take place, ongoing when convenient to arrange.
Ongoing.
251.1 TD to raise the issue of memory vs CPU cost at the MB [in order to
work out what the requirement was between 1GB and 2GB memory per core] -
ongoing.
252.2 Re GridPP3 planning, GP, DN and RJ to provide DB with the current
state of planning re the Experiments, quantifying where they need posts
and where the highest priorities lie. This information is required after
the hardware has been sorted out, therefore by end April. It was noted
that no posts exist but resolution to problems can be built into the plan.
GP noted that this had been discussed and a paper will be submitted at the
beginning of May - this was on schedule. RJ needs to discuss issues with
Dario Barberis. Info from RJ was still awaited. GP would circulate his
information [done during the meeting]. ATLAS and CMS to provide input
over the next week or so. Ongoing.
252.3 RM has now received inputs for his one-page summary - this to go to
DB. [Regarding the transition of each of the existing Middleware areas
from GridPP2 to GridPP2+ to GridPP3 - this needs to be mapped out showing
the things that will be done and the things that won't be done, and any
problems identified]. RM had received input from Dave Colling. Input
from Andrew McNab and Robin Tasker was still awaited. Input was expected
from Paul Millar and Steve Fisher today. RM would follow-up on these
outstanding issues. Input was to be provided to DB this week.
252.4 SP is drafting a document regarding Industrial Sponsorship.
[Regarding the cut to 0.5 FTE for Dissemination in GridPP3, SP noted that
she might be able to suggest ways of dealing with this issue, several
things could be tried to see outcome] - ongoing - the document will be
circulated by SP within the next week or so. SP had sent out an early
draft to SL, DB, and TD, for comment regarding iteration prior to
circulation to the PMB.
253.1 AS has commenced work on the report on data integrity at Tier-1
[relating to the discussion on implementation of checksums]. This was
ongoing and was likely to be completed over the next few weeks.
254.2 ALL PMB members to sign-up to EVO by the next meeting, and the EVO
system would be tested the following week. It was agreed to test the
system at 1.00 pm next Monday, with the PMB following at 1.30 pm. There
was a discussion regarding EVO - advice from the EVO developers was that
the cache should be cleared and Java updated to V6. It was noted that
H.323 cameras were a problem and there were remaining firewall issues. It
was agreed to try running ongoing tests this week and move the test to the
next meeting. There was a problem with the Java 6 upgrade and caching which was
sorted out after the meeting.
254.4 PC to raise the issue of the UIG with David Fergusson. Done - item
closed.
255.1 JC to bring up the issue of a mechanism for informing other sites at
the DTeam meeting - the means of noting loss of jobs would be discussed on
Friday. It was noted that the mechanism was the DTeam meeting - there was
no generic problem about information flowing from Tier-1 to Tier-2. JC
and AS were happy with current procedures. Done, item closed.
255.2 SP would contact the organisers of WLCG Collaboration workshop on
behalf of the 14 people submitted. It was noted that the deadline had
been extended. Done, item closed.
255.3 DK to get approval and report-back with feedback next week. [DK was
seeking approval from groups regarding Grid Site Operations policy. It
was agreed to approve what DK suggests, based on his background knowledge
- issues can be raised through the PMB. Obligations are on the site to
carry forward issues]. The meeting agreed that NG and SL will take
forward this issue.
ACTIONS AS AT 30.04.07
======================
247.2 RJ to get further information from ATLAS regarding use of Grid for
testing of PANDA, and report-back.
248.2 DB to take a preliminary look at 2008 onwards, planning to be based
on GridPP3 outcome - to be completed by end May. This was ongoing and was
briefly discussed in the context of the CB meeting and the Tier-1 Board
meeting, both in the 2nd week of June. The planning should be
firmed-up by the end of May.
250.4 RJ, DN, GP, TD and TC to meet to integrate experiment requirements
and work on Tier-2 networks - sites are aware of requirements but
discussion still has to take place, ongoing when convenient to arrange.
251.1 TD to raise the issue of memory vs CPU cost at the MB [in order to
work out what the requirement was between 1GB and 2GB memory per core] -
ongoing.
252.2 Re GridPP3 planning, GP, DN and RJ to provide DB with the current
state of planning re the Experiments, quantifying where they need posts
and where the highest priorities lie. This information is required after
the hardware has been sorted out, therefore by end April. It was noted
that no posts exist but resolution to problems can be built into the plan.
GP noted that this had been discussed and a paper will be submitted at the
beginning of May - this was on schedule. RJ needs to discuss issues with
Dario Barberis - ongoing.
252.3 RM has now received inputs for his one-page summary - this to go to
DB. [Regarding the transition of each of the existing Middleware areas
from GridPP2 to GridPP2+ to GridPP3 - this needs to be mapped out showing
the things that will be done and the things that won't be done, and any
problems identified] - ongoing.
252.4 SP is drafting a document regarding Industrial Sponsorship.
[Regarding the cut to 0.5 FTE for Dissemination in GridPP3, SP noted that
she might be able to suggest ways of dealing with this issue, several
things could be tried to see outcome] - ongoing - the document will be
circulated by SP within the next week or so.
253.1 AS has commenced work on the report on data integrity at Tier-1
[relating to the discussion on implementation of checksums]. This should
be available within a few weeks - ongoing.
254.2 ALL PMB members to sign-up to EVO by the next meeting, and the EVO
system would be tested the following week. It was agreed to test the
system at 1.00 pm next Monday, with the PMB following at 1.30 pm.
Ongoing.
255.3 DK to get approval and report-back with feedback next week. [DK was
seeking approval from groups regarding Grid Site Operations policy. It
was agreed to approve what DK suggests, based on his background knowledge
- issues can be raised through the PMB. Obligations are on the site to
carry forward issues].
256.1 NG to review the draft of the new Grid Security Policy from NGS
perspective, and SL from Tier-2, and report-back.
256.2 RM to contact named individuals who had not submitted papers and
confirm GridPP 50% funding for travel to WLCG/CHEP.
256.3 SP to prepare a response on behalf of GridPP regarding the Horizon
LHC programme.
256.4 AS to provide a version number for the CASTOR upgrade.
[done]
256.5 CPU resources were somewhat idle and not much use had been seen from
either camont or totalep (industrial VOs) so JC would follow up with both.
256.6 SB to confirm meeting details with PC and David Fergusson, and
arrange a meeting (to include JC, JG, and TD) regarding UIG issues.
It was noted that next Monday was a Bank Holiday, therefore the next PMB
would take place on Monday 14 May. Inputs must be received by then: all
GridPP3 inputs were required this week, and at the next PMB a formal review
of all areas and inputs would take place. It was noted that LHCb, ALICE,
and CMS hardware issues were required by DB as soon as possible - this
related to 252.2 above and the extent of the hardware refinements at
Tier-1: DB will remind RJ, DN, and GP about what is required. It was
agreed to test EVO prior to the next meeting, from 1-1.30 pm then move to
VRVS. The meeting closed at 2.40 pm.