Dear All,
Please find attached the weekly GridPP Project Management
Board Meeting minutes. The latest minutes can be found each week at:
http://www.gridpp.ac.uk/php/pmb/minutes.php?latest
as well as being listed with other minutes at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE GridPP Project Leader
Rm 478, Kelvin Building Telephone: +44-141-330 5899
Dept of Physics and Astronomy Telefax: +44-141-330 5881
University of Glasgow EMail: [log in to unmask]
G12 8QQ, UK Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________
GridPP PMB Minutes 259 - 4th June 2007
======================================
Present: Tony Doyle, Sarah Pearce, Roger Jones, Stephen Burke, David Britton,
David Kelsey, Dave Newbold, Steve Lloyd, Tony Cass, Robin Middleton,
John Gordon, Jeremy Coles, Glenn Patrick, Suzanne Scott (GridPP Admin)
Apologies: Peter Clarke, Andrew Sansum
1. Tier-2 Reviews
==================
SL reported that so far there were few London responses on the website but
information was coming. He had received the Imperial and UCL final
reports, and the overview document was completed. NorthGrid was finalised
and feedback had been received; overall NorthGrid feedback was to be
finalised on Wednesday. The ScotGrid reports were complete but no feedback
had been received; the overview document was in place. For SouthGrid, SL
had received draft questionnaires but no feedback. Site feedback was
generally missing, and a few questionnaires were still outstanding. RM
noted that finalising everything this week was the number one priority.
DK would be able to provide initial feedback by Friday. TD will remind
Neil Geddes that information is required by Friday. It was noted that
there was a Tier-2 Board meeting tomorrow.
The Site Readiness Reviews and Responses were being collected together at
https://www.gridpp.ac.uk/tier2/Readiness_Reviews/index.html - at the
moment only the Tier-2 Board and the PMB had access. ScotGrid and
NorthGrid sites were happy that these could be made public.
2. F2F Preparations
====================
DB reported that he still required middleware feedback and was reviewing
the GridPP2+ plan overall; there were funding issues to discuss. RM
reported that he would provide DB with the required information as soon as
possible once the site reviews had been finalised. TD noted that the
Documentation status report was ongoing; the GridSite Security report
would be discussed on Friday. There was no other business, but DB noted
that he intended to shorten the Tier-2 section and lengthen the GridPP3
section. Any other items for the Agenda were welcome.
3. AOCB
========
RM reported that the EGEE III bid had gone in from the UK on Thursday 31st
May. This would be reviewed in two weeks' time.
STANDING ITEMS
==============
SI-1 Dissemination Officer's Report
------------------------------------
SP reported that LHC@home had migrated to QMUL; this went smoothly and
seems to be working well, and it will be tested this week. A meeting took
place with CERN and it is hoped to get more consistent work for [log in to unmask]
There was a news item ready on the HEPSYSMAN meeting at RAL, and an item
will be published in International Science Grid next week regarding SL's
ATLAS test jobs.
SI-2 Tier-1 Manager's Report
-----------------------------
On behalf of AS, JC reported the following items:
A) The RAL RB is down today while a refresh of its tables is undertaken.
Problems in this area were indicated by Steve's tests. The delay in
carrying out the clean-up and fix arose from an attempt to follow
standard EGEE procedures for announcing downtime. Clearly quicker
intervention may have been appropriate and this will be taken up with
AS when he returns. There was a discussion of UI configuration and
load balancing - the user experience is that the RB is simply not
working, yet it is still accepting jobs. TD noted that UI
configuration should be better documented and that user/site
communication was not good in this area. In the interim, TD noted that
some announcement should be possible, for example to UKHEPGRID, so that
the UK RB problems are more widely known. It was understood that a
discussion between AS and Catalin Condurache was required. Deployment
of the RBs needs to be addressed, as they are going down too often. It
was noted that there is backup at Imperial and a local RB at Glasgow -
all of the regional areas could potentially have their own RB setup,
however it was noted that there should be a move soon to gLite WMS.
The ATLAS Tier-1 had discussed the overload on RBs. It was asked who
currently has the expertise to reconfigure RBs; this will have to be
ascertained.
B) CASTOR continued to suffer last week and was down from 31st May 16:20
to 1st June 12:00 "due to meltdown problems". A decision was made last
week to undertake a fresh install of CASTOR as the upgrade to the
existing installation produced problems, and rolling back to the
previous version will not be supported. It was reported that a fresh
install will be done on Tuesday and then needs to be tested. This will
hopefully be completed by Thursday, but this is problematic overall as
the Tier-1 is down. Continuous running is a problem. It was noted
that Bonny Strong was in charge of re-installation. The PMB need to be
kept informed and JG agreed to ask Bonny to provide a report for the
PMB on Friday, if possible providing this by Thursday evening. The
issue of CASTOR would be added to the F2F Agenda.
C) FTS was down on 30th May for essential updates to be installed.
D) The RAL-LCG2 CE has seen problems with high loads leading to several
critical tests (CE-host-cert-valid; CE-sft-job; CE-sft-lcg-rm) failing
affecting availability figures. The cause is being investigated. It
was noted that copying small files to tape was still a problem.
There was still the possibility of switching back to dCache. It was
understood that it is less work to separate instances and deal with one
at a time, but this leads to waiting issues. JG will provide an update
on Friday.
SI-3 Production Manager's Report
---------------------------------
1) On the regional VO issue it was decided to follow the <name>.<Tier-2
name>.ac.uk route - scotgrid.ac.uk, southgrid.ac.uk and northgrid.ac.uk
are already registered and London is starting the process. Yves Coppens
has successfully used the latest YAIM to install a DNS-style VO on the
Birmingham PPS, and once his method is in the wiki, production sites are
likely to move to enable VOs such as ngs.ac.uk.
2) In reviewing the middleware updates at the UKI monthly deployment
meeting last week
(http://indico.cern.ch/conferenceDisplay.py?confId=16755) it became
evident that the lack of stable (automated) updates is leading to sites
falling behind in terms of middleware components deployed. Many, if not
most, sites are still at production release 20 (the current one is 25). It
was established that all releases up to and including release 23 are
deployable. Releases 24+ are problematic. It was understood that
updates ceased when automatic update procedures were turned off.
3) On reviewing the case for experiment testing of new releases using the
testzone, the deployment team believe that it would be better to
strengthen site and experiment engagement in the PPS instead. The use of
the PPS and its processes are currently under review, and we should feed
our concerns into this arena. In the meantime, UK sites have been
advised not to split the SGM accounts until issues with this have been
resolved. There was a discussion of the PPS system and release
procedures. It was noted that PPS wikis are available covering the
installation of middleware, but nothing was evident on testing advice
for users or on user involvement. JC will give
recommendations on PPS testing and a summary of what is currently
available on the system. JC will also forward the chat window location
via email.
4) Temporary permission problems with a CERN LFC last Friday caused most
sites to fail SE and SRM tests.
5) There is a GDB at CERN this week
http://indico.cern.ch/conferenceDisplay.py?confId=8484. The main topics
are middleware updates, monitoring and security policies and issues.
Andrew Elwell will also be presenting on recent GridPP T1-T2 transfer
testing. There is also a pre-GDB meeting on SRM status and plans:
http://indico.cern.ch/conferenceDisplay.py?confId=9807.
SI-4 LCG Management Board Report
---------------------------------
It was noted that JG had now left the meeting, and TD had not attended
last week's MB.
SI-5 Documentation Officer's Report
------------------------------------
SB reported that there had been a UIG phone meeting this week to discuss
the way forward. This had included Ian Bird at the start until he was
called away, and a new person from SA1 (John Shade). Cal Loomis (NA4) has
now taken over as co-ordinator. It was felt that, to make progress,
contributors needed to be able to edit and update the documentation,
which the MS Access database at NESC that currently holds it does not
allow. Cal has therefore set up a version-control (Subversion [1])
repository at LAL to which all contributors have access, and imported
the sources into it. The web site itself will still be hosted at NESC. Cal
will now find people to write specific documentation and start the
review/editing process. Apparently there is a CERN summer student who will
probably be offered the task of reviewing and validating the
documentation. There may be new personnel at NESC but this is not yet
finalised.
As already circulated
(http://litmaath.home.cern.ch/litmaath/UI-WMS-CE-WN/), Maarten Litmaath
has developed a diagram to show the interactions of various components in
the gLite WMS. This has been passed around to various people for comments.
The intention is that it will appear on a public web page with explanatory
text, at which point it will probably be the best documentation available
for how the WMS works.
A new tool to manage FAQs is being considered by the GGUS group as a
replacement for the current storage in the GOC wiki; at a first look it
seems promising. http://faq.twgrid.org/faq/
[1] GridSite can provide grid-enabled access to Subversion, although that
isn't currently being used here.
REVIEW OF ACTIONS
=================
247.2 RJ to get further information from ATLAS regarding use of Grid for
testing of PANDA, and report back. Ongoing.
248.2 DB to take a preliminary look at 2008 onwards, planning to be based
on GridPP3 outcome - to be completed by end May. This will be completed
tomorrow. Done. Item closed.
250.4 RJ, DN, GP, TD and TC to meet to integrate experiment requirements
and work on Tier-2 networks. Ongoing when convenient to arrange. Not
high priority.
251.1 TD to raise the issue of memory vs CPU cost at the MB [in order to
work out what the requirement was between 1GB and 2GB memory per core].
Ongoing, not high priority.
252.3 RM has now received inputs for his one-page summary regarding the
transition of each of the existing Middleware areas from GridPP2 to
GridPP2+ to GridPP3 - this to go to DB. RM reported that this would be
done by Friday.
253.1 AS has commenced work on the report on data integrity at Tier-1, in
relation to implementation of checksums. Ongoing.
254.2 ALL PMB members have now signed up to EVO. Tests were ongoing but
this action is on hold due to H.323 requirements, which must be resolved.
JG/RM will resolve EVO issues. Not a priority at present.
255.3 DK to get approval from groups regarding Grid Site Operations policy
and report back. This will be presented and discussed at the F2F on
Friday.
256.1 NG to review the draft of the new Grid Security Policy from NGS
perspective, and SL from Tier-2, and report back. This will be discussed
at the F2F meeting.
257.2 SB to provide a documentation report overview on current status for
the F2F meeting.
258.1 DB to send email to UKHEPGRID informing of internal site review
response deadlines for discussion and feedback. Done, item closed.
258.2 SL to collate all draft documents relating to site reviews onto the
Tier-2 Board Site. It was noted that more information was required, but
this was done. Item closed.
258.3 JC to discuss setting up of pre-existing test VOs for new users; and
discuss appropriate naming - to be added to DTeam agenda for Tuesday.
This will be an Agenda item for the F2F meeting. Done, item closed.
258.4 JC to take the issue of UK target shares and the setting-up of
UK-specific sub VOs to the DTeam, discussing VOMS attribute issues and
timescales. This is some way off yet and is on hold; further discussions
have to take place. Ongoing.
258.5 SP to ask Pete Gronbech if he will write something for the HEPSYSMAN
meeting. Done, item closed.
258.6 JC to discuss RAL RB issues with Catalin Condurache and bring
conclusions back to the PMB. Preliminary discussions had taken place but
further discussion was needed. Ongoing.
ACTIONS AS AT 04.06.07
======================
247.2 RJ to get further information from ATLAS regarding use of Grid for
testing of PANDA, and report back.
250.4 RJ, DN, GP, TD and TC to meet to integrate experiment requirements
and work on Tier-2 networks - sites are aware of requirements but
discussion still has to take place. Ongoing when convenient to arrange.
It was noted that this issue is not high priority.
251.1 TD to raise the issue of memory vs CPU cost at the MB [in order to
work out what the requirement was between 1GB and 2GB memory per core].
252.3 RM has now received inputs for his one-page summary regarding the
transition of each of the existing Middleware areas from GridPP2 to
GridPP2+ to GridPP3 - this to go to DB. This will be done by Friday 8th
June.
253.1 AS has commenced work on the report on data integrity at Tier-1, in
relation to implementation of checksums.
254.2 ALL PMB members have now signed up to EVO. Tests were ongoing but
this action is on hold due to H.323 requirements, which must be resolved.
JG/RM will resolve EVO issues.
255.3 DK to get approval from groups regarding Grid Site Operations policy
and report back. Obligations are on the site to carry issues forward.
This will be discussed at the F2F meeting.
256.1 NG to review the draft of the new Grid Security Policy from NGS
perspective, and SL from Tier-2, and report back. This will be discussed
at the F2F meeting.
257.2 SB to provide a documentation report overview on current status for
the F2F meeting.
258.4 JC to take the issue of UK target shares and the setting-up of
UK-specific sub VOs to the DTeam, discussing VOMS attribute issues and
timescales.
258.6 JC to discuss RAL RB issues with Catalin Condurache and bring
conclusions back to the PMB.
259.1 TD to remind Neil Geddes that there is a deadline of Friday 8th June
for Tier-2 review feedback and reports.
259.2 JG to ask Bonny Strong to provide a report on CASTOR issues &
progress for the PMB F2F on Friday 8th, if possible providing this by
Thursday evening.
259.3 The issue of CASTOR to be added to the F2F Agenda.
259.4 JG to provide an update to the PMB F2F on Friday regarding the
RAL-LCG2 CE and its high loads & test failures.
259.5 JC to provide recommendations to the PMB on PPS testing and a
summary of what is currently available on the system. JC will also
forward the chat window location to the PMB via email.
The next meeting would be the F2F on Friday 8th at QMUL. It would be
decided then whether a meeting on Monday 11th would be required. The
meeting closed at 2.05 pm.