Dear All,
Please find attached the GridPP Project Management
Board Meeting minutes from the last face to face meeting.
The F2F minutes are highlighted in bold at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
Cheers, Tony
________________________________________________________________________
Tony Doyle, GridPP Project Leader Telephone: +44-141-330 5899
Rm 478, Kelvin Building Telefax: +44-141-330 5881
Dept of Physics and Astronomy EMail: [log in to unmask]
University of Glasgow Web: http://ppewww.ph.gla.ac.uk/~doyle
G12 8QQ, UK Video - IP: 194.36.1.32
________________________________________________________________________
GridPP PMB Minutes 184 - 5th September 2005
===========================================
Face to Face Meeting at Birmingham
----------------------------------
Present: Dave Newbold, Jeremy Coles, John Gordon, Roger Jones, Robin
Middleton, Tony Doyle, Dave Kelsey, Stephen Burke, Dave Britton, Deborah
Miller, Steve Lloyd.
By phone for part: Tony Cass, Sarah Pearce.
Apologies: Neil Geddes, Pete Clarke.
DB explained that the focus of the meeting was on items/actions from the
last OC. He welcomed Dave Newbold (UB Chair) and Stephen Burke
(Documentation Officer) to the PMB.
Tier-1
======
TD reported that there was supposed to be 216k for Tier-1 hardware this
(financial) year. The UB met at Durham to discuss the likely shortfall in
Q4. This was also discussed at the Tier-1 Board. There was then a request
by Ken to Guy (July) for an additional 300k to be moved forward from next
year to meet half the UB request. DM commented that the OC needed to be
reassured on low CPU usage. There is a report from Andrew Sansum on Tier-1
utilisation. Accessing remote databases means it is difficult to utilise
100% CPU. Overall efficiency dipped to 50% in Apr/May. In Jul/Aug it was
back above 90%. The throughput was maximum in Jul/Aug. Of 187TB disk 150TB
has been used and 10% overhead is needed. It takes 5 months to procure, which
is a problem if we don't set up procurement this FY; this will most affect
BaBar.
There has been no feedback from the OC yet. We need to document procurement
procedures for the next OC. Disks look like they will be 20% cheaper.
Need to look at total cost of ownership. DN pointed out that cheapest is not
necessarily better. Any delay will affect BaBar much more than LHC. Service
Challenges are contained in the requests from the LHC Experiments and are
not likely to be badly affected.
It was agreed that we have fulfilled our actions and are awaiting feedback
from OC and PPARC and will have to deal with the consequences.
JG commented that tape bandwidth was now realised to be a problem. We
need to put a pointer in the MoU that it's unquantified. Money may need to go
into tape bandwidth and other things will have to go down. This is too much
of a detail for the exploitation review and will be addressed by the Tier-1
Board.
Documentation
=============
TD said the role of a Documentation Officer was circulated in July.
There were no comments. Steve Burke has accepted the position, initially for
9 months. SB said we had to be realistic. The highest priority is to provide
necessary documentation for users. There was a discussion of whether SB
actually writes the documentation or coordinates others. TD said the latter.
DN pointed out that there is already much documentation. TD said it needed to
be reviewed to identify gaps. The problem up till now has been that no-one
has full time responsibility for documentation. SB asked who decided
priorities. The answer was to make a proposal and bring it to the PMB who
could refer to the UB to take to the users.
The OC had asked for additional resources to be put into documentation so
this action was deemed done. Some junior appointment will be made at
CCLRC. We have to work out with DK what the plan is. SB asked about web site
design. We have a website (and Wiki). There was a discussion on style and
content. SB has the freedom to do what he likes within the documentation
links. It has to be got into a state where it can be taken over by someone
with no experience. It was agreed it should be self contained within GridPP
and others can pick it up if they want. TD commented that the DESY site is
good. SB can judge.
RM asked about the impact on LCG deployment deliverables. This led to a
discussion as to whether a logbook was necessary for documentation. It was
agreed not - 'the proof of the pudding ...'. TD explained why SB had been
appointed for 9 months. 9 months seemed appropriate to judge what is
needed.
Tier-2
======
SL presented a comparison of the Tier-2 hardware resources declared in
the Q2 QRs with the MoU commitments for September 2004. SouthGrid have met
their commitments. Overall the CPU situation is not too bad but the disk
resources are very low. It was agreed that the results from the next
quarter would be reported to PPARC as the out-turn from the first year of
GridPP2. It was agreed that this should be pointed out (via the CB) in
particular to Liverpool who were understood to be close to delivering
resources to the Grid.
Deployment Model
================
JC presented slides outlining the LCG Baseline Services. TD said
that the OC asked about what is provided for smaller sites to allow them to
contribute to LCG. SL asked who is supporting DPM? JG said EGEE SA1. GridPP
needs to do something if it's not part of the release. DB pointed out that
it is pretty hard for small Tier-2 sites to keep up. It is difficult
especially when things do not keep to schedule. TC said YAIM is supposed
to do this and people seem generally happy with it. It was agreed that
there is a schedule problem; however, the time to deploy is apparently
independent of anything at the moment. It was asked whether YAIM is totally
the wrong direction. There is a JRA1 tool as well, which YAIM should look at
and incorporate.
TD asked TC to write a document - How CERN helps the small sites. TC asked
how we move forward. He said that the quarterly release cycle had no impact.
TD said it had impact in the UK but not elsewhere. SB said we couldn't
actually do it now. Sites won't agree to 3 weeks if there are no scheduled
releases. TC said sites should collaborate so it doesn't depend on single
people being there. TC agreed to draft a document. GridPP should react
against this to see if it's practical or not. JG said that the middleware
groups need to be aware of configuration issues. SB stressed the need for
a pre-production system. JG pointed out that there is now a test-zone in the UK.
DB said it was clear we need a document for the OC (2nd half of Feb). JC said
perhaps it would be better to have smaller updates more frequently.
NEW ACTION 184.1: TC to write document "How CERN helps the small sites"
Dissemination
=============
SP reported. All Hands Meeting:
Bekki will send out an email reminding people that she can print their
posters if they get them to her within the deadline. Focus sessions have
been arranged for the stand, and flyers and design for the stand have
been agreed. The PMB discussed whether there should be a rota for people
to help on the stand. It was agreed that this would probably stop the
stand becoming too crowded with GridPP people, and Bekki would arrange a
rota. GridPP rugby shirts have been ordered. There would be two press
releases from GridPP: on SC3 and GridSiteWiki.
There was discussion of the success rate of AHM abstracts. Dave Newbold
would set out some of the factors which contribute to a successful
abstract, including not being too jargon filled and pointing out the
generic implications for e-science.
NEW ACTION 184.2: DN to report on GridPP All Hands abstracts
The JPhysG paper was resubmitted on 8 August. No response had been
received yet from the referee.
Dissemination Awards.
RAL - The PMB weren't convinced that the proposed light badges were the
best freebie for the target age group. SP would ask Fergus to examine
the costs of cubes - any quote should be within the 2000 for the
dissemination awards.
Birmingham - this was agreed. GridPP would like copies of the students'
powerpoint presentations.
Imperial College - it was agreed that the PMB would support the proposed
Gridcafe, and any costs would be met from the dissemination budget.
SP reported that she is working on an EGEE-2 project with NG and Francois
Grey (CERN) on dissemination and communication. QMUL are bidding for one FTE
to work on an electronic newsletter.
SC05
There would be two talks proposed for the SC05 stand - Operations and
deployment, and Applications and portals.
Gap Analysis
============
There was an initial discussion about what exactly we were looking for gaps
within. LCG is not supposed to be everything to everyone and different users were
expected to dip into it at different levels. Nevertheless, it would be
useful to examine requirements/expectations and see if something obvious
was missing, though it was noted that where things ARE missing it may be
too late to provide generic solutions as the LHC experiments will have
worked around the problem by the time anything is done. For example, is
automatic data replication a gap or not? RJ has an analysis from Ian Bird.
Other things are data pinning off backup for reprocessing (pre-stage). We
need VOMS to control access to everything. It needs more than 2 attributes.
A high level list exists. Drill down is detail.
What are the long term gaps? We need to be careful taking on responsibility
for filling all the gaps. It was agreed we need a 6 page document with
2 of them at high level.
NEW ACTION 184.3: RM and RJ to document Gap Analysis
UB Questionnaire
================
DN reported that he hadn't seen the detail yet. Dan Tovey's summary is fair.
It is hard to see what to do. The situations for LHC and running experiments
are quite different. There are no show stoppers with LHC experiments. The
issues are documentation and support. The perceived unreliability of the
system is a real unreliability. GridPP could help smooth the interactions
between the experiments and LCG. There needs to be intensive support during
data challenges. It would be nice if GridPP support people focus on them. TD
pointed out that 2 people are now at the Tier-1. The generic services are
handled well and always have been. For the experiment related things we need
an upgrade strategy. We need a system that works while it is being upgraded.
There is a lack of support when things go wrong and knowledge of who to
contact. There is now a single (GridPP) page for the support system. GGUS is
supposed to be this. Up to now generally one person has been doing
production for each experiment but we are on the brink of having many users
trying to use the Grid for analysis. There is also the role of the experiments'
e-Science support people to consider. We need a clearly defined support
structure. This should be one stop but needs publicity and needs to work.
For the non LHC experiments there is a major image problem. We need to
demonstrate added value. Older experiments may never come on board
(this was disputed). The incentive becomes less and less as the end of data
taking approaches. It is hard to believe that BaBar would move their data.
There shouldn't be ring fenced resources but a single pool. Four years
of UK and other effort has been working on Gridification. There is a
better impression from newer collaborations (MICE, phenogrid etc). We
should focus our evangelism and concentrate effort there and new
initiatives. CDF voted with their feet and (Oxford) effort moved to ATLAS.
It was concluded that we need to direct people through the support centre. JG
wants the helpdesk to define who deals with which query. Is more formal user
training needed? It was pointed out that NeSC is doing this 'all over
Europe'. Maybe it's not well enough publicised? Perhaps we should give
tailored courses. They probably need to be experiment specific. There
are C++ training courses also going on but not publicised.
DB said we need to write down outcomes in point form and what we are
doing to address each point. Perhaps we need to do another questionnaire
in January.
NEW ACTION 184.4: DN to document UB questionnaire issues
High Level Value
================
DB presented a high level view of the value added by GridPP. It is
mainly for politicians. The problem is that some activities had much more
manpower than others. How far to drill down? We need backup information to
justify high level points and avoid overused phrases. This was on the
agenda for the next day.
Deployment
==========
There was a discussion of the 14 problem metrics flagged to the OC (JC had a
different set in mind).
* Responsiveness of developers. Some security bugs fixed others not. RAL RB
not stable is fixed. Others being logged better. Specific bugs are being
fixed but not all. There are 248 open bugs with the earliest Feb 2004. LCG
should pass them on to the developers. There is no framework to solve these.
* Job slots. 3070 available (Target 3000) of which 2800 are at 'OK'
sites. OK
Job slots used fluctuates. In August it was 80% now down to 30% again.
* KSI2K available not yet met.
* Disk available not yet met.
* Tape through grid improving. CMS using it. It needs to be in the Tier-1
report.
* KSI2K used should have gone up.
* Disk used no change.
* Tape used no change.
* Disaster recovery plan no change.
* Downtime definitely better.
The rest were not Deployment.
It was reported that 15 sites have deployed 2.6, the others are on scheduled
maintenance.
Quarterly Reports
=================
DB reported that for Q2 not all the Middleware (WMS), and Applications (CMS,
CDF) are in yet. RM reported that he has the WMS one. It is a problem for DB
if he doesn't get the reports. People asked what is the sanction? DB asked
what he can do to help.
For CDF, RJ is still expecting deliverables. There have been 4 quarters
without any reporting of deliverables. He last asked Stefan at the
beginning of August.
NEW ACTION 184.5: RJ to contact Todd about CDF deliverables
DB reported that there are many other minor problems on the QRs and pointed
out that the next set are due soon!
AOB
===
DK raised the issue of the Vulnerability Group Policy. LCG are discussing it
at the GDB this week. DK explained why it needs management approval. Some
Sysadmins are unhappy. Normal practice is to go public after 45 days whether
fixed or not. We are a more closed community. We won't go public but anyone
from GridPP/LCG/EGEE can be part of the group. Sysadmins don't want to join
in case they find out things about the software which compromise them if
they deploy it. After some discussion, the Policy was approved by the PMB.
JC reported that RAL hasn't managed to rejoin SC3. There is a problem with
the network. There is no UKLight to RAL. There is a worry about whether it will
work in future for SJ5. It has been down for more than a month now. DB
pointed out GridPP funds a person to work on UKLight.
TD presented a preview of his Tier-1/2 planning slides. DB wants a look to
see if the numbers look ludicrous. It was asked why not underpledge? It was
pointed out that others would claim they're bailing out the UK and also that
CMS already find it hard to be a Tier-1 in the UK at the current level.
The delivery of Tier-2 CPU is ~OK but the disk depends on Manchester.
JG reported that the Tier-1 is deploying a VO box. This will inform Tier-2s
involved in SCs.
ACTIONS AS AT 5th SEPTEMBER 2005
================================
183.1: RJ to circulate the URL on Tier-1A Utilisation to the UB (in the
absence of Dave Newbold)
183.2: TD to discuss the lack of CMS response to 05Q2 with Peter Hobson
183.3: RM to conclude discussions on Storage Logbook with SF and then
submit to DB
184.1: TC to write document "How CERN helps the small sites"
184.2: DN to report on GridPP All Hands abstracts
184.3: RM and RJ to document Gap Analysis
184.4: DN to document UB questionnaire issues
184.5: RJ to contact Todd about CDF deliverables