Dear All,
Please find attached the F2F and weekly GridPP Project Management
Board Meeting minutes. The latest minutes can be found each week in:
http://www.gridpp.ac.uk/php/pmb/minutes.php?latest
as well as being listed with other minutes at:
http://www.gridpp.ac.uk/php/pmb/minutes.php
F2F minutes can be found directly at:
http://www.gridpp.ac.uk/pmb/minutes/060110.txt
Cheers, Tony
________________________________________________________________________
Tony Doyle, GridPP Project Leader Telephone: +44-141-330 5899
Rm 478, Kelvin Building Telefax: +44-141-330 5881
Dept of Physics and Astronomy EMail: [log in to unmask]
University of Glasgow Web: http://ppewww.ph.gla.ac.uk/~doyle
G12 8QQ, UK Video - IP: 194.36.1.33
________________________________________________________________________
GridPP PMB Minutes 200 - 23rd January 2006
===========================================
Present: John Gordon, Sarah Pearce, Tony Doyle, Dave Britton, Robin
Middleton, Steve Burke, Tony Cass, Steve Lloyd, Roger Jones, Peter Clarke,
Dave Kelsey, Jeremy Coles
Apologies: Dave Newbold
1. Allocation of PMB numbers to documents [TD]
===============================================
It was agreed to retitle the documentation document for the Oversight
Committee from 'User Web Pages' to 'Documentation Report', numbered 67.
Others were allocated as below:
* Executive Summary [68] PMB
* Project Map [69] DB
* Resource Report [70] DB
* LCG Report [71] TC
* EGEE Report [72] RM
* Deployment Report [73] DK
* Middleware/Security/Network Report [74] RM
* Applications Report [75] RJ
* User Board Report [76] DN
* Tier-1/A Report inc. Tier-1/A procurement methods [77] JG
* Tier-2 Report inc. Year 1 outturn [78] SL
* Dissemination Report [79] SP
* Documentation Report [67] SB
In addition
* CERN and Tier-2 Operations [83] TC
* Performance Monitoring [82] JC
* (Upper) Middleware Planning [81] RM,RJ
* Experiment engagement questionnaire (v2) [80] DN
* Grid for LHC exploitation [for reference]
It was agreed that the ~final documents would need to be circulated in two
weeks' time (Monday 6th February).
Face 2 Face minutes were circulated shortly after the meeting expanding
upon this. The overview would need to be completed by 13th February and
everything would need to be sent before Wednesday 15th February, a week
before the Oversight Committee.
2. Quarterly Reports [DB]
=========================
All the quarterly reports now needed to be completed; the deadline had
passed. The status so far was sought. RM had all those for M/S/N. RJ had
half of those for the applications area. The Tier-1 report was not
available, and it was agreed that a reminder would be sent advising that
the deadline was past and that the reports were urgently required. The
Tier-2 hardware report was also not all in, and again a reminder would
need to be sent.
3. Decisions from Tier-1/A Board [DK]
=====================================
DK reported on the recent meeting.
Agenda item 1 - Tape service plans
----------------------------------
The board agreed that:
* the proposed move to Castor is the right approach, while agreeing
that the timescales are tight and that this does give rise to some
risk;
* we will buy just one T10K drive now for testing (it is more
cost-effective to delay purchase of drives until they are really
needed);
* we will not buy T10K tapes now, apart from a small amount for testing
and service challenge needs (as they are likely to be much cheaper next
year);
* we will not purchase any more 9940 drives or tapes;
* RAS (with reference to the UB) is authorised to purchase up to 100 TB
of T10K tape. T10K drives needed only for service challenges will be
borrowed. Up to 2 additional T10K drives in the second half of FY 06/07
(if these are needed by the experiments) would be approved following
review by the Tier-1 Board at the October meeting;
* the ongoing maintenance and operations costs of the DataStore after
the end of GridPP2 will be covered as part of the bid to PPARC for the
Tier-1/A service;
* the UB will in future consider tape bandwidth requirements as well as
capacity.
Agenda item 2 - 2005 Outturn
----------------------------
The board was happy to see that the job slots had been very nearly full
since August 2005, and agreed that there was little room for improvement
here.
Agenda item 3 - Tier 1/A Requirements, Planning, Allocations and MoU's
----------------------------------------------------------------------
The board discussed the various issues at length.
There is very little flexibility because disks are more expensive this
year than last year's planning figures (foreseen price drops from the
availability of higher-capacity drives have not arrived soon enough) and
CPU is more expensive than planned. There was uncertainty about the
performance of dual-core processors. (Note: new information obtained
after the meeting has resulted in the use of a factor of 2 for AMD
Opteron processor performance, dual vs single core, for planning
purposes.) The FY 05/06 purchase was delayed by PPARC following the July
05 OC meeting, meaning that the increased disk capacity will not be
deployed until Q3 2006.
Planning in April 2005 concluded that there would be a severe lack of
resources definitely in 2008 and probably in 2007. The problems noted
above have brought this crisis forward to 2006. There are just not
enough funds to meet all of the requirements. Difficult decisions will
have to be made by both the running experiments and the LHC
experiments.
Given this very difficult situation, the board agreed that:
* BaBar should manage within its current disk allocation of 95 TB for
the first two quarters of 2006. Others will therefore be squeezed by
~20 TB compared to the current UB allocations. The UB will have to
reconsider the LHC and other experiments' disk allocations for these
two quarters;
* no decisions are made at this point regarding allocations for Q3 and
Q4 of 2006. These will need to be looked at again following agreement
of the purchasing plan for the next purchase (early in FY 06/07).
The board also noted the large amount of disk available and foreseen at
the Tier-2 centres, and encourages all experiments to make use of it.
AOCB
----
The 2006 figures are urgently needed for the signing of the LCG MoU.
PPARC have delayed signing until after this board meeting.
DB will update (and circulate) the planning figures for 2006 on the
basis of the latest information within the next few days.
4. Year 1 Outturn Spreadsheet Development [SL]
=============================================
SL had prepared spreadsheets on the GridPP Resources used by LCG in 2005.
The data included Tier-2 and Tier-1 CPU Use by all Experiments adopting LCG.
It was noted that:
The available KSI2K are taken from the Tier-2 Quarterly Reports
(LCG Resources);
The used KSI2K Hours are taken from the GOC (GridPP Accounting);
Use at Cambridge is not recorded because they use Condor not PBS;
Disk Data are taken from the GridPP Disk status webpage;
The allocated KSI2K are taken from the MoU numbers.
SL expressed doubts about the efficiency of the Accounting systems
currently used. They wouldn't give absolute numbers but would indicate
the trend over the year.
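The outturn spreadsheets combine a capacity figure (available KSI2K) with
an accounted usage figure (used KSI2K-hours). As an illustrative sketch
only (the function name and all figures are invented for demonstration,
not taken from SL's actual spreadsheets), the fractional utilisation for
a site over a period can be estimated as:

```python
# Illustrative sketch: estimate fractional CPU utilisation of a site
# from its available capacity (KSI2K) and its accounted usage
# (KSI2K-hours), as combined in the outturn spreadsheets.
# All figures below are made up for demonstration purposes.

def utilisation(available_ksi2k: float, used_ksi2k_hours: float,
                period_hours: float) -> float:
    """Fraction of the theoretical maximum KSI2K-hours actually used."""
    max_ksi2k_hours = available_ksi2k * period_hours
    return used_ksi2k_hours / max_ksi2k_hours

# Example: a site offering 100 KSI2K over a 90-day quarter (2160 hours)
# that delivered 108,000 KSI2K-hours ran at 50% utilisation.
print(utilisation(100.0, 108_000.0, 90 * 24))  # 0.5
```

As noted above, such figures are best read as trends over the year rather
than absolute numbers, given the doubts expressed about the accounting
systems.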
NEW ACTION 200.1: JC to summarise the accounting problems.
It was suggested that he might start with discussions with Dave Kant and
Greig Cowan to enquire whether they knew the sources of the problems.
5. Additional GridPP meeting in November 2006 [PC]
=================================================
After discussions it was agreed that a meeting would be held in late
October, the dates being looked into were 30th and 31st October and 1st
November 2006. It was to be held at NeSC. PC would start looking into
room availability.
Note: dates now fixed at
http://www.gridpp.ac.uk/meetings/
17th GridPP Collaboration Meeting, NeSC, 1-2 November 2006
(with PMB meeting prior to this)
6. EGO Questionnaire [RM]
=========================
NG is collating a UK/I response to the questions posed by EGEE concerning
what follows EGEE phase 2. At present this goes under the heading of a
European Grid Organisation - EGO. There is an EGEE PMB workshop on EGO
next week near CERN and the inputs from the various federations will be
summarised there.
GridPP inputs to this should be sent to NG and RM before noon tomorrow.
7. New Travel Guidelines and Forms [RM]
=======================================
There is a new travel procedure document which is to be adhered to.
See
http://www.particlephysics.ac.uk/research/travel-and-claim-forms.html
for the new forms.
STANDING ITEMS
==============
S1-1 ALL to provide SP with any news items, and any confirmed or
provisional dates of conferences and/or meetings.
DK noted that the next Tier 1/A meeting was scheduled for 11th May 2006.
A PMB Face 2 Face was proposed for 12th May (or, possibly, 10th May).
There should be a press release shortly on Service Challenge 3.
It was hoped that the signing of LCG MoU could also be a news item.
S1-2 Production Manager's Weekly Report of Issues
JC reported that:
1) The T0-T1 throughput tests have been ongoing for the last week. RAL
has successfully participated. Our rate has been fair, typically
averaging around 100 MB/s. The target of 150 MB/s has been difficult to
achieve with bandwidth capacity being limited at CERN. As a result this
week there will be a further test to gain an indication of individual
institute peak sustained rates. Later in the week the use of srm-copy
will be possible and may result in an improvement of throughput rates.
2) The weekly EGEE report, which can now take input from sysadmins, now
has a more reasonable editing period, and for this week's report the
response was good. Sites which responded were: Grid Ireland (for all
sites!), LeSC, Bristol, Glasgow, Edinburgh, Brunel, Durham, Liverpool,
Oxford, RHUL, RAL, UCL-CCC, Lancaster, IC and QMUL. No information was
(or has in the past been) entered for the following sites which had
problems during the period: UCL-HEP, Sheffield and Cambridge. Birmingham
usually responds but not this week and Manchester was down for
maintenance.
3) The main focus of the deployment team last week was on the testing of
a pre-release of 2.7.0. Some minor problems were found and fed back, but
there were no major concerns. The hope is for a 2.7.0 release next week.
The requested upgrade period will be 3 weeks as before.
4) Birmingham has been leading our efforts to be a stable part of the
Pre-production Service thanks to the efforts of Yves Coppens. Imperial
is also now joining. gLite 1.5 was released at the weekend.
5) The Helpdesk upgrade mentioned at GridPP15 has gone ahead today. This
will enable more automated ticket exchanges with GGUS and other ROCs,
and also the closing of ticket copies in other helpdesks.
6) The deployment team will send one person to the forthcoming Ticket
Process Management course at CERN at the beginning of February. As a ROC
it is expected that we take part in this EGEE wide activity.
7) Internal GridPP transfer tests have been ongoing. More sites are now
hitting the target rate of 300 Mb/s but we have had to limit tests with
the Tier-1 due to the SC3 throughput test reruns. Generally sites have
been good at getting involved. The latest results and scheduled tests
can be seen in the Wiki:
http://wiki.gridpp.ac.uk/wiki/Service_Challenge_Transfer_Tests
S1-3 Management Board Report of Issues
JG to report next week
S1-4 Documentation Officer's Weekly Report of Issues
SB noted no issues this week.
It was agreed that if Sarah and Dave could log on to VRVS.org as a test
and it worked, then TD would set it up for next week's meeting.
Next meeting 30th January at 1.00pm
REVIEW OF ACTIONS
=================
184.1: TC to write document "How CERN helps the small sites to install
and manage the LCG software".
- ongoing
184.3: RM and RJ to document Gap Analysis
- ongoing
184.4: DN to document UB questionnaire issues
- ongoing
187.1: JG to prepare combined actions list from GridPP14 meeting
- done
197.1: SL to determine realistic estimate of July 2006 hardware for T2s.
- ongoing
197.2: SL to review TC's document on "How CERN helps the small sites to
install and manage the LCG software"
- ongoing
ACTIONS AS AT 23RD JANUARY 2006
===============================
184.1: TC to write document "How CERN helps the small sites to install
and manage the LCG software".
184.3: RM and RJ to document Gap Analysis
184.4: DN to document UB questionnaire issues
197.1: SL to determine realistic estimate of July 2006 hardware for T2s.
197.2: SL to review TC's document on "How CERN helps the small sites to
install and manage the LCG software"
200.1: JC to summarise the accounting problems.
GridPP PMB Minutes 199 - 10th January 2006
==========================================
Face to Face Meeting at RAL
----------------------------
Present: Tony Cass, Pete Clarke, Dave Newbold, John Gordon, Roger Jones,
Robin Middleton, Tony Doyle, Dave Kelsey, Stephen Burke, Neil Geddes,
Dave Britton, Jeremy Coles, Steve Lloyd. By Gizmo/Phone: Sarah Pearce.
Apologies: Deborah Miller.
Experiments' Hardware Requirements
==================================
[OC Action 1 - GridPP to go back to the experiments to confirm their
requirements before the next tender exercise. (minute 5.7)]
DN reported that the issue was the lack of Tier-1 disk resources and the
usage of Tier-2 disk resources. DB noted that we had been back to the
experiments and the requirement is now 50% less. DN explained that the
new UB numbers are a pragmatic response to the shortfall and not a change
in the requirements. However, we cannot meet the MoU commitments to the
LHC experiments or BaBar due to lack of resources. Should we bring disk
spending forward? DN said his personal opinion was no, because we will
always have this problem, so bang per buck is more important. We have now
gone from under-use to over-use. We should concentrate on 2007. We still
need to go back to the experiments for their strategic requirements.
On the issue of Tier-2 disks, the feeling from the experiments is that
they are not yet reliable enough to use. We need to demonstrate that they
are reliable enough; they do not need tape backup. It needs the Tier-2s
to agree to provide more robust disk storage, and the experiments to
provide more detailed requirements. There was agreement that 'Durable'
means long term but not for ever; however, this can't be implemented at
the moment. Is there a practical way of using Tier-2 disk in the UK? This
question is put to the Deployment Board: how do we use Tier-2 disk? We
will probably have to start with particular sites per experiment (H1 have
successfully done this and are now expanding). It was agreed to start
with ATLAS, CMS and PhenoGrid at Lancaster, Edinburgh and Imperial, gain
experience and confidence, and then expand.
Action 199.1: JC to raise Tier-2 disk usage at Deployment Board.
Oversubscription of Resources
=============================
[OC Action 5 - GridPP to consider introducing a suitable level of over
subscription of GridPP resources (minute 6.1)]
DN explained that CPU is over-allocated in the sense of what the
experiments are told they can have, but this is rather meaningless on a
Grid. Over-allocation can't be done for disk, and the experiments don't
want it. Historically, experiments asked for a lot of disk and didn't use
it. At the Tier-1 there are sufficient disk servers to allocate each
server to a single experiment, and it is a lot easier not to share across
experiments, which means there is only one set of people to negotiate
with if a server is lost. It is not so obvious how to do this at the
Tier-2s.
Experiment Engagement Questionnaire Plans
=========================================
[OC Action 4 - GridPP to provide an update of the Experiment Engagement
Questionnaire for the next meeting (minute 5.19)]
[OC Action 10 - GridPP to review the information gathered by the
Experiment Engagement Questionnaire and consider the actions required to
make the outcome of future questionnaires more positive (minute 6.6)]
DN explained that there have been two such questionnaires, at the start
of 2005 and the middle of 2005, with similar results. They show a clear
lack of engagement. DN has gone round after the UB talking to individuals
to see how things have gone. For the small experiments (not H1/ZEUS)
there is no change; these should probably be portal users. For H1/ZEUS
there is a much better uptake. They are major users of the Tier-1 but are
not using data management tools, which needs to be investigated.
Mainstream BaBar are not engaged with the Grid: resources have been
allocated but there is not much progress. For LHC, things have
incrementally increased and are improving. Documentation is being
addressed. Data management is the issue; workload management is OK. Other
issues, documentation and Tier-1 contacts, are being addressed and
improving, but we are not seeing the full fruits yet.
There was a discussion of whether to continue with the questionnaires,
which are not viewed as being as useful as talking to people. Some of the
issues have been addressed, some not, e.g. portals for the small
experiments, who have no manpower to develop them. UKQCD are using their
own tools, which work well but are not transferable. We can make a list
of actions for the small experiments, but it needs manpower that no-one
has. The real problem is BaBar. What can we do about BaBar not using the
Grid? BaBar have been asked what they will do with the Grid in the coming
year. For LHC one can define a list of actions, but it is not clear how
to support them, e.g. a request for more experiment support at the
Tier-1. The contacts list didn't happen, although the people are defined
and being effective, but overloaded. We cannot operate the Tier-1 in 2007
with this level of manpower. There was a discussion of middleware versus
experiment support. There will be a list of issues in a couple of weeks.
Gap Analysis
============
[OC Action 9 - GridPP to undertake a gap analysis of baseline services
needed by the experiments (minute 6.5)]
RJ/RM reported. A document exists. An issue is whether there is going to
be any fallout from this analysis; the answer is probably yes. We are
putting our faith in the LCG and the experiments' VO boxes. This may not
satisfy the committee. More effort is needed in upper middleware and
operations for the experiments. If there are gaps, what can GridPP do
about them? We need to put pressure on someone to fill each gap. VO boxes
currently provide ad hoc solutions; these are pro tem solutions and
should become part of the middleware stack. There was a discussion of
data management scenarios. What's the gap and how are we going to fill
it? Much is being done at CERN. What does GridPP do if there is no
solution? It is clearly not just a UK problem. The POB recognises the
problems. Service challenges are addressing some of the issues, although
engagement of the experiments in the UK is maybe not high enough (it was
good to start with). Sites have been well engaged. There are problems
when milestones slip. Service challenges don't really address the gaps.
The RJ/RM document answers some of the questions about high-level data
management services. We should recognise that some things aren't going to
be provided and experiments will have to do them. This is no problem at
the end of the day, as experiments will do it, but it duplicates effort.
There are not thought to be any large gaps here (showstoppers) for the
major clients, but some things need more effort and have to be developed
by each experiment. Once again it was asked why BaBar don't use them, and
the conclusion was that they don't really need them.
Upper Middleware Planning
=========================
This agenda item referred to the "additional" OC document in the proposed
list of documents but it was agreed that this is the Gap analysis
document discussed earlier. There was a discussion of the forthcoming
Rolling Grant and Tier-1/2 call. We need to flag to the OC that PPARC
needs to define this soon.
Value Added
===========
[OC Action 11 - GridPP to identify the top 5/6 added value items that
GridPP had delivered (minute 6.7)]
DB reported that we had 20-plus items at Birmingham and it was agreed
that we need to pick 5/6 highlights. GridPP has added value as opposed to
giving money piecemeal to experiments. TC summarised it as follows: "We
created a strong GridPP identity (picking up the PPARC e-Science lead)
which, together with a founding contribution to the LCG Project at CERN,
produced clear UK leadership in critical middleware areas, notably Grid
security (and Information & Monitoring systems). Within the UK, the
strong GridPP identity led to a well organised and coordinated
Tier-1/Tier-2 structure and thus the UK's largest Grid, which is
emphasising the UK's contribution to computing for the LHC experiments
(e.g. a plot with the UK contribution to LHCb & CMS MC production) and is
open to other sciences in the UK."
Tier-1 Issues
=============
TD reported that we hope to know in the next day or so the current
procurement costs of disk and CPU, to meet the minimum ~200 KSI2K with
the rest spent on disk. BaBar have been requested to make a case to the
Tier-1/A Board as to how to meet their requirement of 90 TB by a
realistic date. Can they use tape at the Tier-1 or disk at the Tier-2s?
If they cannot get more Tier-1 disk, their plan is to migrate the data
currently stored here to Italy, and this might have a physics impact as
the data is predominantly used. The estimated cost of 90 TB is 150k. It
was asked whether the PMB would sign off 150k for BaBar's use if it were
part of a plan to use the Grid; otherwise the money would be spent in
2007. The BaBar MoU doesn't specify when in the year it is delivered, but
BaBar UK want to specify this. TD said there is 1.5m pounds in the 2007
budget; this represents a movement forward of 10%. The 90 TB now would be
250 TB in 2007, which is about the same as CMS's allocation. There was a
long discussion. The conclusion was that the PMB did not wish to buy
90 TB of disk for BaBar now, but if they had to then it should come out
of BaBar's 2007 allocation. TD would report this input to the Tier-1/A
Board.
Dissemination Items
===================
SP reported. She has been commenting on the draft strategy from Mike
Green's LHC Promotion Strategy Group and offering GridPP help. We were
turned down again by the Royal Society, this time jointly with AstroGrid.
The LHC Group has also submitted a bid and we may be able to piggyback on
that if they are successful. SP has been preparing for Mumbai, i.e.
screens and a stand. RM asked which flags we are flying; SP said all. She
is trying to find out if there is space for posters, flyers etc. SP said
it was time to kick-start the GridPP brochure project. TD said this is
tied to the added value discussed earlier. SP reported that the new
Events Officer, Neasan O'Neill, started on Monday. He will first look at
introductory material on the website and speak to Fergus and the QM
designers about magic cubes. He is also taking pictures. SP suggested it
was time to change the posters template, maybe for the IoP meeting in
April. This was agreed by the PMB. Neasan needs to talk to the designers
at QM. RM was volunteered, while out of the room, to provide a GridPP15
news item. SL asked about the Birmingham schools project. SP said this is
not yet set up but Pete Watkins will keep us informed.
OC Document Preparation
=======================
TD will allocate PMB numbers. The OC need the documents a week before the
meeting (22 February). Hence we need input by 6 February to sign off by
13 February. SL reported that he updated the web pages but not yet with
all the additional documents.
The proposed documents (as of 8 December 2005) are:
* Executive Summary [PMB]
* Project Map [DB] added value items
* Resource Report [DB]
* LCG Report [TC]
* EGEE Report [RM]
* Deployment Report [DK]
* Middleware/Security/Network Report [RM]
* Applications Report [RJ]
* User Board Report [DN]
* Tier-1/A Report inc. Tier-1/A procurement methods [JG]
* Tier-2 Report [SL] inc. Year 1 outturn
* Dissemination Report [SP]
* Documentation Report [SB]
In addition
* CERN and Tier-2 Operations [TC]
* Performance Monitoring [JC]
* (Upper) Middleware Planning [RM, RJ]
* Experiment engagement questionnaire (v2) [DN]
* Grid for LHC exploitation [for reference]
ACTIONS AS AT 10TH JANUARY 2006
===============================
184.1: TC to write document "How CERN helps the small sites to install
and manage the LCG software".
- ongoing
184.3: RM and RJ to document Gap Analysis
- ongoing
184.4: DN to document UB questionnaire issues
- ongoing
187.1: JG to prepare combined actions list from GridPP14 meeting
- ongoing
197.1: SL to determine realistic estimate of July 2006 hardware for T2s.
- ongoing
197.2: SL to review TC's document on "How CERN helps the small sites to
install and manage the LCG software"
- ongoing
Action 199.1: JC to raise Tier-2 disk usage at Deployment Board.