JISCMail - UKHEPGRID Archives

UKHEPGRID@JISCMAIL.AC.UK

Subject: Minutes of the 287th GridPP PMB meeting
From: Tony Doyle <[log in to unmask]>
Date: Wed, 16 Jan 2008 14:31:25 +0000
Attachment: 080114.txt
Dear All,

Please find attached the latest weekly GridPP Project Management Board
meeting minutes. The latest minutes can be found each week at:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE                       GridPP Project Leader
Rm 478, Kelvin Building                      Telephone: +44-141-330 5899
Dept of Physics and Astronomy                  Telefax: +44-141-330 5881
University of Glasgow                   EMail: [log in to unmask]
G12 8QQ, UK                 Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________


               GridPP PMB Minutes 287 - 14th January 2008
               ==========================================

Present: Tony Doyle, Sarah Pearce, Roger Jones, David Britton, Steve Lloyd,
Robin Middleton, John Gordon, Jeremy Coles, Peter Clarke, Glenn Patrick,
Andrew Sansum, Dave Colling, Suzanne Scott (Minutes)



Apologies:  Stephen Burke, David Kelsey, Tony Cass, Neil Geddes



1.  ALICE priority
==================
AS reported that ALICE currently have zero disk allocation and have not
yet had their CASTOR disk space set up. To take part in February's CCRC,
they (ALICE central rather than UK) have requested 1.1 TB. However, the
requirement was likely to be at least one disk server, which implies
4-6 TB depending on exactly what can be made free. Once we have the disk
space we will have to install the xrootd interfaces. This is probably not
much work, but if it causes any problems it will compete with
higher-priority work for ATLAS, CMS and LHCb in preparation for CCRC08.
Setting up the ALICE CASTOR endpoints (on our shared server) is less than
half a day's effort. It was noted that if this work does not start early
next week, there will be no chance of getting ALICE ready for CCRC08.
Even if the effort is invested next week, the chances of success are not
great, given that the interfaces are untried at RAL, the lack of priority,
and the time needed to resolve problems. On Tuesday the MB will need to
know where we stand on the endpoint setup for all four experiments.



How does the PMB wish to proceed? The disk space issue had been discussed
before, but our position of zero allocation will now become very clear to
the WLCG and is inconsistent with our MoU commitments. If the PMB wants us
to proceed, how should we prioritise ALICE relative to the other LHC
experiments, and even MINOS and BaBar, over the next six weeks or so?



GP noted that the problem was a lack of input from ALICE, and that their
disk allocation had been used elsewhere because of a lack of uptake and
engagement. GP had been given a technical contact at CERN, but the ALICE
request (which GP had estimated) had not been confirmed. GP advised that
minimal storage would be fine, but the priority would need to be set to
'low'. TD asked whether the PMB felt it reasonable to require a response
from ALICE-UK before support was set up. The PMB agreed that it was,
because engagement is required. TD and GP would iterate, draft an email
and contact the individual involved. Engagement was required, along with
estimates of requirements, otherwise ALICE could be given no priority.
AS advised that input was required before 10:30 am on Wednesday, the time
of the next CASTOR Team Meeting. TD set a deadline of Tuesday evening for
a response from ALICE.



2.  Tape Access
===============
TD reported that a major issue with tape use at CERN had been raised at
last Tuesday's MB. In current operation, tape access was clearly
~10 MB/s (or less) rather than 50 MB/s. The agenda is at the following
link (see the "Storage Efficiency" item):
http://indico.cern.ch/conferenceDisplay.py?confId=22194
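As a rough illustration of why this gap matters, the sketch below converts
the two quoted rates into the time needed to read one terabyte. It assumes
decimal units (1 TB = 1,000,000 MB), a single stream sustaining the quoted
rate, and no tape mount or seek overheads. None of these assumptions came
from the MB discussion, and real throughput depends on the access pattern.
The function name hours_to_read is hypothetical.

```python
# Rough read-time comparison for the two tape rates quoted at the MB.
# Assumptions (not from the minutes): decimal units (1 TB = 1,000,000 MB),
# one stream sustaining the quoted rate, no mount/seek overheads.

MB_PER_TB = 1_000_000


def hours_to_read(terabytes: float, rate_mb_per_s: float) -> float:
    """Hypothetical helper: hours to read `terabytes` at `rate_mb_per_s`."""
    seconds = terabytes * MB_PER_TB / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # 50 MB/s is the figure quoted; ~10 MB/s (or less) is what was observed.
    for rate in (50, 10):
        print(f"{rate:>2} MB/s: {hours_to_read(1, rate):.1f} h per TB")
```

Under these assumptions, a fivefold drop in rate means a fivefold increase
in read time: about 5.6 h versus 27.8 h per TB.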



TD advised that slides had been provided on rates at CERN for all
experiments. The MB discussion concerned incorporating tests of the tape
system into planning, but it was noted that there had been problems
accessing tape. RJ advised that CERN were not providing D1T0 but were
backing up to tape. There was a discussion about the processing and
reading of tape. AS advised that there were also performance issues
relating to concurrent writing to and reading from disk, and to multiple
streams. TD noted that CCRC was meant to address simultaneous contention,
and that a week should be designated for ATLAS, CMS and LHCb to test file
access alongside user analysis. GP advised that all CASTOR sites were
currently banned by LHCb for other reasons, so no efficiency figures were
available. TD asked whether a week could be found for large sequential
access tests. AS advised that no week had yet been designated apart from
CCRC. GP noted that all experiments first have to migrate to CASTOR. DB
asked whether extra tape drives were needed at the moment. TD said not
yet. The required access rates, broken down by type, were needed along
with figures from tests. These would give realistic throughput figures
for determining the disk/tape balance accurately. JG suggested we go with
the plan for February '08 and then determine access rates in May. AS
would contact Tim Folkes to order six tape drives as per the original
plan.



3.  GridPP20 Agenda
===================
TD asked whether there were any user-based talks, and whether GP, RJ or DC
had any speakers who could describe hands-on experience in the
experiments. TD advised that the registration list was currently being
used to identify possible speakers, but Chairs had not yet been finalised.
TD also asked whether there were any updates to the main agenda. This was
ongoing.



4.  AOCB
========
None.



STANDING ITEMS
==============



SI-1  Dissemination Officer's Report
------------------------------------
SP reported that a rejection had been received from the Royal Society
Summer Exhibition. SP would seek feedback on the reasons for this.
However, STFC had had an LHC stand accepted and had said they would aim
to include something about the Grid on it. SP thanked DB for passing on a
couple of suggestions for news items. SP had contacted UKQCD about news
items on their biomed mini-PIPSS award and on a demo of integrating five
regional Grids shown at a recent conference. SP was also working on an
item about GANGA, and Mike Kenyon would forward information on ELSSI. SP
reported that Neasan O'Neill would attend the EGEE All Activities meeting
in Bulgaria next week, at the request of EGEE NA2, to take part in a
meeting discussing Grid communication strategies. The second phase of the
bid for an STFC Science in Society large award, to fund someone for
LHC@home, was being worked on and was due at the end of this month.



SI-2  Tier-1 Manager's Report
-----------------------------
AS reported as follows:
1) Tenders:
a) Disk tender - delivery is scheduled for Thursday this week. If all
   goes to schedule, acceptance will be complete by the end of February.
b) CPU tender - the order has been placed, with delivery scheduled for
   28 February.
c) Tape drive purchase - the purchase plan is being finalised. If the
   order is placed in the next couple of days, we may be able to get the
   equipment on the ground in time for February's CCRC08.



2) Memory upgrades have all been completed. Closed.

3) Work on the power supply is proceeding, so far with no disruption to
   service. Measurements indicate that we have (just) sufficient power to
   operate with one transformer out of service. This will remain the case
   until late February, when the next CPU delivery will push us over the
   limit. Because the transformer work is likely to be completed before
   the CPU delivery, e-Science will probably not have to reduce electrical
   load.



4) The RAL PPD disk space loan (approx 80TB) is available.



Service
-------
1) SAM availability for last week was 99%.
2) CASTOR:
a) Problems with the ATLAS CASTOR instance were traced to queries
   overflowing the Oracle query cache. The cache was increased, and ATLAS
   production restarted on Wednesday.
b) LHCb have encountered problems (also seen at CNAF) where rfio requests
   leave files open after the end of the IO job. This gradually degrades
   performance as all IO job slots become occupied. Investigations are
   still underway.
3) SL4 migration - the SL4 UI is configured and is being tested.
4) The LHCb Oracle-based LFC is operating well. Item closed.



Progress to Grid Only Access - This standing item documents the status of
work towards achieving GridPP milestone 0.18, "Access to Tier-1 resources
by Grid Interfaces Only".

1) qsub access was scheduled to terminate last Friday, but a few details
   remain to be finalised. qsub will be switched off by Wednesday.



SI-3  Production Manager's Report
---------------------------------
JC reported as follows:
1) There have been several requests for improvements/changes to the EGEE
   broadcast system.
2) A new process has been introduced whereby a ticket is not closed but
   goes into the "verify" state.
3) A bug in the Gridview service availability algorithm, which caused
   services with no critical tests to be counted as up and available,
   will be corrected from today.
4) Manchester has ~9GB of space occupied by CMS and ALICE software. Given
   the policies of these experiments, the site wants to know how to deal
   with this software (extra space on the software servers would be
   useful).
5) Over the Christmas period the old gridpp VOMS certificate expired. The
   resulting site reaction indicated that the changeover was not widely
   known about.
6) Ops test performance over the Christmas and New Year period has been
   stable for most sites, and several sites were 100% available. The
   worst-performing sites over the period were similar to those in
   November/early December. Overall, Q4 saw an average availability of
   86% vs 85% for Q3.
7) The most significant problem over the last few weeks (as already
   discussed) was for ATLAS, due to CASTOR. This has led to reduced use of
   UK Tier-2s.



There was a discussion about enabling and supporting VOs, and about the
space that sites are responsible for making available to them. It was
agreed that 9GB was not excessive for a software area and that a bigger
area would be appropriate if required. TD noted that VOs should be
supported on a site basis. Any plans to drop support for an individual VO
should be made only after discussion with the Region and, ultimately,
with the VO concerned. It was reported that ECDF at Edinburgh was now a
new site with a shared cluster.



Meetings:
A) There was a CCRC'08 planning meeting on 10th Jan:
http://indico.cern.ch/conferenceDisplay.py?confId=24844

B) There was a GDB last week:
http://indico.cern.ch/conferenceDisplay.py?confId=20225
The focus was on benchmarking, data management, worker node issues and
security policies.



SI-4  LCG Management Board Report
---------------------------------
It was noted that experiment requirements were still awaited in response
to MB questions. RJ, GP and DC would be sent a URL relating to CCRC08 with
planning meeting details, so that the summary of experiment requirements
could be checked for any major mismatch [done during meeting]. TD reported
that the tape issue had already been covered and that CCRC planning would
be reviewed again next time.



SI-5  Documentation Officer's Report
------------------------------------
SB was not present.



REVIEW OF ACTIONS
=================
272.4 AS to check the current Tier-1 disaster recovery plan and circulate
the existing version to the PMB. It was reported that this document does
not exist, but it was planned to have one in the longer term. TD would
incorporate in v0.4 anything that AS considered relevant. AS will check
and advise additions. Ongoing.



277.2 DC to provide an update and re-evaluation of CMS/CASTOR
deliverables. TD advised that there was a CMS/CASTOR document on
deliverables which should be revised in light of the December '07 tests.
DC to take the token for this now and iterate with DN. Ongoing.



277.5 Disaster Recovery 'Team B': SB, JC, TD, SP, DB to analyse the wider
issues of disaster planning, mapped to the experiments' lists. This work
would include Project Management. A Recovery Plan was required. It was
agreed that JC was in charge of this and of the experiment input relating
to subsets of the disaster plan. SB/JC to progress. It was noted that the
AFS Service was also linked to this. Ongoing.



277.8 User Experience 'Team C': SB, SP, SL, with input from JC, to deal
with the issue of user experience and the design of an easily-found
lookup facility for grid error messages. SL reported that he had started
the ATLAS wiki page and would circulate the URL. Ongoing.



280.6 JG brought up the issue of the biomed VO and 'sieving' at the ROC
Managers' meeting. A broadcast is to go out from EGEE. It will help to
underline acceptable use of Grid resources and will remind VOs of the
policy they have signed up to in relation to their users. JC had now
emailed the Chair to have this discussed. JG reported that a new VO was
now set up but had few resources allocated to it as yet, although the
home Institute may be giving funds. Pending further info from JC. EGEE
broadcast action ongoing: JG will bring up the broadcast action at the
ROC VO meeting tomorrow (Tue 15). Ongoing.



280.7 When approached by a VO about joining, JC to mention the issues
around the 'standard' 6-month introduction period, after which the VO
must set up something specific to itself, if appropriate. This was
discussed at DTeam. JC to email GridPP VO members if possible - ongoing.
This was a standing action. JC had discussed it with the Tier-2
Co-ordinators in relation to VO members. JC to send email. JC reported
that he had received a request from OMII to set up a GridPP VO. It was
preferable for this to be done through NGS. Ongoing.



280.8 JG to investigate the UKI ROC website (any change/progress) and
report back. Ongoing.



282.2 SP to progress the Project Map using the T1 service areas and input
from the meeting. Ongoing.



282.6 JC and SB to progress the existing 'disaster planning' template for
the next F2F meeting on 1st Feb, involving the experiments as necessary.
This was a follow-up from the last F2F and is to be distinguished from
action 277.5, which is a longer-term one relating to the OC.



283.1 TD to arrange a phone connection at TC Dublin for RJ to join the
GridPP20 PMB meeting remotely. Ongoing.



283.3 RM/TD to prepare use cases appropriate for the UK community
[relating to item 278.10 EGEEIII -> EGI]. RM reported that he would be
attending a workshop at CERN at the end of January (run by the EGI Design
Study project) and would report back then. RM reported that the use case
and functions parts of the EGI website were now publicly visible. RM
would circulate the URL for the use cases, and a template was available
to be completed. All: to provide inputs to RM in the template format
provided via the URL. Done, action closed.



286.1 RJ to call a NorthGrid meeting to decide hardship funding
allocations to Institutes. RJ reported that a meeting had been held this
morning. Information would be sent to SL. RJ summarised that the largest
figure would go to Sheffield (12k), with 6k each to Liverpool, Lancaster
and Manchester.



286.2 SL and DB to iterate regarding the clause associated with the
issuing of Tier-2 hardware grants. SL had sent DB an email with
suggestions. Ongoing.



286.3 AS to formally apologise to ATLAS on behalf of GridPP for the outage
problems over the Christmas period. AS reported that he had sent a formal
email apology to Kors. The identified cause had now been resolved and
ATLAS production had restarted successfully. Done, item closed.



286.4 GP to advise the UB that the special cases for non-Grid access to
the UK Tier-1 were approved. Done, item closed.



286.5 AS to organise a service message at login relating to non-Grid
access being withdrawn. Ongoing.



286.6 JC and SB to incorporate the AFS Service into the disaster planning
document. This was added to the list. Done, item closed.



ACTIONS AS AT 14.01.08
======================
272.4 AS to check the current Tier-1 disaster recovery plan and circulate
the existing version to the PMB. It was reported that this document does
not exist, but it was planned to have one in the longer term. TD would
incorporate in v0.4 anything that AS considered relevant. AS will check
and advise additions.



277.2 DN to provide an update and re-evaluation of CMS/CASTOR
deliverables. TD advised that there was a CMS/CASTOR document on
deliverables which should be revised in light of the December '07 tests.
DC to take the token for this now and iterate with DN.



277.5 Disaster Recovery 'Team B': SB, JC, TD, SP, DB to analyse the wider
issues of disaster planning, mapped to the experiments' lists. This work
would include Project Management. A Recovery Plan was required. It was
agreed that JC was in charge of this and of the experiment input relating
to subsets of the disaster plan. SB/JC to progress.



277.8 User Experience 'Team C': SB, SP, SL, with input from JC, to deal
with the issue of user experience and the design of an easily-found
lookup facility for grid error messages. SL reported that he had started
the ATLAS wiki page and would circulate the URL.



280.6 JG brought up the issue of the biomed VO and 'sieving' at the ROC
Managers' meeting. A broadcast is to go out from EGEE. It will help to
underline acceptable use of Grid resources and will remind VOs of the
policy they have signed up to in relation to their users. JC had now
emailed the Chair to have this discussed. JG reported that a new VO was
now set up but had few resources allocated to it as yet, although the
home Institute may be giving funds. Pending further info from JC. EGEE
broadcast action ongoing: JG will bring up the broadcast action at the
ROC VO meeting tomorrow (Tue 15).



280.7 When approached by a VO about joining, JC to mention the issues
around the 'standard' 6-month introduction period, after which the VO
must set up something specific to itself, if appropriate. This was
discussed at DTeam. JC to email GridPP VO members if possible - ongoing.
This was a standing action. JC had discussed it with the Tier-2
Co-ordinators in relation to VO members. JC to send email.



280.8 JG to investigate the UKI ROC website (any change/progress) and
report back.



282.2 SP to progress the Project Map using the T1 service areas and input
from the meeting.



282.6 JC and SB to progress the existing 'disaster planning' template for
the next F2F meeting on 1st Feb, involving the experiments as necessary.
This was a follow-up from the last F2F and is to be distinguished from
action 277.5, which is a longer-term one relating to the OC.



283.1 TD to arrange a phone connection at TC Dublin for RJ to join the
GridPP20 meeting remotely.



286.1 RJ to call a NorthGrid meeting to decide hardship funding
allocations to Institutes. RJ reported that a meeting was scheduled for
this morning. Information would be sent to SL. RJ summarised that the
largest figure would go to Sheffield (12k), with 6k each to Liverpool,
Lancaster and Manchester.



286.2 SL and DB to iterate regarding the clause associated with the
issuing of Tier-2 hardware grants. Ongoing.



286.5 AS to organise a service message at login relating to non-Grid
access being withdrawn.



287.1 TD and GP to iterate, draft an email and contact the ALICE technical
representative at CERN, requesting estimates of ALICE's disk allocation
requirements. The deadline for a response from ALICE was Tue evening
(15 Jan).



287.2 AS to contact Tim Folkes to order six tape drives as per the
original plan.



287.3 All: to provide inputs to RM on EGEEIII -> EGI and use cases, in the
template format provided via the circulated URL.



INACTIVE CATEGORY
=================
271.1 PMB to examine the issue of fibre breakage and outages on the
CERN-RAL OPN link in one year's time, when actual data on breakages is
available. Due date September '08.



271.3 Re CERN-RAL OPN link breakage and backup generally, PC to oversee
the issue and collate information so that the PMB have something to
revisit in one year's time. Due date September '08. It was noted that PC
would circulate a revised document after discussion with ATLAS (RJ/PC/DN
to iterate).



282.8 RM to monitor how R-GMA and networking issues impact on GridPP as
matters progress. RM advised that this item should be moved to the
'inactive' category as it will develop over the coming months. RM
discussed the issue with Steve Fisher and advised that R-GMA must be
supported while APEL depends on it. RM reported that he has spoken to SF
and there is currently no change to the R-GMA situation. Process ongoing.



The meeting closed at 2:30 pm. The next PMB would take place on Monday
21 January at 1:00 pm.