UKHEPGRID@JISCMAIL.AC.UK


Subject: Minutes of the 296th GridPP PMB meeting
From: Tony Doyle <[log in to unmask]>
Reply-To: Tony Doyle <[log in to unmask]>
Date: Thu, 27 Mar 2008 10:59:54 +0000
Content-Type: MULTIPART/MIXED
Parts/Attachments: TEXT/PLAIN (20 lines), 080320.txt (1 lines)

Dear All,

     Please find attached the latest GridPP Project Management Board 
Meeting minutes. The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE                       GridPP Project Leader
Rm 478, Kelvin Building                      Telephone: +44-141-330 5899
Dept of Physics and Astronomy                  Telefax: +44-141-330 5881
University of Glasgow                   EMail: [log in to unmask]
G12 8QQ, UK                 Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________


GridPP PMB Minutes 296 - 20th March 2008
========================================

Present: Tony Doyle, Sarah Pearce, David Britton, David Kelsey, Steve Lloyd, Robin Middleton, Jeremy Coles, Glenn Patrick, Andrew Sansum, Dave Colling, Suzanne Scott (Minutes)

Apologies: Roger Jones, Stephen Burke, Tony Cass, John Gordon, Pete Clarke, Neil Geddes

1. Tier-1 short-term spending
==============================

AS made a request to the PMB as follows:

Some time ago AS alerted the PMB to the fact that we may need to agree payment for the disk and CPU deliveries before formal acceptance had been completed. This would allow us to pay the bill in this financial year, which is considered desirable. Our current position is as follows:

1) 120 (of 182) disk servers have reached day 19 of the planned 28-day RAL load test. The remainder trail some way behind for a variety of minor reasons (such as problems with a network switch or in handing over initially, and a few hardware failures such as the RAID card). At present our drive ejection/failure rate is consistent with about 4% per annum. This is a little high, but not unexpected during the burn-in period (steady-state rates are usually nearer 2-3%). Although we have not completed the testing, we have done a more careful and systematic job than previously (when the servers were also heavily tested) and are happy that we see no systematic problems. Hardware should start churning out of the end of the pipe by next Wednesday.

2) The CPUs are lagging behind somewhat, but the hardware has been installed to Martin's satisfaction and the suppliers have run a 7-day load test with the SL4 O/S. We are happy that the system runs and there are no thermal concerns. We hope to start our load test on one lot before Easter, but this is unlikely to provide further useful input for several weeks.

At this stage we would like to pay the bills and resolve any further (hopefully minor) problems as they crop up operationally. Will the PMB approve payment?

TD commented that this request appeared to be relatively straightforward. AS agreed, apart from the financial implications of not spending within this financial year. Regarding the hardware, AS reported that the stress-testing of the disk servers would go ahead and the servers should be in production within 7 days. Payment of the bill was required by the middle of next week. The CPUs had been delivered and the suppliers had carried out their own load testing and were satisfied; the hardware was running well and our own stress-testing was about to commence. DK asked whether the bill would therefore be paid around 2nd April. AS confirmed this, but noted that it would still count as within this financial year. AS would chase up the CPU testing. Regarding tape, AS advised that at this stage it was difficult to purchase, install and pay invoices on time; future drive upgrades were as yet unknown. DB noted that we did not want to buy equipment unless we were absolutely sure that we needed it. DB proposed supporting the payment which AS had requested. AS advised that it was a large sum and that we were not quite through the whole process, but it was important to involve the PMB at this time. TD requested that, if possible, the CPU tests be carried out over Easter. The early payment was agreed and approved.

2. GridPP input to STFC consultation process
=============================================

DB had circulated a draft email response. TD noted that the approach was to tread carefully between the CB and STFC and provide input into the consultation process; it was hoped that the final version would be signed off and submitted to STFC later today. DB went through each point as stated.

It was noted that: we have obligations to the wider community; there should be no further cutbacks to computing support; the 3-month delay to the hardware support is a problem; and we have suffered four major cuts totalling £13 million. Although the Review believes that 5% is a small amount, seen in context it is extremely damaging. GP asked for specific support for LHCb. DB recommended that this not be made explicit in the overall GridPP response. TD noted that LHCb would be overtly supported elsewhere. SL agreed, noting that many others would make statements about LHCb and that it was important not to dilute the GridPP message. GP advised that GridPP was in danger of losing two experiments from the Project Map: LHCb and ALICE. DK noted that balance was important, but that the GridPP message should not be diluted. DB suggested that a statement could be inserted on the effect that cuts to LHCb and ALICE would have on the Tier-1 and its ability to provide a viable service; point two could be enlarged slightly without mentioning specific experiments. DK noted that there were thresholds below which things were not viable. It was agreed to add wording expressing concern that cuts to GridPP support for individual experiments would take the Tier-1 below a viable level: "reduce Tier-1 level below a critical threshold" or similar. DK commented on point three: any future delay would become even more critical. Some comments by SP were still to be incorporated. It was agreed to submit the statement as amended and to circulate it via the Minutes. It was also circulated to the CB. The final statement, following feedback from CB and PMB members, was as follows:

GridPP feedback on the programmatic review.

1. GridPP acknowledges that it is a user-led project that provides a service to the community and if the scope of the constituent community is altered then GridPP should respond appropriately. GridPP is similarly aware that it must meet international obligations and has already purchased the hardware necessary to meet the 2008 requirements, which restricts the options for incorporating new reductions.

2. The strong scientific merit of all the experiments serviced by GridPP has previously been established by rigorous scientific peer review. We believe their reclassification by the recent Programmatic Review is a reaction to a funding crisis and not a better representation of their scientific value. We are concerned that cutting back GridPP support for specific experiments will reduce their Tier-1 capability below a critical threshold and translate directly into a disproportionate reduction in UK physics output.

3. Regardless of the proposed cut, the Programmatic Review has delayed £2m of Tier-2 grants for analysis hardware by a minimum of 3 months. This delay is already causing problems for the UK LHC groups in their preparations for first data.

4. GridPP notes that the proposed reduction is the fourth in a sequence of cuts and takes the project further below the level judged to be the "minimum viable" by the PPRP review committee. There is an increased risk that GridPP will fail to deliver a competitive service for UK physicists. The sequence of cuts was as follows:

A. The GridPP3 proposal was de-scoped to a 70% scenario in the STFC award of March 2007. The PPRP agreed that this was the "minimum viable level".

B. The GridPP3 project was further reduced in July 2007 by the removal of £1.3m that had been preserved in the GridPP2 project through careful management in response to delays in the LHC schedule and due to success in attracting European funding for some GridPP posts.

C. The lack of funding for application support posts in the Rolling Grant round was recognised by GridPP as a serious risk to UK success in extracting LHC physics. We proposed to use the majority of the £1.3m saved within GridPP2 to support this activity. When that was removed, the posts were funded out of the GridPP Working Allowance with the support of four Oversight Committees (GridPP, ATLAS, CMS and LHCb). However, this further restricted the future options for managing the core GridPP3 project.

5. GridPP is now concerned at the prospect of a further 5% cut just at the point of delivery. Cumulative cuts of £13m in the last year threaten our ability to meet international obligations and UK physics analysis goals. We are concerned that these previous cuts were not fully appreciated by the review committee.

GridPP Collaboration.

3. AOCB
========

Regarding the DANTE proposal, TD proposed that the PMB endorse Robin Tasker's email to David Foster, as follows:

"The UK is not supportive of the proposal from DANTE on many grounds. The existing OPN is already operational and we judge we are close to consensus on agreeing the operational handbook; there would be considerable additional cost to follow the DANTE proposal; and concern was expressed that the "ownership" of the OPN would shift, which could restrict our ability to manage its operation and development. However, we are also concerned that DANTE do not see this outcome in too negative a light, as their contribution in engaging with the LHC community is to be valued and encouraged."

The PMB agreed that the DANTE proposal was not appropriate. DB noted that it would be useful to know what DANTE had wanted to achieve. TD advised that he would send DB some further information. It was agreed to endorse Robin Tasker's response.

There had been an AHM call for papers, which DB had circulated. This would be discussed at the PMB next Thursday and should be added to the Agenda.

STANDING ITEMS
==============

SI-1 Dissemination Officer's Report
------------------------------------

SP reported that Neasan O'Neill had written a news item on GridPP20. SP was currently awaiting a news item on the CCRC from GS and Raja Nandakumar. She had received a response from STFC relating to LHC@Home: the grant application had been turned down, with the feedback that the proposal did not encourage enough engagement with the LHC and had not previously been tried with schools. SP reported that she was continuing to work with the experiments to get more applications to run on [log in to unmask]

SP asked whether a press release for GridPP3 would yet be appropriate. TD suggested sending a draft to STFC for review. SP noted that finances need not be mentioned. It was agreed that SP approach STFC for feedback on a press release. TD suggested that a news item on the Project Map would be useful. SP suggested that she could do this for the website, but a press release itself would take a different form. DK noted that, as we are entering the data-taking phase, it is important to report something on GridPP3.

SI-2 Tier-1 Manager's Report
-----------------------------

AS provided the following report:

1) Purchases:
a) Disk tender - supplier load test completed. Our 28-day load test has now completed about 21 days for the majority of servers and is progressing well.
b) CPU tender - delivery received, installed and tested by suppliers. Our 28-day load test is about to commence.
c) Tape servers received and installed - closed.
d) Non-capacity hardware delivered and accepted - will move into production as required - closed.
e) Oracle server hardware upgrade order has been placed - ETA next 7 days.
f) A Force10 C300 switch with 32 non-blocking 10Gb ports has been received and will probably move into production in April (planning still underway). This will be the main Tier-1 top-level switch, replacing our Nortel 5530 central stack.
g) All tape media has been received - closed.
h) Additional RAID cards for the 2007 disk servers have been ordered and are expected in the next week.
i) Replacement AFS servers have been received - they will go into production as part of the AFS migration (covered under a separate item in a later report).
j) Some Xen-capable hosts have been ordered for the PPS cluster.
2) Backplane work is nearly complete - twelve servers are outstanding on the ATLAS CASTOR instance.
3) There was a scheduled 40-minute network outage this Tuesday while the main site router was upgraded. We do not route our data services through this router, so will see little direct benefit.

Service
-------
1) SAM availability for last week was 98% (SL extract). RAL-LCG2 reliability for February (MB report) was 93% (target 93%).
2) CASTOR:
a) Upgrades to 2.1.6 are underway but have been delayed after encountering a bug. Work is rescheduled for next week, once hot fixes have been received.
b) Upgrades to the Oracle RACs have been delayed following a sudden loss of staff from the database team. This work is now scheduled to restart in April.
c) We have encountered problems on the 2.1.4 CMS instance with disk-to-disk copies running wild; however, we do not intend to pursue this until after the upgrade to 2.1.6.
d) Migration rates to tape have been improved to 20-30MB/s following system tuning.
3) The Tier-1 is now primarily a Grid-only service. Only approved exceptions continue to have access.
4) SL4 migration: the SL4 UI continues to be held up, as team priorities are focused on hardware procurement, installation and acceptance.

SI-3 Production Manager's Report
---------------------------------

JC provided the following report:

Many deployment matters were raised at GridPP20, in the PMB and in the DB. Here are a few updates and new items.

1) There has been some discussion about the UKQCD Tier-2 requirements document, but more discussion is needed. The main requirement is "at least" 2GB memory per core, with 4GB preferred. Use of MPI is desirable. The combination of these requirements will make it difficult for most T2 sites to be of use in the short term.
2) There seems to be renewed interest around EGEE/WLCG concerning the SAM tests and availability calculations, in particular how site metrics are impacted by "core" problems that are not a site fault.
3) GGUS have circulated a document outlining an Operational Level Agreement between them and TPMs (http://edms.cern.ch/document/888089). The requirements on TPMs are a concern for UK teams whose work is already divided. We will put together a response with our concerns.
4) For EGEE-III an automation team is being created with a mandate to, among other things, improve the integration of grid and site monitoring: http://edms.cern.ch/document/888089. A Nagios instance is already available (sysadmins look here: https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringNcg).
5) ATLAS will soon move production to use the TierofAtlas settings for ATLASMCDISK for production output. Few GridPP sites currently advertise this token: http://wn3.epcc.ed.ac.uk/srm/xml/srm_token_table. Sites which are to be used are being asked to update their configuration.

Meetings:
a) CCRC'08 F2F meeting at CERN - 1st April. http://indico.cern.ch/conferenceDisplay.py?confId=30246.
b) The next UB meeting has been moved to 14:00 on Wednesday 16th April. It was previously scheduled for 19th March.

SI-4 LCG Management Board Report
---------------------------------

TD noted a report on the CCRC February (Phase 1) and May (Phase 2) tests. TC raised the issue that all projected IT and CPU capacity and power plans assume 30% annual growth, which places a limit on what can be done at the Tier-0 in the longer term. There were no other issues.

SI-5 Documentation Officer's Report
------------------------------------

SB was not present.

REVIEW OF ACTIONS
=================

277.8 User Experience 'Team C': SB, SP, SL, with input from JC to deal with the issue of user experience and design of an easily-found lookup facility for grid error messages. SL reported that he had started the ATLAS wiki page and would circulate the url. SB was leading this with inputs from SP, SL and JC where needed. A new simple summary was required of all areas available plus a lookup/links facility, for the OC to review. This would include a list of most recent types of problems (possibly a 'top 12' for users - what the error means and the course of action to follow). SB to progress this. It was noted that James Catmore (via the DB) had volunteered to do this. This action is therefore transferred to SL for progression via the Deployment Board. Done, item closed.
280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set-up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email. Ongoing - Regional VOs are not yet validated - pending at the moment.
290.4 AS and JG to iterate regarding what could replace the Tier-1 Board. Ongoing.
290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the ones provided for Tier-2. Ongoing - AS to provide the final GridPP2 and 2+ Quarterly Reports by end March.
290.8 AS/SP to iterate regarding the financial summary in the Quarterly Reporting (eg: Outturn figures). Ongoing.
290.9 Quarterly Report for Tier-2 staff to be compiled by the Production Manager. Done, item closed.
290.10 TD as Technical Director to provide a report showing effort figures; milestones & metrics; and a table of posts showing Technical Support. SP was currently progressing this - done, item closed.
290.18 Regarding the LCG box on the Project Map, SP to iterate with TC and bring this issue back to the PMB. Content had now been sent by TC, done - item closed.
290.20 RM to provide more detailed figures on travel expenditure - broad-brush percentages would assist with decisions re travel in GridPP3. This was now replaced by an action from the PMB F2F (see 295.10 below) - done, item closed.
290.23 AS/JC to iterate on the Disaster Recovery template and remove capturable items that were considered to be minor. Some progress had been made - item ongoing.
290.24 JC to progress his suggested template to use when a crisis occurs - to be revisited subsequently at a PMB. Some progress made - item ongoing.
292.1 TC and JC to iterate regarding the CERN system that recorded service interdependence and enabled them to recover from crisis events. Reply awaited, to be followed up - ongoing.
292.2 JG to review the interplay between Footprints and GGUS tickets on the helpdesk. It was agreed that GGUS will be used as a helpdesk in the UK as determined by the DB. Action closed.
292.4 JC to use the template from the disaster planning and apply it to the RAL power failure. This has been done, and JC will circulate. Done, item closed.
293.2 A PMB document to be written for the OC regarding NGI metrics, and SP would provide some metrics for this. This has been replaced by an action from the PMB F2F (see 295.8 below). Done, item closed.
294.1 Steve Fisher to speak to Pat Kite in the first instance re core funding for training, and revert to the PMB if he required assistance with a formal proposal document. Done, item closed.
294.2 All - to provide DB with Agenda items for the F2F in Dublin. Done, item closed.
294.3 DB to contact Janet Seed or Jordan of STFC regarding up-to-date financial information. Done, item closed.
295.1 DB to re-draft the attachment to the GridPP letter to STFC (in response to the latest cuts imposed) and recirculate to PMB for approval. Done, item closed.
295.2 Re the Project Map, SP to insert 'network plans' to ensure they were up-to-date at each site - this would ensure 'suitable network planning provision'. [SP to see the wiki sent by TD]. Ongoing.
295.3 It was agreed that there should be a formal look at Network Planning for the Project Map next year involving PC, RJ, DK and RM - PC to organise. Ongoing.
295.4 TD (as Technical Director) to address the issue of Data & Storage on the Project Map and get back to SP with inputs. Ongoing.
295.5 RM to get back to SP with inputs regarding the EGEE box on the Project Map. RM gave clarification on R-GMA, and was still working on the EGEE box. Ongoing.
295.6 SP noted that she was awaiting a VOMS report from AS and a Grid Vulnerability report from DK - these were almost in the nature of two Quarterly Reports. AS and DK to provide appropriate inputs. These related to metrics and milestones from the Project Map. Ongoing.
295.7 Re network contingency, PC to request clarification from Robin Tasker if the cost quoted was for 1Gig only. Ongoing.
295.8 Re NGI planning, JG to produce a document/statement on the GridPP position (due to his MB perspective), and SP to assist with metrics. JG to liaise with RM re EGEE inputs. Ongoing.
295.9 DB, RM and SP to target categories for the travel budget for the coming year. Targets are required for how much GridPP might spend and in what categories of expenditure. Ongoing.
295.10 RM to provide categories and breakdown of travel + additionals to enable monitoring and decision-making. Ongoing.

ACTIONS AS AT 20.03.08
======================

280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set-up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email.
290.4 AS and JG to iterate regarding what could replace the Tier-1 Board.
290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the ones provided for Tier-2. AS to provide the final GridPP2 and 2+ Quarterly Reports by end March.
290.8 AS/SP to iterate regarding the financial summary in the Quarterly Reporting (eg: Outturn figures).
290.23 AS/JC to iterate on the Disaster Recovery template and remove capturable items that were considered to be minor.
290.24 JC to progress his suggested template to use when a crisis occurs - to be revisited subsequently at a PMB.
292.1 TC and JC to iterate regarding the CERN system that recorded service interdependence and enabled them to recover from crisis events. Reply awaited, to be followed up.
295.2 Re the Project Map, SP to insert 'network plans' to ensure they were up-to-date at each site - this would ensure 'suitable network planning provision'. [SP to see the wiki sent by TD].
295.3 It was agreed that there should be a formal look at Network Planning for the Project Map next year involving PC, RJ, DK and RM - PC to organise.
295.4 TD (as Technical Director) to address the issue of Data & Storage on the Project Map and get back to SP with inputs.
295.5 RM to get back to SP with inputs regarding the EGEE box on the Project Map.
295.6 SP noted that she was awaiting a VOMS report from AS and a Grid Vulnerability report from DK - these were almost in the nature of two Quarterly Reports. AS and DK to provide appropriate inputs. These related to metrics and milestones from the Project Map.
295.7 Re network contingency, PC to request clarification from Robin Tasker if the cost quoted was for 1Gig only.
295.8 Re NGI planning, JG to produce a document/statement on the GridPP position (due to his MB perspective), and SP to assist with metrics. JG to liaise with RM re EGEE inputs.
295.9 DB, RM and SP to target categories for the travel budget for the coming year. Targets are required for how much GridPP might spend and in what categories of expenditure.
295.10 RM to provide categories and breakdown of travel + additionals to enable monitoring and decision-making.
296.1 SP to approach STFC for feedback on a proposed press release relating to GridPP3.

INACTIVE CATEGORY
=================

271.1 PMB to examine the issue of fibre breakage and outages, CERN-RAL OPN link, in one year's time, when actual data on breakages is available. Due date would be September '08.
271.3 Re CERN-RAL OPN link breakage and backup generally, PC to oversee the issue and collate info so that the PMB have something to revisit in one year's time. Due date September '08. It was noted that PC would circulate a revised document after discussion with ATLAS (RJ/PC/DN to iterate).
282.8 RM to monitor how R-GMA and networking issues impact on GridPP as matters progress. RM advised that this item should be moved to the 'inactive' category as it will develop over the coming months. RM discussed the issue with Steve Fisher and advised that support of R-GMA is required whilst APEL is dependent on it. RM reported that he has spoken to SF and there is currently no change to the R-GMA situation - process ongoing.
290.19 DB/SP to progress the details of the Project Map over the next few months, cross-checking that all elements are incorporated, including strategic priorities and staffing. To be completed before the next Oversight Committee.

The next PMB would take place on Thursday 27 March 2008 at 1:00 pm.
