UKHEPGRID Archives

UKHEPGRID@JISCMAIL.AC.UK


UKHEPGRID  February 2008

Subject: Minutes of the 292nd GridPP PMB meeting
From: Tony Doyle <[log in to unmask]>
Reply-To: Tony Doyle <[log in to unmask]>
Date: Wed, 20 Feb 2008 15:26:22 +0000
Content-Type: MULTIPART/MIXED
Parts/Attachments: TEXT/PLAIN (20 lines), 080218.txt (1 lines)

Dear All,

     Please find attached the latest GridPP Project Management Board 
Meeting minutes. The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE                       GridPP Project Leader
Rm 478, Kelvin Building                      Telephone: +44-141-330 5899
Dept of Physics and Astronomy                  Telefax: +44-141-330 5881
University of Glasgow                   EMail: [log in to unmask]
G12 8QQ, UK                 Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________


GridPP PMB Minutes 292 - 18th February 2008
===========================================

Present: Tony Doyle, Sarah Pearce, Roger Jones, Stephen Burke, David Britton, Steve Lloyd, John Gordon, Jeremy Coles, Peter Clarke, Glenn Patrick, Andrew Sansum, Dave Colling, Tony Cass (notes by DC)

Apologies: David Kelsey, Robin Middleton, John Gordon, Neil Geddes

1. Disaster Planning - Tier-1 power failure issues
===================================================

JC reported on the disaster planning review of the recent power failure at RAL - his document had been circulated prior to the meeting. It was noted that there was now an internal CERN number for such communications, which was Ext 75011.

DB asked if a meeting would always be needed in order to determine the order in which services should be brought back. He thought that it would make sense to have an ordered list/flow diagram. TC pointed out that CERN had a system that recorded the service interdependence and enabled them to recover from crisis events. TC and JC to iterate regarding this following the meeting.

TD commented that this report was very specific and requested that more general lessons be learned. There was a discussion between TD and JC concerning Footprints/GGUS use in these circumstances.

JC asked what the order should be for bringing up CASTOR instances. TD suggested that the Tier-1 should make a plan of this order which would then be circulated to the experiments.

2. AOCB
========

It was reported that Greig Cowan would join a group of dCache experts. There was some discussion about the appropriate level of support and long-term support for dCache.

STANDING ITEMS
==============

SI-1 Dissemination Officer's Report
------------------------------------

SP reported that Neasan O'Neill had taken a stand to the EGEE User Forum. This had received a fair amount of attention, largely because it wasn't just GridPP, so it has been decided to do the same for Istanbul.
TD noted that Morag Burgon-Lyon had found the meeting useful. JC noted that it had been an interesting meeting but that there were not many actual users present.

SP reported that there will be news releases on the User Forum and the ATLAS meeting, and that there will be an article about Jens Jensen's work on SRB and SRM interoperability. SP asked if it was worth having a press release on CCRC; if not, there could be some news items for the GridPP website. TD suggested that rather than three news articles there should be a single one with input from the experiments. JC pointed out that the GDB in March will have a large post-mortem on CCRC.

It was noted that there will be an Industry meeting on the 21st of May. TD will be a speaker at this meeting and DC will also attend.

It was noted that iSGW was looking for punchy, innovative ideas in an attempt to get some readers.

SI-2 Tier-1 Manager's Report
-----------------------------

In absentia AS provided the following report:

1) Tenders

a) Disk tender - supplier load test completed. Our 28-day load test has not started and is now running late; it is planned to start today.

b) CPU tender - Order placed and scheduled for delivery 28 February. Suppliers may deliver 1-2 weeks early. It will probably not be possible to complete the full 28-day acceptance test before it is necessary to pay the bill in this financial year. Once we have 1-2 weeks of load test results the PMB will be asked to approve payment.

c) Tape drive purchase - Six tape drives have been received (5 drives are currently being borrowed). Tape server requisition now signed and order expected to be placed early this week.

d) Non-Capacity hardware order has been placed. Delivery is expected to be 1-2 weeks later than the CPU delivery.

e) Oracle server hardware upgrade order has been placed.

f) An order for a 32-port non-blocking 10Gb switch has been placed. Delivery is expected in mid-March.
g) An order for about 40K of tape media has yet to be placed but is planned to be placed this week.

2) Work on the power supply is complete.

3) We expect to commence work on replacing disk server backplanes w/b 25th February. CCRC equipment will not be dealt with until after the CCRC has finished at the end of February.

Service
-------

1) SAM availability for last week was 96%, although some experiments were impacted by only partial functioning of CASTOR early in the week (fallout from the power failure), which was not detected by SAM.

2) CASTOR appears to be working well for ATLAS, CMS and LHCb.

3) SL4 Migration - The SL4 UI build has minor changes to be made and it will then be ready for release.

Progress to Grid Only Access
----------------------------

This standing item documents the status of work towards achieving GridPP milestone 0.18 "Access to Tier-1 resources by Grid Interfaces Only".

1) We have a list of users allowed to submit via qsub. When non-Grid submission is reinstated only this list will be used.

There was a discussion on the replacement backplanes. The new timetable was fine with the experiments.

SI-3 Production Manager's Report
---------------------------------

JC presented the following report:

1) Experiments/CCRC:

LHCb have little production happening at the moment but transfer tests for CCRC have started. They still have SAM test problems to resolve and others connected with the RB/WMS.

CMS are ramping up for CCRC but suffered due to loss of disk servers last week. They still have CASTOR issues. IC and Brunel are fully ready for CMS CCRC activities. Bristol is now appearing in CMS lists.

ATLAS FDR data is reaching T2s well now (but there is not much of it). Problems come and go at the sites and are being resolved when they occur. Required space tokens are in place at most T2s now. Site readiness for FDR work is being tracked here: http://www.gridpp.ac.uk/wiki/AtlasFdr1

The biggest T2 problems have surrounded dCache SRM v2.2.
It is thought that most problems faced have now been understood/resolved.

2) CPU utilisation has increased over the last week and has remained in the range 50%-67%. The SAM test average for UK sites is up from 84% to 86% for the last week. The WLCG Tier-2 reliability report for January 2008 has now been circulated. The reliability:availability figures given are: London (67%:73%); NorthGrid (89%:89%); ScotGrid (95%:95%) and SouthGrid (90%:87%).

3) The UKI helpdesk importer for GGUS to Footprints has experienced difficulty following a move to "validated" as the last ticket status. A manual checking process is currently in place.

4) camont observe a factor of 6 improvement in submission times when moving from the LCG RB to the gLite WMS. There was a discussion about the transition to the gLite WMS. This will be discussed at the dteam meeting and the location of the current SL3 versions will be rebroadcast.

5) The newly created Tier-1 blog (http://www.gridpp.rl.ac.uk/blog/) has now been added to the GridPP aggregator: http://planet.gridpp.ac.uk/.

6) Many sites have complained about inefficient biomed jobs and a lack of VO/user response in understanding them. This is now being taken up directly with the VO management.

Meetings:

A) WLCG workshop 21st-25th April (http://indico.cern.ch/conferenceDisplay.py?confId=6552). I have requested sites to inform me of their intention to send someone with GridPP funding. So far I have had 7 (T2 site) replies.

B) There is an EGEE ROC manager's meeting tomorrow: http://indico.cern.ch/conferenceDisplay.py?confId=23754.

C) ATLAS software & computing workshop takes place next Monday-Friday at CERN: http://indico.cern.ch/conferenceDisplay.py?confId=22132. On the Wednesday there is an ATLAS T0/1/2/3 Jamboree.

SI-4 LCG Management Board Report
---------------------------------

JG gave a quick summary of his talk to the MB, noting that the T1s are not as ready as had been hoped at this stage.
SI-5 Documentation Officer's Report
------------------------------------

SB reported that he had done some work on both the web pages and the user guide. This would be given to the EGEE documentation group in due course.

REVIEW OF ACTIONS
=================

277.2 DN to provide an update and re-evaluation of CMS/CASTOR deliverables. TD advised that there was a CMS/CASTOR document on deliverables which should be revised in light of the December '07 tests. DC to take the token for this now and iterate with DN. DC reported that the document would be sent out this week.

277.8 User Experience 'Team C': SB, SP, SL, with input from JC to deal with the issue of user experience and design of an easily-found lookup facility for grid error messages. SL reported that he had started the ATLAS wiki page and would circulate the URL. SB was leading this with inputs from SP, SL and JC where needed. A new simple summary was required of all areas available plus a lookup/links facility, for the OC to review. This would include a list of most recent types of problems (possibly a 'top 12' for users - what the error means and the course of action to follow). SB to progress this.

280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email.

289.2 DC to check current situation regarding gLite WMS and SL4 - current status to be conveyed to DTeam.

290.1 JC to write down membership of DTeam.

290.2 RJ, DC and GP to nominate experiment user representatives for the Deployment Board. ATLAS user person for the DB will be James Catmore and also Raja Nandakumar and Stuart Wakefield. Done, item closed.
290.3 SL and DB to review the Tier-1 Board Terms of Reference and see what could be formally incorporated into the new Deployment Board Terms of Reference. DB to forward to JG to see if we really need a Tier-1 Board. JG pointed out that purchasing was taking a long time and so we need to start earlier in future - this will require knowledge of the scale of the purchase. Done, item closed.

290.4 AS and JG to iterate regarding what could replace the Tier-1 Board.

290.5 All: to check their individual roles as outlined and advise DB of any required changes. DB advised that he required input by next Monday 18th. Done, item closed.

290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the ones provided for Tier-2.

290.8 AS/SP to iterate regarding the financial summary in the Quarterly Reporting (eg: Outturn figures).

290.9 Quarterly Report for Tier-2 staff to be compiled by the Production Manager.

290.10 TD as Technical Director to provide a report showing effort figures; milestones & metrics; and a table of posts showing Technical Support.

290.11 DB to progress the situation at Manchester.

290.12 GP/SB/DC to define the portal and documentation Support posts and ensure they form a comprehensive basis for user support (both documentation and Grid access assistance), overseen by the UB Chair.

290.13 DB to complete the document re Reporting and Reporting Routes relating to staff, and circulate it, thereafter it would be posted on the website as a record.

290.14 RM to circulate the EGI Workshop Agenda.

290.15 JG to check with Malcolm Atkinson re attending the next EGI workshop in Rome (March). JG will attend the EGI meeting in Rome. Done, item closed.

290.16 NG noted that he had provided a draft paper relating to the end of EGEE III but would add information that addressed the period beyond 2011 and re-circulate. NG will bring this to the PMB next week. Done, item closed.
290.17 Re the Project Map, SP would look at the EGI wiki, and NG would consider more inputs relating to box 6.2.

290.18 Regarding the LCG box on the Project Map, SP to iterate with TC and bring this issue back to the PMB.

290.20 RM to provide more detailed figures on travel expenditure - broad-brush percentages would assist with decisions re travel in GridPP3.

290.21 SS to hand out travel forms at Dublin ('overseas' claim on web to be submitted as 'actuals' and should be submitted before the end of March 2008).

290.23 AS/JC to iterate on the Disaster Recovery template and remove capturable items that were considered to be minor.

290.24 JC to progress his suggested template to use when a crisis occurs - to be revisited subsequently at a PMB.

291.01 AS and JC to iterate on Thursday afternoon with a view to reporting back on the recent Tier-1 outage to the PMB next Monday. Done, item closed.

291.02 JG to raise the issue of UK CA certificates being taken out of CERN VOMS, as an item at the MB. JC confirmed he would put it on the Ops meeting Agenda. Done, item closed.

ACTIONS AS AT 18.02.08
======================

277.2 DN to provide an update and re-evaluation of CMS/CASTOR deliverables. TD advised that there was a CMS/CASTOR document on deliverables which should be revised in light of the December '07 tests. DC to take the token for this now and iterate with DN. DC reported that the document would be sent out this week.

277.8 User Experience 'Team C': SB, SP, SL, with input from JC to deal with the issue of user experience and design of an easily-found lookup facility for grid error messages. SL reported that he had started the ATLAS wiki page and would circulate the URL. SB was leading this with inputs from SP, SL and JC where needed. A new simple summary was required of all areas available plus a lookup/links facility, for the OC to review.
This would include a list of most recent types of problems (possibly a 'top 12' for users - what the error means and the course of action to follow). SB to progress this.

280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email.

289.2 DC to check current situation regarding gLite WMS and SL4 - current status to be conveyed to DTeam.

290.1 JC to write down membership of DTeam.

290.4 AS and JG to iterate regarding what could replace the Tier-1 Board.

290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the ones provided for Tier-2.

290.8 AS/SP to iterate regarding the financial summary in the Quarterly Reporting (eg: Outturn figures).

290.9 Quarterly Report for Tier-2 staff to be compiled by the Production Manager.

290.10 TD as Technical Director to provide a report showing effort figures; milestones & metrics; and a table of posts showing Technical Support.

290.11 DB to progress the situation at Manchester.

290.12 GP/SB/DC to define the portal and documentation Support posts and ensure they form a comprehensive basis for user support (both documentation and Grid access assistance), overseen by the UB Chair.

290.13 DB to complete the document re Reporting and Reporting Routes relating to staff, and circulate it, thereafter it would be posted on the website as a record.

290.14 RM to circulate the EGI Workshop Agenda.

290.17 Re the Project Map, SP would look at the EGI wiki, and NG would consider more inputs relating to box 6.2.

290.18 Regarding the LCG box on the Project Map, SP to iterate with TC and bring this issue back to the PMB.
290.20 RM to provide more detailed figures on travel expenditure - broad-brush percentages would assist with decisions re travel in GridPP3.

290.21 SS to hand out travel forms at Dublin ('overseas' claim on web to be submitted as 'actuals' and should be submitted before the end of March 2008).

290.23 AS/JC to iterate on the Disaster Recovery template and remove capturable items that were considered to be minor.

290.24 JC to progress his suggested template to use when a crisis occurs - to be revisited subsequently at a PMB.

292.1 TC and JC to iterate regarding the CERN system that recorded service interdependence and enabled them to recover from crisis events.

292.2 JG to review the interplay between Footprints and GGUS tickets on the helpdesk.

292.3 AS to produce an order for the CASTOR instances to be brought back.

292.4 JC to use the template from the disaster planning and apply it to the RAL power failure.

INACTIVE CATEGORY
=================

271.1 PMB to examine the issue of fibre breakage and outages, CERN-RAL OPN link, in one year's time, when actual data on breakages is available. Due date would be September '08.

271.3 Re CERN-RAL OPN link breakage and backup generally, PC to oversee the issue and collate info so that the PMB have something to revisit in one year's time. Due date September '08. It was noted that PC would circulate a revised document after discussion with ATLAS (RJ/PC/DN to iterate).

282.8 RM to monitor how R-GMA and networking issues impact on GridPP as matters progress. RM advised that this item should be moved to the 'inactive' category as it will develop over the coming months. RM discussed the issue with Steve Fisher and advised that support of R-GMA is required whilst APEL is dependent on it. RM reported that he has spoken to SF and there is currently no change to the R-GMA situation - process ongoing.

290.19 DB/SP to progress the details of the Project Map over the next few months, cross-checking that all elements are incorporated, including strategic priorities and staffing. To be completed before the next Oversight Committee.
