UKHEPGRID@JISCMAIL.AC.UK


Subject: Minutes of the 293rd GridPP PMB meeting
From: Tony Doyle <[log in to unmask]>
Reply-To: Tony Doyle <[log in to unmask]>
Date: Fri, 29 Feb 2008 13:32:43 +0000
Content-Type: MULTIPART/MIXED
Parts/Attachments: TEXT/PLAIN (20 lines), 080225.txt (1 lines)

Dear All,

     Please find attached the latest GridPP Project Management Board 
Meeting minutes. The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE                       GridPP Project Leader
Rm 478, Kelvin Building                      Telephone: +44-141-330 5899
Dept of Physics and Astronomy                  Telefax: +44-141-330 5881
University of Glasgow                   EMail: [log in to unmask]
G12 8QQ, UK                 Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________


GridPP PMB Minutes 293 - 25th February 2008
===========================================

Present: Tony Doyle, Sarah Pearce, Roger Jones, Robin Middleton, John Gordon, Jeremy Coles, Peter Clarke, Andrew Sansum, Dave Colling, Tony Cass, Neil Geddes, (Suzanne Scott, Minutes)

Apologies: Stephen Burke, David Britton, Steve Lloyd, David Kelsey, Glenn Patrick

1. 40 years of CPC - HPC thematic issue
=========================================

It was noted that NG, PC and TD had been invited to contribute and the documentation had been forwarded to DB. TD suggested advertising via UKHEPGRID, asking people to contact DB if they wanted to contribute to this journal special issue by publishing a software program developed during GridPP - the timescale was March '09. It was agreed that PC would raise the issue with UKQCD. TD will contact Andrew McNab re GridSite. NG will contact the NGS side to see if there is any interest.

2. NGI Metrics
===============

DB and SL were not available today. SP noted that it is planned to put something in the Project Map and also bring up the issue at GridPP20. It was noted that the OC was in May and was the deadline for the Project Map. RM noted that the EGI blueprint would help but strategic issues needed to be addressed in parallel. TD noted that input was required from STFC. RM advised that we could go to the OC on the basis of the EGI blueprint. TD noted that there were no problems with the metrics but that the milestones were difficult - these would need to be signed off by STFC. The OC would be a useful forum for raising the issue - a document for this would need to be presented, with inputs from SP, NG, DB etc - it should be a PMB document. It was noted that LCG would be a part of this issue and the EGEE/EGI/NGI infrastructure would also be involved, but a best funding model was yet to be devised. It was agreed that a document would be written in this space for the OC, and SP would provide some metrics.
It was agreed that TD would contact Trish Mullins and apprise her that an Agenda item, with further discussion, was planned for the F2F.

STANDING ITEMS
==============

SI-1 Dissemination Officer's Report
------------------------------------

SP reported a news item by Neasan O'Neill on the User Forum and the EGEE-All activities meeting - these were posted last week. There would be a news item on the ATLAS workshop. Re the new version of the website, SP advised that Andrew McNab and NO were working on it at present and it should be ready by the end of this week for the dissemination team to look at. There were preparations ongoing for the IoP HEPGRID meeting and posters had been requested. NO would check that all poster requests had been received. SP and NO were preparing for the presentation of the LHC@Home large award at Swindon. Regarding the industry workshop, SP asked whether TD and DC had replied to Alex Efimov to confirm that they could speak. TD still to do [done following meeting]; DC had already confirmed yes.

SI-2 Tier-1 Manager's Report
-----------------------------

AS provided the following report:

1) Tenders:

a) Disk tender - supplier load test completed. Our 28 day load test has not started and is now running late. The load test has taken longer to start than expected following disruption from the power cut and the need to restart the supplier load test. We expect it will start later today.

b) CPU tender - Order placed and scheduled for delivery by 28 February. We expect one supplier to deliver this Thursday but have no confirmed date from the second supplier yet.

c) Tape drive purchase - Tape drives are in production. Tape servers are ordered.

d) Non-Capacity hardware order has been placed. Delivery is expected to be 1-2 weeks later than the CPU delivery.

e) Oracle server hardware upgrade order has been placed.

f) An order for a 32 port non-blocking 10Gb switch has been placed. Delivery is expected in mid March.
g) An order for about 40K of tape media has been placed.

2) Backplane work on non-CCRC disk servers will commence this week.

Service:

1) SAM availability for last week was 100% (SL's tests).

2) CASTOR

a) CASTOR appears to be working well for ATLAS, CMS and LHCb CCRC.

b) Work on ALICE is underway but deployment of the xrootd side of the service has been problematic.

3) SL4 Migration - The SL4 UI build has minor changes to be made and it will then be ready for release.

Progress to Grid Only Access:

This standing item documents the status of work towards achieving GridPP milestone 0.18 "Access to Tier-1 resources by Grid Interfaces Only"

1) Non-Grid job submission has ended.

DC reported that, from the CMS experiment's point of view, things were going OK and the milestones had been passed. RJ was not sure about ATLAS, where things were not going as smoothly at present. AS noted an 'acceptable' rate of current failure on servers - not too exceptional - and he noted that the crash rate was likely to be high.

SI-3 Production Manager's Report
---------------------------------

JC provided the following report:

1) A UKI monthly meeting was held last week. Among the items discussed were the move to have APEL publishing as a critical test (fails after 31 days without records being published) and storage token use being driven by CCRC activities. The experiments are asking for SL4 WNs at sites but many sites have yet to upgrade.

2) From the last WLCG GDB, "The LHC experiments requested sites which have WN capable of running in 64 bit mode to run them that way and to advertise the fact in the BDII." The request was also to install the 32-bit compatibility libraries so that certain jobs can still run. Aside: A 64-bit WN release has recently entered the PPS.

3) CCRC: On February 23rd between 22:00 and 23:00 GMT the average transfer rate from CERN to "anywhere" was 2.2 GB/s which was the highest so far (http://tinyurl.com/2vqv76).
The main (e-logged) experiment issue reported against RAL T1 (21st) concerned proxies for CMS - now fixed. This led to the observation that one needs to be careful when using one certificate to manage multiple transfers (see: http://tinyurl.com/2nx2gs).

4) Greig Cowan has looked at ways of debugging dCache mapping issues. Have a look at some of the graphs to understand how complicated dCache pool management has become: http://tinyurl.com/3b7lxt. More details in the Storage blog http://gridpp-storage.blogspot.com/.

5) The ATLAS FDR information page for GridPP sites is providing a useful summary http://www.gridpp.ac.uk/wiki/AtlasFdr1. Does such a page exist for CMS or LHCb and if not would one be useful for their respective challenges?

6) There again seem to be instances of SAM critical tests failing at sites where it may be the test itself, not the site, at fault. The observed instances are being followed up.

7) Questions have arisen in the last week about use of pooled sgm and prod roles and the appropriate configuration at sites. Meanwhile discussion is ongoing about how to prevent T1 resources being used by (ATLAS) user jobs.

8) A security incident was reported at one INFN site last week. So far no UK sites have reported any linked concerns, but the available information on the incident is sparse.

9) A gLite-WMS migration strategy was discussed at the last DTEAM meeting. UIs will not pose a major problem as they can support both the LCG and gLite implementations simultaneously. A parallel service will be run at each of the existing providing sites for about 3 months. After this the LCG machines will be used to provide additional WMS resources. One minor issue is the need/recommendation to host the LB on a separate machine. TD reported that there had been an incident at Glasgow which had caused problems with the WMS and CE plus the compute element functions.

10) The deployment team membership for GridPP3 has been discussed.
There is general agreement that this should include one representative from ATLAS, CMS and LHCb (they already attend). Representatives of other VOs and technical experts (such as from T1) will be affiliated and invited to attend specific meetings/discussions of relevance. Core members will be expected to attend the weekly meeting.

11) Some sites have been asking about the timelines for the GridPP3 hardware money to become available - you will recall that the allocations were agreed some months ago. The current position is that STFC have frozen the grants pending further review of the current issues being faced by the council. TD reported that there was no formal statement as yet - we were hoping to receive something by the beginning of March, following which there would be a three-week consultation process.

Meetings:

A) There is an ATLAS jamboree this Wednesday: http://indico.cern.ch/conferenceDisplay.py?confId=22132#2008-02-27.

B) There is a WLCG GDB next week: http://indico.cern.ch/conferenceDisplay.py?confId=20227. The pre-GDB will be used to review the February CCRC: http://indico.cern.ch/conferenceDisplay.py?confId=29170. Derek Ross will be reporting a site's perspective on behalf of RAL T1.

C) Not all sites have responded to the WLCG workshop funding request for the meeting in April (more agenda items now online http://indico.cern.ch/conferenceTimeTable.py?confId=6552). However, some sites have more than one request. Tony has helped us secure a block booking in the CERN hostel which will help keep costs down.

D) The next GridPP User Board meeting has now been rescheduled to 19th March at 14:00.

SI-4 LCG Management Board Report
---------------------------------

There was nothing to report.

SI-5 Documentation Officer's Report
------------------------------------

It was noted that SB was unavailable today.

REVIEW OF ACTIONS
=================

277.2 DN to provide an update and re-evaluation of CMS/CASTOR deliverables.
TD advised that there was a CMS/CASTOR document on deliverables which should be revised in light of the December '07 tests. DC to take the token for this now and iterate with DN. DC reported that the document would be sent out this week.

277.8 User Experience 'Team C': SB, SP, SL, with input from JC to deal with the issue of user experience and design of an easily-found lookup facility for grid error messages. SL reported that he had started the ATLAS wiki page and would circulate the url. SB was leading this with inputs from SP, SL and JC where needed. A new simple summary was required of all areas available plus a lookup/links facility, for the OC to review. This would include a list of most recent types of problems (possibly a 'top 12' for users - what the error means and the course of action to follow). SB to progress this.

280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set-up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email.

289.2 DC to check current situation regarding gLite WMS and SL4 - current status to be conveyed to DTeam. Done, item closed.

290.1 JC to write-down membership of DTeam. Currently being done. Item closed.

290.4 AS and JG to iterate regarding what could replace the Tier-1 Board.

290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the ones provided for Tier-2.

290.8 AS/SP to iterate regarding the financial summary in the Quarterly Reporting (eg: Outturn figures).

290.9 Quarterly Report for Tier-2 staff to be compiled by the Production Manager.

290.10 TD as Technical Director to provide a report showing effort figures; milestones & metrics; and a table of posts showing Technical Support.
290.11 DB to progress the situation at Manchester.

290.12 GP/SB/DC to define the portal and documentation Support posts and ensure they form a comprehensive basis for user support (both documentation and Grid access assistance), overseen by the UB Chair.

290.13 DB to complete the document re Reporting and Reporting Routes relating to staff, and circulate it, thereafter it would be posted on the website as a record.

290.14 RM to circulate the EGI Workshop Agenda. Done, item closed.

290.17 Re the Project Map, SP would look at the EGI wiki, and NG would consider more inputs relating to box 6.2. Done, item closed.

290.18 Regarding the LCG box on the Project Map, SP to iterate with TC and bring this issue back to the PMB.

290.20 RM to provide more detailed figures on travel expenditure - broad-brush percentages would assist with decisions re travel in GridPP3.

290.21 SS to hand-out travel forms at Dublin ('overseas' claim on web to be submitted as 'actuals' and should be submitted before the end of March 2008). Will be done. Item closed.

290.23 AS/JC to iterate on the Disaster Recovery template and remove capturable items that were considered to be minor.

290.24 JC to progress his suggested template to use when a crisis occurs - to be revisited subsequently at a PMB.

292.1 TC and JC to iterate regarding the CERN system that recorded service interdependence and enabled them to recover from crisis events.

292.2 JG to review the interplay between Footprints and GGUS tickets on the helpdesk.

292.3 AS to produce an order for the CASTOR instances to be brought back. This is not really required in advance, will be dealt with on a case-by-case basis as required. Done, item closed.

292.4 JC to use the template from the disaster planning and apply it to the RAL power failure.

ACTIONS AS AT 25.02.08
======================

277.2 DN to provide an update and re-evaluation of CMS/CASTOR deliverables.
TD advised that there was a CMS/CASTOR document on deliverables which should be revised in light of the December '07 tests. DC to take the token for this now and iterate with DN. DC reported that the document would be sent out this week.

277.8 User Experience 'Team C': SB, SP, SL, with input from JC to deal with the issue of user experience and design of an easily-found lookup facility for grid error messages. SL reported that he had started the ATLAS wiki page and would circulate the url. SB was leading this with inputs from SP, SL and JC where needed. A new simple summary was required of all areas available plus a lookup/links facility, for the OC to review. This would include a list of most recent types of problems (possibly a 'top 12' for users - what the error means and the course of action to follow). SB to progress this.

280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set-up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email.

290.4 AS and JG to iterate regarding what could replace the Tier-1 Board.

290.7 AS to provide numbers in the Quarterly Report for the Tier-1 as per the ones provided for Tier-2.

290.8 AS/SP to iterate regarding the financial summary in the Quarterly Reporting (eg: Outturn figures).

290.9 Quarterly Report for Tier-2 staff to be compiled by the Production Manager.

290.10 TD as Technical Director to provide a report showing effort figures; milestones & metrics; and a table of posts showing Technical Support.

290.11 DB to progress the situation at Manchester.
290.12 GP/SB/DC to define the portal and documentation Support posts and ensure they form a comprehensive basis for user support (both documentation and Grid access assistance), overseen by the UB Chair.

290.13 DB to complete the document re Reporting and Reporting Routes relating to staff, and circulate it, thereafter it would be posted on the website as a record.

290.18 Regarding the LCG box on the Project Map, SP to iterate with TC and bring this issue back to the PMB.

290.20 RM to provide more detailed figures on travel expenditure - broad-brush percentages would assist with decisions re travel in GridPP3.

290.23 AS/JC to iterate on the Disaster Recovery template and remove capturable items that were considered to be minor.

290.24 JC to progress his suggested template to use when a crisis occurs - to be revisited subsequently at a PMB.

292.1 TC and JC to iterate regarding the CERN system that recorded service interdependence and enabled them to recover from crisis events.

292.2 JG to review the interplay between Footprints and GGUS tickets on the helpdesk.

292.4 JC to use the template from the disaster planning and apply it to the RAL power failure.

293.1 Re HPC thematic issue invites: it was agreed that PC will raise the issue with UKQCD; TD would contact Andrew McNab re GridSite; NG will contact the NGS side to see if there is any interest.

293.2 A PMB document to be written for the OC regarding NGI metrics, and SP would provide some metrics for this.

293.3 TD to contact Trish Mullins and apprise her that an Agenda item relating to NGI metrics was planned for the F2F.

293.4 NO to re-send poster requests.

293.5 TD to reply to Alex re speaking at the industry workshop.

INACTIVE CATEGORY
=================

271.1 PMB to examine the issue of fibre breakage and outages, CERN-RAL OPN link, in one year's time, when actual data on breakages is available. Due date would be September '08.
271.3 Re CERN-RAL OPN link breakage and backup generally, PC to oversee the issue and collate info so that the PMB have something to revisit in one year's time. Due date September '08. It was noted that PC would circulate a revised document after discussion with ATLAS (RJ/PC/DN to iterate).

282.8 RM to monitor how R-GMA and networking issues impact on GridPP as matters progress. RM advised that this item should be moved to the 'inactive' category as it will develop over the coming months. RM discussed the issue with Steve Fisher and advised that support of R-GMA is required whilst APEL is dependent on it. RM reported that he has spoken to SF and there is currently no change to the R-GMA situation - process ongoing.

290.19 DB/SP to progress the details of the Project Map over the next few months, cross-checking that all elements are incorporated, including strategic priorities and staffing. To be completed before the next Oversight Committee.

The next PMB would take place on Monday 3rd March at 1:00 pm. The meeting closed at 2:15 pm.
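
Illustrative note on the APEL critical test mentioned in the Production Manager's report (fails after 31 days without records being published): the rule could be sketched roughly as below. This is only a sketch under stated assumptions, not the actual SAM or APEL test; the function name `apel_publishing_ok`, the constant `MAX_AGE` and the choice to treat exactly 31 days as still passing are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the threshold is the 31 days quoted in the minutes.
MAX_AGE = timedelta(days=31)


def apel_publishing_ok(last_published: datetime, now: datetime) -> bool:
    """Return True while the newest accounting record is at most MAX_AGE old.

    Hypothetical helper: a site fails once more than 31 days have passed
    since it last published records.
    """
    return now - last_published <= MAX_AGE


if __name__ == "__main__":
    now = datetime(2008, 2, 25, tzinfo=timezone.utc)
    recent = datetime(2008, 2, 1, tzinfo=timezone.utc)  # 24 days before now
    stale = datetime(2008, 1, 1, tzinfo=timezone.utc)   # 55 days before now
    print(apel_publishing_ok(recent, now))
    print(apel_publishing_ok(stale, now))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check easy to test against fixed dates.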
