Email discussion lists for the UK Education and Research communities



UKHEPGRID Archives



UKHEPGRID@JISCMAIL.AC.UK



UKHEPGRID  May 2010


Subject: Minutes of the 388th GridPP PMB meeting
From: David Britton <[log in to unmask]>
Reply-To: David Britton <[log in to unmask]>
Date: Mon, 31 May 2010 10:34:17 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (53 lines), 100524.txt (272 lines)

Dear All,

Please find attached the minutes of the 388th GridPP Project Management Board meeting.

The latest minutes can be found each week at:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

-- 
________________________________________________________________________
Prof. David Britton                          GridPP Project Leader
Rm 480, Kelvin Building                      Telephone: +44 141 330 5454
Dept of Physics and Astronomy                Telefax: +44-141-330 5881
University of Glasgow                 EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________
GridPP PMB Minutes 388 (24.05.10)
=================================
Present: John Gordon (Chair), Andrew Sansum, Tony Doyle, Jeremy Coles, Glenn Patrick, David Kelsey, Sarah Pearce, Steve Lloyd, Tony Cass, Robin Middleton (Suzanne Scott, Minutes)

Apologies: Roger Jones, David Britton, Tony Cass, Pete Clarke, Dave Colling, Neil Geddes

1. Feedback from PPRP
======================
DB had circulated a note regarding this. The summary was that the PPRP had taken on board the 5% cuts, but this may not be enough. The situation was still pending and would be discussed again at the upcoming OC meeting.

2. Oversight Committee
=======================
It was noted that the OC meeting was on 18th June. SP was not planning to attend. DB, SL, JG & TD would be there. No agenda had been received yet.

3. Data Jamboree
=================
JG had provided an agenda for this. JG noted that he had hoped for better UK attendance, but this would not be possible owing to the cap on numbers. From the UK, JG, Jens Jensen, Shaun de Witt, Matthew Viljoen, Wahid Bhimji and Sam Skipsey would attend. JG noted that this was a long-term issue in any case and would not affect things at present.

4. wLCG Workshop at Imperial
=============================
DC had publicised this to the UK. There had been a PMB decision by email to encourage people to attend, and they would be funded to do so. On the Wednesday, the first session, on issues from the T1/2s and the experiments, is to be organised by the UK. It was agreed that JC would contact Jamie Shiers and find out how we could help with session planning etc.

ACTION 388.1 JC to contact Jamie Shiers re the wLCG Workshop at Imperial, and find out how we could help with first session planning and/or provide a Chair for the session.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------
AS reported as follows:

Fabric:
------
1) FY09 procurements:
- Disk servers from FY08 lot 2 and FY09 lot 1 are moving into LHC service classes.
  We expect to have sufficient capacity in the VOs' non-production service classes by 1st June to meet the MoU commitments.
- The second lot of FY09 disk servers had problems during the supplier proving test. The supplier resolved the problem with firmware updates and demonstrated 1 week of stable operation. We accepted the servers into our own 28-day acceptance test and are currently testing them. There are some indications of further problems; a DM review meeting will be scheduled this week to review this.
- The second lot of CPU servers is proceeding through acceptance and is expected to complete successfully this week.

2) FY10 procurements:
- The PQQ stage of the disk tender is being evaluated. The delivery target is December.
- The CPU PQQ is nearly finalised and is planned to be submitted this week.

3) We have had a couple of unusual disk server crashes. See:
   https://www.gridpp.ac.uk/wiki/RAL_Tier1_Incident_20100515_Disk_Server_Outage
   These have been operationally disruptive owing to the length of downtime (precautionary, to retain data). We are investigating the underlying problem, but have also reviewed our disk server crash process in order to improve turnaround on future failures. A recently received firmware update failed to resolve the problem.

4) Commissioning of the extra RAL site 10Gb/s link is ongoing. Implementation has now been completed on the failover part of the production network and is being tested.

5) Commissioning of the second, resilient 10Gb/s OPN link to CERN is ongoing. A fault was traced to a problem in London.

Service:
-------
Other than the disk server failures, operations continue to be good. Network rates are gradually climbing but continue to be unproblematic.
1) The weekly operations summary is at:
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2010-05-19
2) SAM test availability for the ops VO was unreliable last week owing to false positives against our site BDII from Taiwan. This is not going to be fixed, as this infrastructure will be phased out in June.
3) Load-related problems on the ATLAS software server continue, and we are working on a temporary solution (a faster server) that we will deploy. Longer term, we are considering the use of AFS.
4) We expect to complete the deployment of SCAS/glexec this week.
5) Oracle patching of databases will lead to "At Risks" on OGMA (ATLAS 3D) on Tuesday 25th May, LUGH (LHCb 3D & LFC) on Thursday 27th May and SOMNUS (LFC, FTS) on Wednesday 2nd June.
6) The phase-out of SL4 is scheduled to complete in August; announcements have been made.

SI-2 ATLAS weekly review & plans
---------------------------------
RJ was absent.

SI-3 CMS weekly review & plans
-------------------------------
DC was absent.

SI-4 LHCb weekly review & plans
--------------------------------
GP reported as follows:
1. Disk server gdss380 went down twice recently at the UK Tier-1: on 14 May and then on 22 May. See item 3 in the Tier-1 report.
1.1. The first failure caused quite a few LHCb user jobs to fail. The second failure seems so far to be much less problematic.
1.2. Hopefully, procedures for reporting disk server failures to the VO will improve in future.
2. A PIC power failure on Friday brought down LHCb grid job accounting again.
3. Continuing problems with uploading data out of Sheffield, Brunel, Liverpool, Bristol and Glasgow. See also item 1 in the Production Manager's Report.
3.1. The problem has been alleviated in Glasgow by firewall tweaking, but still exists.
3.2. Of all the sites worldwide that LHCb runs on, this issue exists only at these 5 UK sites.
3.3. There are some indications that this upload issue may have temporarily overloaded the lhcbFailover space token at CERN.
4. LHCb has updated (increased) the time needed per job in the VO card. All sites are requested to provide sufficiently long queues if they do not already do so.
5. Problem with the BDII at SARA (wms.sara.nl), which froze at a moment that unfortunately coincided with lcgce07 being brought up after glexec updates.
5.1. All queues on lcgce07 were considered available for LHCb.
5.2. Jobs failed at RAL as they ended up in low-memory queues.
5.3. The problem was solved by restarting the SARA BDII.

SI-5 Production Manager's Report
---------------------------------
JC reported that things were generally running smoothly. Items to note were as follows:
1) The problems affecting LHCb transfers continue, but progress has been made. In particular, there is a correlation between the use of NAT and (high) failure rates when copying files from the WNs to remote SEs. When the remote SE is a DPM installation, the transfers are successful! At a basic level this suggests problematic middleware implementations; at an operational level, pulling files directly from the WN disk is not a recommended approach, but the affected sites continue to carry out more detailed tests to find workarounds. Oddly, it is still only UK sites that see this particular issue. The scope of the problem can be seen in these plots:
   http://hepwww.rl.ac.uk/nraja/UKUploadProblems/index.html
2) There is a new call to support the staged rollout of new gLite 3.2 middleware. APEL gLite 3.2 SL5 is in the list (on a related note, ActiveMQ-based APEL was recently certified: https://savannah.cern.ch/patch/?3612).
3) The UKI regional Nagios is validated:
   https://twiki.cern.ch/twiki/bin/view/EGEE/ExternalROCNagios
   The latest schedule plans to switch off the central regional Nagios instance on 15th June for all validated regions, the same date on which the central OPS SAM tests will be switched off.

SI-6 LCG Management Board Report
---------------------------------
JG reported on the last meeting, which had taken place on 11th May. The main issues under discussion had been the approval of DK's two revised security policies, and a review of the comparison of Nagios operations availability (this will happen again in June); if everything is OK, it will be switched off on the 15th.
JG summarised the RRB and LHCC meetings: the scrutiny group reported that, since their estimates are within 10% of the experiment requests, sites should use the experiment figures. JG reported that information needed to be gathered on hot data sets that were on disk. JG noted that a new full scrutiny was required by 1st September. It was noted that CERN management had sent a 'congratulations' message round to sites. JG reported that the overall difficulties noted had been: ALICE resources; delays to the Tier-0; EGEE to EGI progress; and the long-term sustainability of middleware.

SI-7 Dissemination Report
--------------------------
SL reported on behalf of SP that some changes might take place in relation to EGI personnel.

REVIEW OF ACTIONS
=================
354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. It needs writing up and an implementation plan. JC to progress. Done, item closed.

366.8 AS to confirm that the Tier-1 proposes to use tape-based storage in the period 2011-2015. Ongoing.

380.5 RM/SP to make changes to the EGI/NGI paper as discussed and bring back a revised version to next week's PMB. JG would check the numbers and circulate to the PMB (internal only). Done, item closed.

380.9 RJ/DC to send info to DB regarding resource estimates for the upcoming period, as this info will be needed after the PPRP. Ongoing.

382.1 RM to circulate updated paper (effort numbers, tables, text, NGI governance, risk etc) on EGI/NGI (DB to use to prepare slides for the PPRP). Ongoing.

383.1 JG to provide a note of expected procurement dates following the HAG meeting. Done, item closed.

384.1 AS to provide a plan for how to deal with the ADS Service, and bring back to the PMB. HEP data in the ADS had been greatly reduced, but it was not obvious whether the work to reduce it to zero would be cost-effective. Ongoing.

384.5 ALL: to think about two levels of response to the NGS Technical Roadmap document:
   1. endorse the general direction but correct any anomalies;
   2. ensure that the technical roadmap is aligned with GridPP's own aims and intentions.
   A high-level response should be made; ALL to re-read the document and it will be discussed at the next PMB meeting. Ongoing.

384.6 TD/JC to take the lead on the response to the NGS Technical Roadmap document. We should devise our own response: a GridPP-to-NGI document that addresses the forward-moving technical and other issues from a GridPP perspective. A skeleton outline should be circulated. Ongoing.

384.7 JC to organise a poll of sites to find out how they pick up on issues, and what they currently check and monitor (e.g. a screen in the office, an auto-email). Done, item closed.

ACTIONS AS AT 24.05.10
======================
366.8 AS to confirm that the Tier-1 proposes to use tape-based storage in the period 2011-2015.

380.9 RJ/DC to send info to DB regarding resource estimates for the upcoming period, as this info will be needed after the PPRP.

382.1 RM to circulate updated paper (effort numbers, tables, text, NGI governance, risk etc) on EGI/NGI (DB to use to prepare slides for the PPRP).

384.1 AS to provide a plan for how to deal with the ADS Service, and bring back to the PMB.

384.5 ALL: to think about two levels of response to the NGS Technical Roadmap document:
   1. endorse the general direction but correct any anomalies;
   2. ensure that the technical roadmap is aligned with GridPP's own aims and intentions.
   A high-level response should be made; ALL to re-read the document and it will be discussed at the next PMB meeting.

384.6 TD/JC to take the lead on the response to the NGS Technical Roadmap document. We should devise our own response: a GridPP-to-NGI document that addresses the forward-moving technical and other issues from a GridPP perspective. A skeleton outline should be circulated.

388.1 JC to contact Jamie Shiers re the wLCG Workshop at Imperial, and find out how we could help with first session planning and/or provide a Chair for the session.

INACTIVE CATEGORY
=================
359.4 JC to follow up dTeam actions from the DB, as follows:
-------
05.02 JC/dTeam to try and sort out CPU shares and priority resources, at Glasgow first (perhaps by raising the job priority in Panda).
-------

The next PMB would take place at NOON (12:00 pm) on TUESDAY 1st June.

