UKHEPGRID@JISCMAIL.AC.UK


Subject: Minutes of the 429th GridPP PMB meeting
From: David Britton
Date: Tue, 21 Jun 2011 15:04:35 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (85 lines), 110613.txt (417 lines)

Dear All,

Please find attached the GridPP Project Management Board Meeting minutes
for the 429th meeting.

   The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

-- 
________________________________________________________________________
Prof. David Britton                          GridPP Project Leader
Rm 480, Kelvin Building                      Telephone: +44 141 330 5454
School of Physics and Astronomy              Telefax: +44-141-330 5881
University of Glasgow                 EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________

GridPP PMB Minutes 429 (13.06.11)
=================================

Present: Dave Britton (Chair), Dave Colling, Jeremy Coles, Pete Gronbech, Robin Middleton, Glenn Patrick, Dave Kelsey, Steve Lloyd, John Gordon, Pete Clarke, Tony Doyle, Roger Jones, Andrew Sansum (Suzanne Scott - Minutes)

Apologies: Tony Cass, Neil Geddes

1. Accounting Issues
=====================

DB suggested that we continue the discussion on accounting. SL reported that he had responded to a query from Mike Seymour but had not heard back yet, and had nothing to add since last time as there was no change.

DB asked if there was consensus regarding Glasgow receiving from clouds? SL commented that other sites should also get work from elsewhere rather than stopping Glasgow getting jobs. JC asked about the issue of capping? DB advised that the PMB had agreed not to do that at the last meeting, but the issue could be re-visited if required. DB noted that ATLAS wanted a couple of large Tier-2s in each country connected to multiple clouds in order to help with load-levelling, and we should encourage sites to ask, but it was ATLAS' decision. SL considered that in relation to Glasgow there was not a very large effect generally, and he thought we should carry on as we are and check it again later. DB noted he couldn't tell what jobs came from where. JG advised that some sites had been complaining that there wasn't enough work. DB had looked back at the correlation, and this related to the oscillating nature of the job load in the UK generally compared with other clouds - it went from zero to peak - and this didn't seem to happen with other clouds. TD noted that Graeme Stewart could probably answer that. DB suggested that this could be discussed at the CERN meeting.

DB advised that at STFC there was a change in the way capital would be funded, but that this shouldn't affect GridPP4. Tony Medland had said that the 'old' rules applied and that it didn't affect GridPP4.

DB noted that the other outstanding accounting issue was that of Lancaster. SL reported that he had had no contact with RJ. DB advised that their resources weren't being fully utilised, relating to brokering of two VOs. RJ was to pursue this and sort it out in order to receive more jobs. PG commented that if the site were not being used then it was the site's problem as well as the experiments'. DB noted it was possibly a brokering system issue relating to the 'slow start' problem. PG advised that in general, sites should be addressing this and speaking to experiments etc, being proactive in response. RJ needed to report back on the situation. JC advised that Lancaster also had other issues - in recent ops meetings they were often mentioned as an ATLAS site in the brokeroff state. JG noted they had publishing issues in gstat as well. DB advised that we would ask RJ in two weeks' time.

RJ joined the meeting at this point.

DB asked for an update on Lancaster. RJ reported that there were no developments in relation to ATLAS but they do seem to be flat-topping on ATLAS jobs. He needed to look at it further in order to understand this. RJ had provided comments re the 'other' VOs being on the older cluster. Lancaster were still not filling spots. DB noted that ATLAS was using QMUL resources - was Lancaster not set up correctly? RJ advised that the set-up of slots might be an issue. Panda brokering does not discover the power of the resources, it looks at job slots only. RJ noted they were not filling empty jobs, but that the pilots were going through OK. SL advised that QMUL had 5,700 jobs running at present. DC also noted that IC had 1000 ATLAS jobs running recently at Imperial. RJ noted that this was brokered at the other side, and you couldn't simply pull-in from a queue. DB asked whether the current total of CPU should be used? RJ advised that it was usable, but just wasn't being used. RJ added that they also were not getting as much from LHCb as usual.

RJ noted there was no inherent problem with running jobs, but it was rare that Lancaster was full. They also had other demands on the cluster. DB advised that it was two weeks until the next PMB. Could RJ sort out the issue that was preventing ATLAS from using available slots in that time? RJ said yes, he could try.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) FY11 procurements
   - EU tender for disk framework agreement PQQ stage being evaluated (eval meeting today)
   - CPU framework expected to go out shortly (running late but nearly ready)

2) SL08 remains out of production.
   - Concluded that original problem (lost RAID set after single drive failure) resolved
   - Further problem with new drives not recognised, now understood to be inconsistent device driver update - now resolved and last 7 day test run to gain confidence
   - Outstanding question of 3*(multi drive failure) in May, but drive failure rate generally high in May (double); unknown cause at the moment.
   Plan to redeploy shortly into T1D0 service classes.

Service:

Generally operations running reasonably smoothly.

1) Summary of operational issues is at:
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-06-08

2) CASTOR
   * LHCB experienced problems where recalled files were garbage collected before used, caused thrashing of tape recall system. New garbage collect policy for LHCB is being trialled.
   * Expect to upgrade CASTOR tape servers to 2.1.10-1 to enable T10KC. No downtime required.

3) Databases
   * Minor update (at risk) to ORACLE configuration completed to resolve problem with Oracle statistics gathering.

Staff:

1) Grid team leader post internal interviews within 1-2 weeks (being rescheduled from Wednesday)

2) Paperwork for four other vacancies submitted to STFC for approval has not been approved -
   * Two system admins for Fabric team
   * One CASTOR admin
   * One Grid Team member

SI-2 Production Manager's Report
---------------------------------

JC reported as follows:

1) EGI has released version 1.0 of the EGI Operational Level Agreement document: https://documents.egi.eu/document/31. The document covers the services a resource centre is expected to provide and the associated service levels. The main measures are:
   “1. The Resource Centre MUST be available (UP) at least 70% of the time per month (daily availability is measured over 24 hours).
   2. Resource Centre reliability MUST be at least 75% per month.”

2) In the ops meeting last week most GridPP Tier-2 sites confirmed that they are on sub-nets within their university. The majority of site administrators have their own, or access to, useful site monitoring (mainly cacti or ganglia based) of network traffic. The topic of monitoring and site configuration is of widespread interest and will be explored further during site update talks at the HEPSYSMAN meeting at the end of the month.

3) At the GDB (http://indico.cern.ch/conferenceDisplay.py?confId=106645) last week the “Security futures” discussion indicated that the glexec discussion is likely to reopen over the coming months at first in the context of a working group being led by Jeff Templon and Markus Schulz. The technical discussion group will attempt to distil core issues and proposals in numerous areas where a more joined-up or simplified approach may benefit WLCG in the medium/long term. The immediate approach remains to use glexec and integrate this with the experiment frameworks.

JG reported that he had a few contacts in relation to a private group to come up with solutions. If this covers too many different areas, it may be difficult to do it in any depth.
JG noted that security experts would be at HEPSYSMAN. DB agreed that we had a significant interest in the security side of things.

4) The (provisional) Tier-2 reliability: availability figures for May (http://tinyurl.com/6fqwwc3) indicate problems at UCL-HEP (41%:28%) due to unresolved CREAM-CE problems; EFDA-JET (73%:49%) and Birmingham (87%:87%) which had disk/controller problems.

SI-3 ATLAS weekly review & plans
---------------------------------

RJ noted not much to report - not much news at the Tier-1, it had been ok over the past week or so. There had been air conditioning issues at Manchester, which were now fixed. They also had DPM problems, and there had been a squid problem. Things were generally functional: ATLAS were trying to do hammercloud tests, which were showing higher failure rates; they were helping with the configuration. Questions were being asked about future resource requests.

DB asked about the ATLAS ongoing global resources for disk - in the UK at the end of the accounting period, funds needed to be spent. We also had unused disk. RJ advised that analysis jobs generally were drifting to the Tier-1s. If we placed more data at the Tier-2s then this would balance it out. The resource request from ATLAS was submitted recently. DB noted the issue of the forward look in 3-4 years as well. Prior to the pledge in October we needed to decide what we were doing with the limited UK Tier-2 resources generally. We were in a transition phase.

SI-4 CMS weekly review & plans
-------------------------------

DC noted that everything was positive at present - the Tier-1 was running well, the Tier-2 was in 107% readiness, and availability was great.

SI-5 LHCb weekly review & plans
--------------------------------

GP reported that last week there had been problems with RAL - the Tier-1 was set to nominal share, a new lot of data was needed. Garbage collection had also been an issue.
PC advised that all of the Tier-1s were empty at present, but stripping jobs were due and the re-stripping of 2010 data would be commencing tomorrow. AS advised that the job start rate needed to be changed - it was inadequate at present. The change was done but was not yet permanent. He would warn the team about the stripping work which was imminent. PC advised that they may use the Tier-2s for re-processing in the future. They were also doing pilot work with Manchester in the UK for a few months.

SI-6 User Co-ordination issues
-------------------------------

There were no issues to report.

SI-7 LCG Management Board Report
---------------------------------

DB advised there had been a discussion about the timeline for the glexec report in relation to the identity federation workshop. JG noted that there had been two separate problems at RAL with glexec but that they had been resolved. JG advised that ATLAS had highlighted the poor level of support provided by the Netherlands Tier-1, which did not always respond.

AOB
===

PG advised that he needed RJ and DC to assist with tightening up the metrics for GridPP4. DC confirmed that what PG now had was correct. PG would compile a template report and send this round. RJ advised that he was happy with the metrics, but less happy about their ability to measure them, due to changes in the dashboard. PG noted he was happy that the metrics had been agreed by all sides, so he would send out template reports for review.

REVIEW OF ACTIONS
=================

400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL. Ongoing.

409.1 JC to revisit document with a GridPP-NGI-NGS structure, not use the document Dave Wallom produced. JG will provide input. Visions for today and for the future. Done, item closed.

424.3 DB to contact ALICE-UK about Tier-2 resources. Ongoing.

425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestion of theme/topics for GridPP27 at CERN. Ongoing.

425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27. Ongoing.

428.1 RJ and AS to respond to DC regarding inputs for the AHM paper. Done, item closed.

428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT. Ongoing.

428.3 JC to compile an info list relating to sub-nets at sites. Ongoing.

428.4 JC/PC to ask through the Ops Team or HEPSYSMAN whether or not there was an easy way to measure Tier-2 traffic, and to find out what was possible at Tier-2 sites. Done, item closed.

428.5 DB to contact David Salmon and apprise him of the Network Document which had already been produced and contained our 'best knowledge' at present. He would also advise DS that we would progress his request and see what we could provide in terms of traffic measurement. Done, item closed.

428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1. Ongoing.

ACTIONS AS AT 13.06.11
======================

400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL.

424.3 DB to contact ALICE-UK about Tier-2 resources.

425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestion of theme/topics for GridPP27 at CERN.

425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27.

428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT.

428.3 JC to compile an info list relating to sub-nets at sites.

428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1.

Forthcoming PMB meeting dates would be as follows, at the usual time:

Mon June 27th
Mon July 11th (doodle poll required - date not suitable)
Mon July 25th
Mon Aug 8th
Mon Aug 22nd
Mon Sep 5th
TUE Sep 13th F2F@CERN
Mon Sep 26th
