UKHEPGRID@JISCMAIL.AC.UK


Subject: Minutes of the 428th GridPP PMB meeting
From: David Britton <[log in to unmask]>
Reply-To: David Britton <[log in to unmask]>
Date: Mon, 13 Jun 2011 11:17:33 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (84 lines), 110606.txt (272 lines)

Dear All,

Please find attached the GridPP Project Management Board Meeting minutes
for the 428th meeting.

The latest minutes can be found each week at:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

-- 
________________________________________________________________________
Prof. David Britton                          GridPP Project Leader
Rm 480, Kelvin Building                      Telephone: +44 141 330 5454
School of Physics and Astronomy              Telefax: +44-141-330 5881
University of Glasgow                 EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________

GridPP PMB Minutes 428 (06.06.11)
================================

Present: Dave Britton (Chair), Dave Colling, Jeremy Coles, Pete Gronbech, Robin Middleton, Glenn Patrick, Dave Kelsey, Steve Lloyd, John Gordon, Pete Clarke, (Suzanne Scott - Minutes)

Apologies: Tony Doyle, Roger Jones, Tony Cass, Andrew Sansum, Neil Geddes

1. AHM Paper status
====================

DC reported that he was still awaiting info. from the Tier-1 and ATLAS. DB advised that RJ was interviewing today and would not be present, although he may have time afterwards to deal with this. DB would also check status with AS. JG thought that the deadline might be extended. DC advised that he needed a couple of paragraphs only from each, so that he could pull things together to provide a couple of pages of text.

ACTION 428.1 RJ and AS to respond to DC regarding inputs for the AHM paper.

2. Speakers for ACAT conference
================================

DB reported there was an Advanced Computing & Analysis Techniques in Physics Research (ACAT) conference happening at Brunel in September. DB advised that this was a better opportunity for GridPP than the AHM to get publications - there was a procedure for refereeing the papers and they would be published with a good impact-factor. DB was organising part of the conference and wanted to identify people to speak.

Did we have a list of submissions from GridPP to the AHM? None were known. DC noted there was one on CMS and clouds. JG noted that Jens had a few papers lined up. DB advised that this material could be re-used for ACAT. DB asked if there were any other people we could contact? DC noted a new person dealing with ganga at Imperial - he would look into this for Track 1.

ACTION 428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT.

DB advised that the deadline was 2nd July. DB noted that within Track 1, grid and cloud computing was a broad area. Was a security talk possible? DK indicated it was possible.
DB thought that more general talks might also be possible; however, more targeted talks would be good, e.g. adaptive data placement. It could be a more GridPP-focussed talk with an emphasis on networking. Then there were new architectures, many-core - possibly Dave Newbold or Simon Metson, or Phil Clark might be suitable. JC noted Andrew Washbrook as well. DB asked about virtualisation - was there someone appropriate at the Tier-1? JC noted Martin or Ian Collier. The Tier-1 was doing a fair bit of virtualisation of infrastructure. JG agreed to forward DB's email to the Tier-1.

DB noted other topics also, which were less related to GridPP, and he asked if PG could consider something on monitoring? PG would look at the topics and see if anything were possible. DB noted that Track 2 was data analysis, algorithm and tools, with a subset list. These more naturally fell under the experiments' brief rather than GridPP. The third Track was computation in theoretical physics, which was probably outwith our remit. DB emphasised that we could take the opportunity to submit abstracts.

3. Accounting - HS06 etc
=========================

JG advised that Alessandra Forti and Martin Bly had been skeptical about the published figures. SL agreed that there was a general feeling that the figures were not correct, which was borne out by his measurements. PG commented that we all knew that HEPSPEC produces a better result on SL5 64bit systems compared with SL4 32bit, and some sites may not have re-run the benchmarks after the upgrade.

There ensued a discussion on HEPSPEC, sites, and CPU. It was agreed that HEPSPEC was not a proper benchmark of ATLAS code. SL emphasised that machines were all different, and HEPSPEC took some combination of CPU speed, memory, IO etc into account but apparently not the right combination for ATLAS code. SL noted we could get the production right at least. DB noted that for ATLAS, using results from production jobs would be the easiest thing.
DB summarised the PMB view that HEPSPEC06 was not helping - using production jobs to get empirical numbers was the best way to proceed pragmatically.

DB reported that he had been discussing Lancaster with RJ. The issue was still being investigated; however, RJ had reported that waiting jobs were not an issue, as this depended on Panda. The peak number of jobs was the more interesting issue, and RJ was looking into this. DB reported that the Glasgow team were going to let DB know what jobs they were receiving from other clouds - this was currently under investigation. RJ thought that the issue of Lancaster not being full probably rested on several reasons internal to ATLAS - the Panda system operated in a particular way and there was the internal issue of Panda brokering. RJ also wanted to measure resources available, not just resources used. The Glasgow cloud issue and the Lancaster capacity number were to be continued.

DC noted his disagreement with using 'resources available'. DB noted that the issue was internal to ATLAS - they knew globally that they were not using the resources that were there, due to the issue of Panda brokering. This had nothing to do with sites not providing resources to ATLAS.

4. AOCB
========

- networking

DB reported that David Salmon had sent notes and slides from the network meeting that had taken place in Paris. There was a specific request to GridPP to check the situation with respect to the Tier-2s:

1. check whether UK Tier-2 resources were on well-defined sub-nets within the universities;
2. ask Tier-2 sites to monitor traffic levels in and out of the Tier-2 resources

ACTION 428.3 JC to compile an info list relating to sub-nets at sites.

DB asked if it were possible to measure the traffic volume in and out of the Tier-2s? This was about co-existence with different resources in Europe. DB advised that everything was under control at this point and there was no proposal to do anything; however, there was a need to keep an eye on things.
PC asked why the Network Document was not sufficient for David's purposes? It provided at least 60% of what he needed to know? DB advised that they were asking us to measure volume. JG noted that we could monitor the FTS but Tier-2 to Tier-2 traffic was difficult as there were many kinds of dataflows. DB agreed that it would be overkill to do this for every site, but some of the larger sites could provide useful information. DB asked JC to find out if there were an easy way to measure this: was any monitoring already in place? DB noted that overall this was a longer term issue and that we couldn't commence a huge programme of work; however, we could compile some info just now.

PC suggested that the timescale for this should be the GridPP Collaboration Meeting at CERN in September. This might be trivial to do at Glasgow, which had a separate cluster, and we could limit it to sites that were similar. The lowest level of detail was total traffic; beyond that, it depended how difficult the monitoring would be.

ACTION 428.4 JC/PC to ask through the Ops Team or HEPSYSMAN whether or not there was an easy way to measure Tier-2 traffic, and to find out what was possible at Tier-2 sites.

PC asked that David Salmon be reminded of the Network Document, which did contain the bulk of information which he required. DB agreed to follow this up.

ACTION 428.5 DB to contact David Salmon and apprise him of the Network Document which had already been produced and contained our 'best knowledge' at present. He would also advise DS that we would progress his request and see what we could provide in terms of traffic measurement.

- Resource Meeting

GP reported that the issue of extra disk had arisen at the Resource Meeting - he would need to ask AS about this.

ACTION 428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS was not present.
SI-2 Production Manager's Report
---------------------------------

JC reported as follows:

1) There was an update of the UK VOMS that led to T2K job failures (proxy problems) during the “at risk” period. T2K are also suffering due to jobs exceeding queue memory limits.

On the topic of Steve's observed HS06 spreads seen across the sites, many of you will have read the comments on TB-SUPPORT. In particular Martin Bly's remarks about the nature of the current environment leading to distortions: "the prevalence of 64bit over 32bit since we did the original tests, the I/O regime in which the tests are performed, changes to the code bases, to name some. I suspect that I/O regimes will make the greatest difference to events/HS06 for two otherwise identical nodes" and Alessandra Forti's comment about the test (and user) jobs being directed to slower nodes in the cluster (and the impact of fairshares).

SI-3 ATLAS weekly review & plans
---------------------------------

In absentia, RJ reported briefly as follows:

ATLAS Status: Tier-1
- Testing xrootd queue at RAL
- Questions about the number of concurrent jobs running at RAL from our side – does this sound familiar?! We may need more pilots at RAL.
- Frontier server switching from PIC to Lyon.
- Cernvmfs testing is going well.

ATLAS Status: Tier-2
- Minor T2 issues. Four more sites up for T2D sonar tests. All look OK on current tests.

SI-4 CMS weekly review & plans
-------------------------------

DC reported minor problems at the Tier-1 in relation to job and disk pools; generally everything had been ok over the last week. For the Tier-2, all of the UK (not Bristol) had been at 100%; SAM tests were Nagios-based, and there were some differences as a result.

SI-5 LHCb weekly review & plans
--------------------------------

GP reported as follows:

1) A backlog of jobs (~4500 jobs at peak) built up at UK T1 over the week with its peak on Thursday.
For various reasons (batch farm full, flickering publishing in bdii - possibly Cream issue?) RAL was not picking up LHCb jobs. Moved to direct submission of jobs to lcgce09 on Friday and since then the backlog has almost been eliminated (~250 jobs on Monday morning).

2) RAL share of new data set to 0 until the backlog was eliminated. Expect it to be increased this week again.

3) Added 6TB diskserver on Friday to lhcbRawRdst (d0t1) to help with above issue.

4) Large number of failures due to "input data resolution" mainly because of the time the jobs have been waiting - files have been garbage collected by Castor and will need to be restaged (being done automatically as needed).

5) Smooth running at Tier-2 sites.

SI-6 User Co-ordination issues
-------------------------------

GP noted nothing to report.

SI-7 LCG Management Board Report
---------------------------------

DB advised that the next meeting was tomorrow.

SI-8 Dissemination Report
--------------------------

SL reported that the Magic Cubes had arrived, and they had already been paid for.

REVIEW OF ACTIONS
=================

400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL. Ongoing.

409.1 JC to revisit document with a GridPP-NGI-NGS structure, not use the document Dave Wallom produced. JG will provide input. Visions for today and for the future. Ongoing.

424.3: DB to contact ALICE-UK about Tier-2 resources. Ongoing.

424.6: DC to complete CMS metrics - DC would circulate this after the meeting tomorrow. Done, item closed.

424.10 DB to contact JG to suggest topics for CERN Meeting. Done, item closed.

425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestion of theme/topics for GridPP27 at CERN. Ongoing.
425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27. Ongoing.

427.1 Re Tier-2 accounting figures: DB to contact RJ and ask him to explain why there were so many jobs waiting at Lancaster, when they had such a large share available. Done, item closed.

427.2 Re Tier-2 accounting figures: DB to contact RJ and ask him about Glasgow getting production jobs from other clouds, when other sites don't. DB would also check with the Glasgow team. Done, item closed.

427.3 DB to circulate an email to the CB re the OC outcome and the finalising of GridPP3, and point the CB at the documents. He would advise that a CB meeting might be useful in around 6 months' time, after the accounting period. Done, item closed.

ACTIONS AS OF 06.06.11
======================

400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL.

409.1 JC to revisit document with a GridPP-NGI-NGS structure, not use the document Dave Wallom produced. JG will provide input. Visions for today and for the future.

424.3: DB to contact ALICE-UK about Tier-2 resources.

425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestion of theme/topics for GridPP27 at CERN.

425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27.

428.1 RJ and AS to respond to DC regarding inputs for the AHM paper.

428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT.

428.3 JC to compile an info list relating to sub-nets at sites.
428.4 JC/PC to ask through the Ops Team or HEPSYSMAN whether or not there was an easy way to measure Tier-2 traffic, and to find out what was possible at Tier-2 sites.

428.5 DB to contact David Salmon and apprise him of the Network Document which had already been produced and contained our 'best knowledge' at present. He would also advise DS that we would progress his request and see what we could provide in terms of traffic measurement.

428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1.

Forthcoming PMB meeting dates were as follows, at the usual time:

Mon June 13th
Mon June 27th
Mon July 11th
Mon July 25th
Mon Aug 8th
Mon Aug 22nd
Mon Sep 5th
TUE Sep 13th F2F@CERN
Mon Sep 26th
