UKHEPGRID Archives



UKHEPGRID@JISCMAIL.AC.UK


Subject: Minutes of the 414th GridPP PMB meeting
From: David Britton <[log in to unmask]>
Reply-To: David Britton <[log in to unmask]>
Date: Wed, 9 Feb 2011 16:10:48 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (73 lines), 110131.txt (281 lines)

Dear All,

Please find attached the GridPP Project Management Board Meeting minutes
for the 414th meeting.

   The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

-- 
________________________________________________________________________
Prof. David Britton                          GridPP Project Leader
Rm 480, Kelvin Building                      Telephone: +44 141 330 5454
School of Physics and Astronomy              Telefax: +44-141-330 5881
University of Glasgow                 EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________

GridPP PMB Minutes 414 (31.01.11)
=================================

Present: Dave Britton (Chair), Sarah Pearce, Andrew Sansum, Steve Lloyd, Robin Middleton, John Gordon, Jeremy Coles, Pete Gronbech, Pete Clarke, Glenn Patrick, Tony Doyle, Roger Jones, Dave Kelsey (Suzanne Scott - Minutes)

Apologies: Tony Cass, Dave Colling, Neil Geddes

DB began by thanking SP for her input and contributions to GridPP over the years both within and outwith her Job Description! Contributions had been collected for SP as a token of the community's appreciation. SP thanked everyone for 7 enjoyable years, especially because of the people in the Project, which had come a long way since 2004.

1. Agenda for GridPP26
=======================

DB had circulated a draft Agenda with the overarching theme of 'efficiency'. The opening session would comprise a talk from DB, then PG would outline GridPP4. The third opening talk would be from ATLAS; Graeme Stewart had been invited to give this as a keynote talk. Was RJ happy with this? RJ noted yes.

The 2nd session comprised complementary talks from the experiments, although ALICE was yet to confirm. (Note added: ALICE is now confirmed).

The 3rd session of the first day traditionally comprised a discussion session, and this year it would be on the GridPP4 Tier-2 algorithm. DB advised that the algorithm had to be discussed at the forthcoming F2F in Lancaster, with input from the experiments. PG asked if the algorithm could realistically be changed by the time it was discussed at GridPP26? DB noted a number of possible outcomes: acceptance, modification, starting date and length of run time; dissemination of funds and dates - we would need to have this discussion. PG noted that if the start was April 1st, there was only one day's leeway. DB advised that if modifications were required then the start date would not be 1st April. SL noted that we didn't want a long period of changes re the algorithm.
PG emphasised how important it was that sites were treated equally - at the moment, regarding SL's proposal, this looked similar to last year, and could change with experiment input. JC considered there should be a period of time after the announcement prior to implementation.

PG asked what did ATLAS want from sites? Did they want five equal sites which shared the load equally, or did they want a disparity in size from large to small? RJ advised that they needed a core of well-supported sites, and what we have currently worked well - it was a group of large sites. PG pointed out that it was difficult for small sites to improve, and they didn't do well from the algorithm. DB noted that we don't want a lot of sites that are the same size - a hierarchy of sites is preferred. SL noted that feedback had already happened to some extent which had created what we have currently. DB noted that we tried to design the support according to the best view of the experiments - and this was a set of well-run larger sites, with other sites prepared to assist as necessary.

The algorithm would need to be discussed first at Lancaster, it could be published then and confirmed at GridPP26, with a view to starting on a named date. If we were not ready by Lancaster then it could be published at GridPP26.

JG asked about CB involvement? SL considered there was no reason to involve the CB. DB noted that the CB view would be from a different angle and at present our view was to best fit the experiment needs. DB noted in any case that an information summary meeting was due for the Collaboration Board in order to wrap-up GridPP3 and start GridPP4.

DB noted that the immediate step was for RJ and DC to give input to SL. SL would make progress on this before the F2F at Lancaster. There would be a one-hour discussion session allowed for the algorithm at GridPP26 and then a storage discussion.
Day two of the Collaboration Meeting would commence with Tier-2 reports from the Tier-2 Co-ordinators, from a site perspective, followed by a discussion session around three themes, with main points brought up. There would be a guest speaker re support and ticketing, and the final area to be covered was data transfer.

The final session on day two would focus on the Tier-1: AS and Gareth would probably present. DB wanted relevance to the experiments, the project overall, and the Tier-2. PG noted that some of the Tier-2s would lose manpower in GridPP4 and they may not be keen on change. There ensued a discussion on Quattor and fabric management. DB noted that when he contacts the Tier-2s, he can ask them to think about fabric management tools. JC and AS should approach this issue in a relevant way within their sections/talks.

DB asked for suggestions regarding the last talk of the meeting. AS suggested it should focus on vision and objectives, as at the beginning of GridPP3 - where will we be and where do we want to be by 2015? DB considered this to be a good idea and suggested it be extended to the Tier-1 talk - where did the Tier-1 go in GridPP3, how did it evolve - Quattor, monitoring etc happened within that time. Given the meeting context was 'efficiency' then the question could be asked: where do we want to be in 2015 at the end of GridPP4? AS thought this vision was useful whilst we were moving through the Project to each next checkpoint. DB noted 'vision 2015' - we needed a vision statement from the Tier-1, sites, and the experiments. TD asked if we also needed the CERN view? PC also thought that upgrade proposals might be useful? DB would think about the last session in terms of 'vision' for the future. RJ suggested that Roger Gough could be invited from DELL.

2. Project Management Transition conclusion
============================================

SP reported that things had gone well, they had regular meetings and covered all areas they had wanted to cover. PG had not yet done a budget, so questions were anticipated come the time and SP would be available to assist. The Quarterly Reports, Project Map, personnel reporting, were all under control.

PG thanked RJ and JG for their reports, he would be able to finalise the Quarterly Reports soon. PG advised that he was still awaiting the report from CMS, and he also needed to look at the manpower spreadsheet - the Tier-1 was the most complex and PG would meet with AS to get the background to the current situation. PG asked if it might be easier for DC to delegate the Quarterly Report? SP agreed that it would be good if DC could delegate the CMS Quarterly Reporting. DB noted this would be discussed at Lancaster - lightweight but timely Quarterly Reporting would be required in GridPP4.

3. Project Management Issues
=============================

PG had circulated a Project Map. DB outlined the history and the changes (versions) of this. We could re-arrange the current one. There ensued a discussion on finances and layout. DB suggested moving 'Grid Operations' to Work Package B, and moving Work Package A next to the Experiments Tier-1. It was re-iterated that just because there were two boxes per experiment, did not mean two separate reports. A single report could incorporate all tasks. The Project Map was primarily a tool for the Project Manager. PG agreed to try making the changes as suggested by DB and see how this worked out.

4. Publishing VO shares
========================

There had been an email discussion regarding publishing in GSTAT2. JC noted that sites wanted some element of freedom of reporting. It was understood they could do so in principle but not in the documentation?
PG reported that the documentation was falling behind reality at present - the document needed to be updated and it was agreed that we should publish something that makes sense. JC noted that SL's spreadsheet was different at different stages of the project - the hardware allocations should be taken into account, also, the ALICE figures weren't correct. It was agreed that JC should publish according to SL's spreadsheet as discussed.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) FY10 procurements
   - Disk tender - accepted!
   - CPU tender - all delivered. Acceptance testing has started on V10 (scheduled to complete 8th Feb). CL10 problems resolved and expect them to complete, supplier proving test this week.
   - Tape drive and media purchase still outstanding, waiting for hardware availability. Expect to finalise plan early this week.

2) The removal of the SL08 disk servers is complete (reported verbally last Monday). Agreed plan of action with supplier. Load test did not start last week as planned - increasing priority of work.

Service:

A quiet week operationally.

1) Summary of operational issues is at:
   https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-01-26

2) We have commenced a 2 day downtime:
   - CASTOR database upgrade
   - network intervention to add new address space for CPU nodes and increase internal links
   - CMS disk servers upgrade to SL5 (64bit)
   - Batch server O/S update

3) Large queues of batch jobs built up last week, waiting for free batch slots. This was traced to a CE information publishing problem (its interaction with VO job submission).

4) Bad checksum files continue to be an operational problem. Manual deletions required by VO and emergency interventions by us to ungum tape migration. We will consider an emergency change to gridftp ASAP once we have a solution ready.
SI-2 Production Manager's Report
---------------------------------

JC reported as follows:

1) In the deployment team meeting last week there was a brief discussion of the GridPP4 accounting metrics to be used for Tier-2 hardware allocation, and the period that will be used for the assessment. Apparently a commitment was made at the Collaboration Board to publish the metrics “well in advance”. Please could we indicate to sites the timeline for sharing the metrics – discussion at GridPP26 (29th-30th March) for a period starting 1st April is rather late.

2) In relation to the publishing of VO shares issue, we have now received a GGUS ticket from the WLCG Information Officer (Flavia Donno) https://gus.fzk.de/ws/ticket_info.php?ticket=66564. The shares will be discussed at tomorrow’s deployment team & sites meeting. To a first approximation we will use the 2nd tranche hardware allocation figures. Is this acceptable to the PMB providing the WLCG Tier-2 per VO pledges are met?

3) A VOMS intervention at CERN last week was unsuccessful leading to the server supporting the ops VO being down for longer than the original 2hr downtime. ops proxies are for 4 hrs so the concern here is that globally site availability/reliability metrics would have been affected. Does the MB proactively correct for this sort of effect? Fortunately last week the server returned just before the (UK ops) proxy expired.

4) Sites continued upgrades to their ATLAS Frontier squids last week amid concerns about the level of customisation in the rpm and lack of documentation provided for the installation. Several sites broke their services. A savannah request was submitted to request improvements in the documentation.

5) Two additional issues/concerns around GOCDB4 have been raised.
It is no longer possible to tag site services as pre-production or test, therefore any site that is trying a new release will get ticketed for all resulting problems seen in the monitoring (many would be seen quickly by the site anyway) and these have to be treated as normal tickets by the ROD team. That is, sites that are in the critical state cannot have their alarms closed, even if they are testing a release. The extra critical tickets then impact the region performance metrics. Was there a consultation ahead of such a change and will (or can) this happen for all central grid tools?

SI-3 ATLAS weekly review & plans
---------------------------------

RJ reported that the downtime for the Oracle upgrade for CASTOR had affected ATLAS. They were due to perform a local file catalogue upgrade due to this as well, probably in February. There would be a 6-hour outage that will take down the cloud, then in August they may move over to a central LFC system, the UK backup needs additional licences so this would not be a priority. A full instance for the TAG database on the LFC was preferred (there was no timescale at present as the tools were not available) - they would need to look at the pros and cons for UK operations.

RJ reported on another issue re software installations, at RAL all was good/green but they were missing software releases. A ticket was in for this. The installations were being done by hand. If there were many releases, and production work goes from the Tier-1 to the Tier-2 and fails, it can be because the Tier-1 have not upgraded the software release. MC production was proceeding with heavy ions pending.

RJ advised that there were issues on certificates; the Quarterly Reporting had been late due to changes onto the new dashboard - the information was incomplete and inconsistent, so they had moved back to the old dashboard but this was being discontinued.
The metrics previously used were not so reliable, therefore metrics were amber in the Quarterly Reporting. The ATLAS production dashboard had its own problems internally but was inconsistent with the WLCG figures for the quarter.

SI-4 CMS weekly review & plans
-------------------------------

DC was absent.

SI-5 LHCb weekly review & plans
--------------------------------

GP reported that the queued jobs issue had now been resolved; it had been a good week; they were tidying up their disk space.

SI-6 User Co-ordination issues
-------------------------------

GP noted there was nothing to report.

SI-7 LCG Management Board Report
---------------------------------

DB noted there had been no meeting.

SI-8 Dissemination Report
--------------------------

SP noted there was not much to report, LHC@Home was moving from QMUL back to CERN.

AOB
===

DB reported that he was due to have a 'phone meeting with Tony Medland this afternoon. Funding was due to be released for the rest of GridPP4, however the Tier-2 hardware funding might be an issue. The RAL staffing could be finalised. DB would email details of the meeting or he would update the PMB at the next meeting.

Next Monday's PMB was CANCELLED: there would be NO meeting on Monday 7th February. DB noted he was not available the following week, 14th February, so if possible JG could Chair. This meeting may also be cancelled due to the upcoming F2F at Lancaster.

ACTIONS AS AT 31.01.11
======================

398.7 Re the GridPP Security Policies - DK advised that EGI formal signoff had now been given, he would update the GridPP website pages.

400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4.

409.1 JC to revisit document with a GridPP-NGI-NGS structure, not Dave Wallom’s. JG will provide input. Visions for today and for the future.

409.2 GP to produce new role description for the Chair of the UB.

411.1 DB to organise an Agenda around the theme of 'Efficiency' for GridPP26 at Sussex.
411.3 SL to co-ordinate with RJ, DC, and GP, regarding monitoring site performance and distribution of GridPP4 funds, and provide a draft document to which the PMB could respond. This should be finalised at the F2F meeting in March, in relation to how much money was to be allocated. We would need a starting point by the F2F in February. SL was awaiting input from RJ and DC - they need to respond ASAP.

412.3 JG to check with AS and RJ re the issue of the Tier-1 continuing to provide LFC services (the issue here was extra effort, a proposal was required).

413.1 RM to check the travel budget in relation to contributing to the costs of being involved with the Royal Society Summer Science Exhibition, in conjunction with Birmingham/Cambridge.

413.2 DB to contact Karl Harrison and confirm GridPP's involvement in the Royal Society Exhibition, noting a contribution in terms of a possible demo, manpower, and promotional materials.

413.3 JG to find out at the EGI meeting today if there was a GOCDB4 failover still in existence (the last one ended with EGEEIII).

413.4 Regarding GSTAT2 publishing and sites filling-in the numbers as per SL's spreadsheet table showing the fraction (ie: publish the theoretical model in GSTAT) - PG to send the relevant spreadsheet to JC so that dTeam could progress this.
