UKHEPGRID Archives


UKHEPGRID@JISCMAIL.AC.UK


Subject: Minutes of the 403rd GridPP PMB meeting
From: David Britton
Reply-To: David Britton
Date: Mon, 1 Nov 2010 09:27:18 +0000
Content-Type: multipart/mixed
Parts/Attachments: text/plain (65 lines), 101025.txt (319 lines)

Dear All,

Please find attached the GridPP Project Management Board Meeting minutes
for the 403rd meeting.

The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

-- 
________________________________________________________________________
Prof. David Britton                          GridPP Project Leader
Rm 480, Kelvin Building                      Telephone: +44 141 330 5454
School of Physics and Astronomy              Telefax: +44-141-330 5881
University of Glasgow                 EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________

GridPP PMB Minutes 403 (25.10.10)
=================================

Present: Dave Britton (Chair), Sarah Pearce, Tony Doyle, Jeremy Coles, Andrew Sansum, Steve Lloyd, Roger Jones, Glenn Patrick, Dave Kelsey (Suzanne Scott - Minutes)

Apologies: Tony Cass, Robin Middleton, John Gordon, Pete Clarke, Dave Colling, Neil Geddes

1. GridPP26
============

DB reported that there were problems with booking GridPP26 at Sheffield. The University student accommodation was not available and a hotel had been recommended instead; however, the cost was prohibitive: accommodation alone was around double our usual cost. There was also a snooker tournament due to take place in Sheffield at the same time as GridPP26, which would mean that other hotel accommodation would be difficult to source, and probably expensive as well. DB suggested that we think about going somewhere else - he could raise it with Manchester or we could try Sussex? DB noted that he had mentioned the possibility to Manchester in general terms a few weeks ago. Comments?

GP advised that the Sussex campus was some way outside of Brighton. SL noted that it might be a good idea for him to approach Sussex in the first instance, as we had no contact with them outside of the CB. This was agreed. DB advised that it may be possible to be flexible about the date, possibly moving to the end of March, but this was during term time, which would make accommodation unavailable. It was agreed that SL would contact Sussex and ask about our original dates of 18-21 April 2011.

ACTION 403.1 SL to contact Sussex and enquire about the possibility of them hosting GridPP26 in April 2011.

2. ATLAS adaptive data placement
=================================

RJ advised of new data placement models within ATLAS - these were adaptive to the user needs. Accessing downloads as required had been trialled in the US and elsewhere, and it seemed to work, with the knock-on effect of reducing network traffic.
The model resulted in more specific usage and data was only moved when it was needed. RJ noted that his initial concerns about a potential network issue had been allayed; problems were still possible, but there was a request from ATLAS to do this in the UK. It was noted that Tier-2 usage was falling below capacity and with this model you could have multiple copies and make better use of the resources.

DB thought it sounded sensible but asked why the transfers had to run as a user job at the Tier-1? RJ noted you shouldn't need to do that but that was how it had been implemented - it was a transfer from the Tier-1 to the Tier-2. RJ advised that the UK were the only people following the 'correct' model now but he suggested we do this. DB commented that it did open the door to users to submit any job to RAL. RJ noted yes, but the slots needed to remain open, so the jobs would be throttled-back - he believed this could be done technically. Graeme Stewart was an advocate of running user jobs on the Tier-1 but it did pose a risk to the organisation. TD asked if it could be limited to a subset of users only? RJ noted no, any user could run jobs. SL commented that his tests would start to run at RAL again as a result of this.

RJ advised that tape access could be an issue. AS noted that they could not gain tape access through the normal tools but there would be a Nagios check. For LHCb the user jobs were not a problem. AS thought the model should not be too much of a problem. RJ noted that the load on the software server could be a concern, but we should keep this separate from production, therefore it was less of a risk. DB asked what the likelihood was that in eight weeks' time they would need more job slots? RJ noted yes, this was likely - they had started at 100 in the US and had increased it, so he agreed that more might be needed.
DB advised that if we proceeded with this we would need to be clear that this was a specific solution to a PD2P problem, not a change in Policy - this was not an automatic increase to the number of slots in order to solve a backlog, unless for PD2P, and overall it was to the benefit of the Tier-2. RJ noted that this did not open up the Tier-1 for analysis - the jobs were only for PD2P. AS advised that the other issue was not allowing access to data that needed to be maintained. AS also noted that this usage would need to be tracked. DB agreed, noting that it would be good to keep a watching brief on the situation. The proposal was agreed. AS asked that this wait until next Tuesday, as they were due to do the CASTOR upgrade. This was agreed. RJ would feed this back to the Operations Meeting today.

TD asked if this ATLAS adaptive data placement at RAL was temporary? RJ noted that techniques might evolve to do this in a factored way but they didn't exist yet. It allowed a small number of slots to be available rather than allocating a 'Super User' status. TD advised that communication with users would be an issue - we needed to broadcast this via ATLAS and GridPP channels. RJ noted yes, he would do this.

ACTION 403.2 RJ to broadcast the move to ATLAS adaptive data placement at RAL, specifically for PD2P only, via ATLAS and GridPP standard channels.

3. Quarterly Reporting Status
==============================

DB asked what was the status of the Q3 reporting? SP advised that she had received some reports; some were due at the end of October. The ATLAS and CMS reports had been due last week. People were working on these, but SP reminded the PMB that the reports were urgent and due as soon as possible.

4. Minor Items
===============

- F2F meeting: DB suggested that the PMB hold a F2F meeting before Christmas. The Oversight Committee meeting was on 10th December at QMUL, and by then we may know about the CSR impact.
DB advised that we needed to think about the GridPP4 detail with respect to deliverables and reporting. It was agreed to pencil-in a F2F meeting for 9th December at QMUL. SL to book a room from 11am until 5pm.

ACTION 403.3 SL to book a room at QMUL for the PMB F2F meeting from 11am to 5pm on 9th December, prior to the OC on 10th.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) FY09 procurements:
- SL09 tranche continues in acceptance test - expected to complete 5th November.

2) FY10 procurements:
- Disk tender - orders placed. Delivery late November.
- CPU tender - orders placed
- Various small system purchases being made

Service:

1) Summary of operational issues is at:
    https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2010-10-20

2) CASTOR
The LHCb CASTOR instance is generally working well and has sustained rates of up to 1000MB/s (about previous peak). Problems with file status info not being updated have been resolved (two separate problems: one was workload on the stager database server, the second was an error in the upgrade where multiple stagers were started but not authorised). A change will be scheduled to move all the disk servers to 64 bit in order to fix the checksum problem reported last week. The gen instance upgrade is proceeding and is currently on schedule.

3) ATLAS adaptive data placement at RAL
ATLAS intend to commence limited user analysis work at RAL in order to support the data placement service for the UK cloud. Although primarily an ATLAS decision, a change request was submitted and reviewed by the change team. Key issues identified were:
a) They plan to use the CERNVMFS service which is still a development service.
b) ATLAS have noted that there are insufficient controls in CASTOR to prevent user jobs accidentally deleting data if the standard ATLAS tools are bypassed.
We plan to move the proxy servers that support CERNVMFS to production this week (but CERNVMFS itself remains test). We have flagged the deletion problem as an urgent issue which cannot be fixed before user work starts and therefore remains a residual risk. It's important to highlight that the RAL ATLAS user service is experimental and we will have to feel our way in carefully as we gain experience.

SI-2 ATLAS weekly review & plans
---------------------------------

RJ reported that apart from the data placement issue, there was not much more to report. Reprocessing was going through, they were aware of changes coming, but there were no other issues.

SI-3 CMS weekly review & plans
-------------------------------

DC was absent.

SI-4 LHCb weekly review & plans
--------------------------------

GP reported as follows:

LHCb status: Reasonably smooth week for UK/RAL.

1) RAL T1 operating with limits of 3 job starts/minute and 800 simultaneous batch jobs.
2) Disk server (gdss463) taken out of service for a day - backplane replacement on 19 October.
3) Upload problems continue at Brunel.

DB asked if this was a long way from the LHCb spec? GP advised that limits would be increased this week for the job starts, and they were trying to throttle the number of jobs, and would see how it goes. DB asked if there was still a perception that the UK remained a problem? GP advised probably yes, but things were progressing now and we were not blacklisted. AS advised that the workload was variable but there was no cause for concern at the moment - all looked ok. DB asked if GP considered there was further public relations work to be done? GP thought things were really ok, everything had already been covered, and the other Tier-1s had different problems.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

1) Another RHEL5 vulnerability has been identified (this affects derivatives like SL5/SLC5/CentOS5).
It was patched for RHEL5 on Friday (22nd October, https://rhn.redhat.com/errata/RHSA-2010-0787.html) and sites are in the process of rolling it out. The vulnerability allows a user to escalate their privileges.

2) At the last LHCOPN meeting, the LHCOPN community was mandated to design a solution to improve network connectivity for the LHC Tier2s. Anyone interested in actively participating in the discussions can now join the discussion list via https://e-groups.cern.ch/e-groups/Egroup.do?egroupId=218645. RM couldn't attend the last meeting and had sent round a report. DB noted that we were 17 sites now rather than 4 x Tier-2s - he had flagged to RM that they should discuss sites rather than Tiers. JC noted he would join the list.

3) In the deployment team we are currently trying to match storage pledge figures in gstat with those agreed in GridPP for 2010/2011. A useful reference page is http://bourricot.cern.ch/dq2/accounting/federation_reports/UKSITES/. The pledge figures shown in gstat appear not to be close to those in the 2nd tranche allocations spreadsheet circulated (i.e. the agreed GridPP pledges). What figures were sent to the WLCG project office?

4) A new GOCDB4 interface came into production during the week of Monday 11th October. There were some initial problems, the most significant being that sites could not log new downtimes for the entire site. This issue was quickly resolved and the service appears to be running smoothly.

SI-6 LCG Management Board Report
---------------------------------

DB reported that the next MB was tomorrow but there was no Agenda set. For the one which took place two weeks ago, there had been nothing relevant in the Ops Report; there had been a report from the CRRB; there had been an EMI discussion which JG may have attended. AS commented that the Ops Reports were always issued either very late or were not available at all, and he advised that they also get circulated differently each time.
These things meant that giving feedback was difficult. DB noted he could raise this with Jamie Speirs. AS thought that an earlier report would be helpful as, if it was too late, he couldn't give any useful comment.

SI-7 Dissemination Report
--------------------------

SP reported that Neasan O'Neill had attended CHEP 2010. The Stand had been quiet, but there weren't many stands there overall, and the location of the GridPP stand had not been ideal. NO would provide a report and conclusions in due course. There was also a news item on NorduGrid coming.

AOB
===

TD advised that the EPSRC call was about to close. Akram Kham had asked to refer to the GridPP section published on Cloud Computing, within his application - was this ok? It related to a six-month pilot funded by JISC and EPSRC. DB noted that in the GridPP4 proposal there had been a section on Cloud Computing. It was agreed that AK could reference this if he wished. TD would let him know.

REVIEW OF ACTIONS
=================

384.6 TD/JC to take the lead on the 'GridPP to NGI' document that addresses the forward-moving technical and other issues from a GridPP perspective. JC was gathering info. It was noted that the recipient was likely to be Dave Wallom. Deadline of late November for discussion. Ongoing.
397.1 AS to provide a high-level summary of the Disaster and Business Continuity Plan - by November 15th latest - and also provide a web link to further more detailed documents. Ongoing.
398.6 DC to provide updated LondonGrid MoU. DC reported that the meeting had happened, the LondonGrid MoU had been discussed, DC would incorporate comments. Ongoing.
398.7 DK to check that all is up-to-date in terms of GridPP Security Policies - email DB. If there are any issues, DK to let DB know. DK reported that the GridPP Security Policy phase was ongoing at present, however other policies had been approved by LCG. DK advised that EGI formal signoff was awaited, then the GridPP pages would be updated. Ongoing.
398.9 RJ to provide an updated NorthGrid MoU (only requires to be modified in relation to EGEE/EGI). Meeting will take place 3rd week in October, it will be done then. Ongoing.
398.10 RJ/Graeme Stewart to provide urls of the place(s) where info is located re ATLAS site tests and measurements (so that sites understand what they're being measured on). Ongoing.
398.12 TD/DB to make renewed efforts to engage someone at Glasgow to tackle GridMon and to have access transferred in order to ensure the instances were up-to-date and running ok - DB would insist on a meeting with Mark Leese for a handover. To be done by the end of GridPP3. Ongoing.
398.13 DB to consider how to evolve the User Board into a useful meeting in the future, DB to initiate in the timeframe between now and GridPP4. Ongoing.
400.2 JC to confirm that priorities have been documented for the major experiments for recovering files from disk servers. Ongoing.
400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4.
401.4 JG to progress issue of end-to-end network problems and the requirement for someone neutral and part of central management, who had a good overview and who could solve problems from a 'middle' position - JG to progress this at GDB. Ongoing.
402.1 Action on the PMB re ticket workflow in the UK in relation to NGS/NGI: tickets were ending in dead ends. This action should be moved to JC/JG. Ongoing.
402.2 JC/JG to provide status report on EGI/NGI Service Level Agreements in the context of GridPP agreeing with the level of service provided, ensuring that it is as GridPP requires. Ongoing.

ACTIONS AS AT 25.10.10
======================

384.6 TD/JC to take the lead on the 'GridPP to NGI' document that addresses the forward-moving technical and other issues from a GridPP perspective. JC was gathering info. It was noted that the recipient was likely to be Dave Wallom. Deadline of late November for discussion. This should be on the F2F Agenda for 9th December meeting.
397.1 AS to provide a high-level summary of the Disaster and Business Continuity Plan for input to the next OC meeting - by November 15th latest - and also provide a web link to further more detailed documents.
398.6 DC to provide updated LondonGrid MoU. DC reported that the meeting had happened, the LondonGrid MoU had been discussed, DC would incorporate comments.
398.7 DK to check that all is up-to-date in terms of GridPP Security Policies - email DB. If there are any issues, DK to let DB know. DK reported that the GridPP Security Policy phase was ongoing at present, however other policies had been approved by LCG. DK advised that EGI formal signoff was awaited, then the GridPP pages would be updated.
398.9 RJ to provide an updated NorthGrid MoU (only requires to be modified in relation to EGEE/EGI). Meeting will take place 3rd week in October, it will be done then.
398.10 RJ/Graeme Stewart to provide urls of the place(s) where info is located re ATLAS site tests and measurements (so that sites understand what they're being measured on).
398.12 TD/DB to make renewed efforts to engage someone at Glasgow to tackle GridMon and to have access transferred in order to ensure the instances were up-to-date and running ok - DB would insist on a meeting with Mark Leese for a handover. To be done by the end of GridPP3.
398.13 DB to consider how to evolve the User Board into a useful meeting in the future, DB to initiate in the timeframe between now and GridPP4. This should be on the F2F Agenda for 9th December meeting.
400.2 JC to confirm that priorities have been documented for the major experiments for recovering files from disk servers.
400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4.
401.4 JG to progress issue of end-to-end network problems and the requirement for someone neutral and part of central management, who had a good overview and who could solve problems from a 'middle' position - JG to progress this at GDB.
402.1 JC/JG to address the issue of ticket workflow in the UK in relation to NGS/NGI, to clarify what the support process is: tickets were ending in dead ends.
402.2 JC/JG to provide status report on EGI/NGI Service Level Agreements in the context of GridPP agreeing with the level of service provided, ensuring that it is as GridPP requires.
403.1 SL to contact Sussex and enquire about the possibility of them hosting GridPP26 in April 2011.
403.2 RJ to broadcast the move to ATLAS adaptive data placement at RAL, specifically for PD2P only, via ATLAS and GridPP standard channels.
403.3 SL to book a room at QMUL for the PMB F2F meeting from 11am to 5pm on 9th December, prior to the OC on 10th.

The next PMB will take place on Monday 1st November at 12:55 pm.
