UKHEPGRID Archives

UKHEPGRID@JISCMAIL.AC.UK



Subject: Minutes of the 274th GridPP PMB meeting
From: Tony Doyle <[log in to unmask]>
Reply-To: Tony Doyle <[log in to unmask]>
Date: Fri, 28 Sep 2007 17:40:20 +0100
Content-Type: MULTIPART/MIXED
Parts/Attachments: TEXT/PLAIN (20 lines), 070926.txt (1 lines)

Dear All,

     Please find attached the latest weekly GridPP Project Management 
Board Meeting minutes. The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE                       GridPP Project Leader
Rm 478, Kelvin Building                      Telephone: +44-141-330 5899
Dept of Physics and Astronomy                  Telefax: +44-141-330 5881
University of Glasgow                   EMail: [log in to unmask]
G12 8QQ, UK                 Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________


GridPP PMB Minutes 274 - 26th September 2007
============================================

Present: Tony Doyle, Sarah Pearce, Roger Jones, Stephen Burke, David Britton, David Kelsey, Dave Newbold, Steve Lloyd, Robin Middleton, John Gordon, Jeremy Coles, Peter Clarke, Andrew Sansum, Suzanne Scott (Minutes)

Apologies: Tony Cass, Glenn Patrick, Neil Geddes

1. Preparations for the OC
===========================

TD asked for summaries of document progress:

#115 Executive Summary [PMB] (inc. summary of available performance metrics): this would be dealt with next Monday.

#116 ProjectMap Report [DB] (up to 07Q2): DB noted that he needed the Quarterly Reports, and had received nothing on WMS from Dave Colling. The CMS request was currently with DN who would deal with it by Friday.

#117 Resource Report [DB] (transition): DB reported that he needs to cast this into the usual format, and was currently working on the Credibility Gap document. RM had provided info on the travel side.

#118 LCG Status Report [TC]

#119 EGEE Report [RM/JG]: RM reported that he was a third of the way through, the material was in place, he has Operations information and is writing about EGEE III, also including a paragraph on EGI.

#120 Deployment Report [DK]: ongoing.

#121 MSN Report [RM]: RM reported that he was assembling the material, had the deliverables for GridPP2+, and asked what the MSN report should contain? TD advised that in all cases, until the 07Q3 reports had been received, no conclusions could be reached and transition statements should be based on current info. In 07Q2 the reports were updated with transition milestones. TD advised that RM report on the transition period, using the 07Q2 reports to say what has been done. RM asked if he should do a forward-look to GridPP3? TD advised not, this will come with the 07Q3 report showing activities which had ceased and how things would be taken forward - the GridPP3 Project Map was needed for that.

#122 Applications Report [RJ] ~ final report (but only up to 07Q2): RJ noted problems with PhenoGrid and CMS - he can write something on CMS but it will lack detail. DB noted that nothing could be said at this stage about the deliverables on BaBar.

#123 User Board Report [GP]: GP had circulated a version and had asked for additional input from ATLAS and CMS. DN noted he could sign it off at this stage. RJ will look at the document and see if he can do the same.

#124 Tier 1/A Report [AS]: AS reported that he was working on the CASTOR information and should finish this by Friday. No inputs were required. TD asked if he could include a paragraph on the planned review. AS will refer to the Tier-1 questionnaire based on the Tier-2 one.

#125 Tier 2 Report [SL]: SL reported that he had uploaded the new version incorporating input. TD advised that the new MoU cannot be referred to yet - this will have to wait until Monday. Version 1.1 will be the latest. SL will link to it when it is available. TD asked whether this document could now be signed-off? It will be reviewed again by TD and DB.

#126 Dissemination Report [SP]: SP reported that no input was required, she had completed headings and the report was ongoing.

----requested----

#127 GridPP3 Plan [DB]: It was asked whether this was v5? Yes - DB reported that he was waiting until all grants were issued, as finalised figures would be different to the planning figures - he would need to work out how to cover intermediate increases etc. TD noted that the Glasgow grant had not yet arrived. It was agreed that it was unlikely that DB would receive the information he needed, over the next few days. DB reported that only two grant issues were outstanding: QMUL and Oxford - these were in abeyance and the totals were not known. There was a small degree of uncertainty with the plan and a 'final' statement was unlikely on the OC timescale.

#128 Credibility Gap [DB]: It was noted that v0.3 had been circulated and the document was ongoing work-in-progress. DN and RJ may have information to add but it was already in reasonable shape. TD noted the gap between June and now relating to interactions with experiment OCs. DB hoped to receive information relating to June onwards from the experiments.

It was asked whether the OC meeting was from 10am to 4pm? TD confirmed that the format of the meeting was not yet known. TD reported that he had contacted Trish Mullins, who will define the Agenda. TD would revert to her on Friday as she was currently out of the office.

#129 Disaster/Scenario Planning (inc. OPN network example) [TD]: It was noted that JC holds the token for this document with input from SB. JC reported that the structure was there but areas needed to be filled out - it was 70% ready but it still needed work. There was also the question of how it tied-in with other things - should he keep security and network separate? PC noted that regarding networking, these should be kept in the same document at the end, but regarding UKERNA, security and network should be treated separately. It was agreed that JC would hold the token until Thursday, and PC would provide him with a networking summary dealing with UKERNA and existing links [done following the meeting]. JC asked if he should include an Appendix about the link resilience information from Robin Tasker? This was agreed. TD had circulated a scenario, and this needed to be reviewed by PC.

It was noted that the OC require disaster scenarios in relation to the experiments. This requires input from RJ and DN on disasters and critical issues to the experiments - the document would be considered incomplete if these were not included. It was noted that there should be shared responsibility but GridPP need to take ownership of this document and provide a listing. It was noted that the main issues were likely to be data management and movement.
A couple of entries per 'event' were required to show how things will be handled. DN agreed that he would prepare a document this week. RJ noted that a 'gap analysis' document had covered this issue over a year ago. JC will circulate version 0.4 to let RJ and DN see the document. DN will write something defining the fall-back position, defining disaster and describing experiment considerations thus far.

2. Contacts List
=================

It was noted that this had been updated, however experiment contacts were not there, and DB intended to add extra tags as follows:

1) Experimental Contacts
2) Site Security Contacts
3) Site sysman Contacts

It was understood that this was an open-ended issue, and the listing would never be in a 'finished' state. It was agreed that whatever information had been added by Friday, the list would be available via the web in any case. There was a discussion about security contacts and the GOC database. It was agreed that DB would update the contacts list with the tags added and circulate to the PMB. Following this, any further changes to be advised to SS.

3. AOCB
========

SL asked whether the Edinburgh feedback for ScotGrid was now available? JG was going to check whether the feedback was acceptable and an update from PC had been awaited. It was agreed that 4-5 people had been involved in this process. The update from PC was modified and uploaded by JG. The issue was considered to be completed.

STANDING ITEMS
==============

SI-1 Dissemination Officer's Report
------------------------------------

SP reported that GridTalk had been submitted to the EU. An email had been circulated recalling adaptors. EGEE07 was next week and Neasan O'Neill would be attending. T-shirts were still available as free gifts. NO was working with QMUL regarding the bid to IoP for the LHC event.

SI-2 Tier-1 Manager's Report
-----------------------------

AS provided the following report:

Hardware:

1) 10Gb path from Tier-1 to SJ5 - Network group were working on this on a test network and although they were unable to meet the target deadline of Tuesday 18th September they expected to have this in place this week for testing. This matter is being escalated within the organisation.

2) A public AS number has been obtained in order to allow routing of Tier-1 traffic over the OPN. This was agreed with CERN to be implemented on Thursday 20th, but problems were encountered during installation and a new date needs to be fixed. See above.

3) Tenders:
   a) Disk tender - closing date is 3rd October.
   b) CPU tender. Draft tender document circulated, expect to issue the tender on Wednesday.
   c) Tape media Framework - evaluation nearly complete (expected to be done this week) - 10 day standstill yet to commence.
   d) Tape drive servers (matches previously purchased tape drives). Scheduled for delivery by the end of September, but no ETA yet from supplier.
   e) Tape drive purchase. We are now in position to commence purchase of the remaining six tape drives that are planned.

Service:

1) SAM availability for last week was 100% (Steve Lloyd monitoring page).

2) Tier-1 admin on duty role has started (last week). The current process is documented at: http://www.gridpp.ac.uk/wiki/RAL_Tier1_Admin_On_Duty
A wiki will be available soon, with the daily log. This is taking a significant amount of effort (0.3-0.5 FTE) but hopefully is reducing informal effort elsewhere and should help improve availability.

3) CSA07 has started, but we have yet to see data transfers at RAL.

4) CASTOR - CASTOR is operating reasonably well. Main outstanding issue was a problem of excessive file replication during multiple reads. CASTOR 2.1.4 is released by CERN and we have agreed a provisional timetable for migration.
Repack is working, but is difficult to use and labour intensive; we have raised this issue with CERN. Full CASTOR issues list is at: http://www.gridpp.ac.uk/wiki/RAL_Tier1_CASTOR_Experiments_Technical_Issues

5) SL4 Migration - The SL4 instance currently hosts 72% of farm capacity and we plan to reach 90% by the end of the month. Use of SL4 has been relatively low (25%); we are investigating the reasons for this and will also begin encouraging user communities to move to SL4 ASAP.

6) VO box SLAs
   - An SLA has been agreed with LHCB
   - RAL no longer operates an ATLAS VO box
   - A draft SLA is with CMS
   - We are waiting for feedback on the Alice SLA but will shortly inform them that we consider it to be in place.

7) LHCB LFC - Work on the LFC is stalled following the identification of a configuration problem with the current recommended plan. 3D and LHCB are working to resolve this issue.

8) RGMA Migration - Discussions are underway with the RGMA team as to how best to carry out the migration of the RGMA service to new hardware.

Progress to Grid Only Access:

This standing item documents the status of work towards achieving GRIDPP milestone 0.18 "Access to Tier-1 resources by Grid Interfaces Only". The draft news item generated some discussion and the UB has now submitted a modified proposal to the Tier-1 board.

SI-3 Production Manager's Report
---------------------------------

JC provided the following report:

1) Last week there was a suspected compromise of the UK Root CA. A report was issued suggesting that there was no need to rebuild the UK eScience PKI or re-issue end users' certificates, but it will require redistribution of new keys of the UK root CA via the IGTF and TACAR repositories. A concern raised in the deployment team was that UKI ROC/GridPP security contacts found out about this indirectly. It was thought that this was the third time a UK matter of relevance had by-passed UK/GridPP security contacts.
The reporting lines via GridPP were now documented and will be used in future.

2) ATLAS data distribution to UK T2s has progressed over the last few weeks. There has been an FTS proxy problem at the T1 which created some backlogs.

3) CMS has appointed two T2 representatives. CMS is planning T1 visits with an associated T2 meeting from October. There was an expectation that the DPM issue would be fixed last week and that would allow more of the UK sites to take part in CSA07. In a CMS meeting at CHEP there was a hint that CMS job requirements (on 32bit) would move above the 1GB limit.

4) LHCb appear to be running on SL(C)4. Simulation finished and they moved on to reconstruction and analysis which has seen many stalled jobs at RAL.

5) Within the WLCG planning there is now (post-CHEP) a commitment to full chain testing for all the experiments (at the same time) in February and May 2008.

6) A CE for SL4 is expected "soon". This will be based on the LCG-CE - the port to SL4 was apparently easier than originally expected.

7) There is an LHCb software week 1st-5th October which Greig will be attending. This is also the week of the EGEE'07 conference (http://indico.cern.ch/conferenceTimeTable.py?confId=18714&showDate=all&showSession=all&detailLevel=contribution&viewMode=parallel) which a number of the deployment team will be attending.

8) Provisional dates for the SRM2.2 workshop (mentioned by SB last week) are 13th and 14th November. This is aimed at getting sysadmins more familiar with SRM2.2 which is expected to be rolled out at T2s by the end of January 2008 - though there are still open questions about what the experiments want at T2s. There is a HEPSYSMAN meeting (with monitoring tutorial/workshop - still being arranged) scheduled for 31st October in London.

9) Utilisation of CPU has been low for some weeks. It is currently at the 20% level with biomed, pheno and ATLAS jobs.

10) 7 GridPP sites have now upgraded to SL4 WNs.
You will recall that several sites intended to upgrade when new purchases arrived - Oxford for example which has recently had its new computer room go live.

SI-4 LCG Management Board Report
---------------------------------

TD reported that the meeting on 18/9 had involved preparations for the LHC computing committee review which was due to take place on Nov 20/21. This was a two-day review and will be the last of its type. Iterations on format etc were taking place. There could be a US and UK Tier-2 review as part of the input; JC may be invited. A review of the experiments was not required this year. There may also be a generic storage talk, the remainder relating to site reliability, a discussion of monitoring from GridView, and there were issues outstanding relating to SE reporting. There would also be a review of wider issues including experiment-specific monitoring.

SI-5 Documentation Officer's Report
------------------------------------

SB provided the following report:

There was a UIG meeting last week, albeit thinly attended (two other people). The main question, which has a broader impact, was how to deal with the switch from the LCG RB to the glite WMS, especially since the network server was now deprecated so it wasn't just a matter of changing the name of the commands. For now the information about the glite-job-* commands (network server) will be removed; in the medium term we will need to move to documentation which doesn't mention the old edg- commands, but it isn't clear what the best way to approach that will be. The same considerations will apply to the gridpp web pages and the user guide. To some extent we will have similar issues with the introduction of srm 2.

The UIG also agreed to try to find people to expand the range of documentation, which stalled over the summer - SB agreed to write about information systems, and possibly VO software installation.

REVIEW OF ACTIONS
=================

250.4 RJ, DN, GP, TD to meet to integrate experiment requirements of Tier-2s going to Tier-1 - sites are aware of requirements but discussion still has to take place. It was noted that this issue is not high priority. A meeting is to take place with Barney Garrett.

252.3 RM has now received inputs for his one-page summary regarding the transition of each of the existing Middleware areas from GridPP2 to GridPP2+ to GridPP3 - this to go to DB. This was to be done by Friday 8th June but is still ongoing. This is now urgent. It was noted that this information would be in the MSN report. TD noted that it also needs to be in plan v5 as a statement.

261.4 DB to look through the input in detail in relation to GGUS problems. DB currently working on grants issues and quarterly reporting - this would be dealt with as soon as possible.

263.2 JG to further investigate the lack of ability to pass job requirements to the batch system and report-back (Tier-2 review issue). JG will raise this through the GDB.

267.3 SP to begin organising metrics for GridPP3, beginning with update and review of existing milestones and metrics, plus review of WLCG requirements. SP to co-ordinate with DB, AS and JC. It was agreed that the high-level view should be prepared for the OC relating to what has been agreed, and how we are working towards this - SP to present a few slides.

268.1 RJ to prepare a one-page table for ATLAS (regarding Tier-3 resources) that could be used as a template for all the Experiments. Following this, action on GP, RJ, and DN to come up with a short proposal. It was noted that RJ had drafted something but this was not yet completed.

271.2 Re CERN-RAL OPN link breakage, RJ to provide an analysis of what the consequences would be to Experiments for a one-day break, a three-day break, a five-day break, etc. The outcomes of these need to be assessed for disaster scenario planning.

272.1 SL to prepare a Tier-2 Report for the OC.
Done, item closed.

272.3 PMB to email TD notes/suggestions in relation to disaster scenario planning. Done, item closed.

272.4 AS to check the current Tier-1 disaster recovery plan and circulate the existing version to the PMB. It was reported that this document does not exist, but it was planned to have one in the longer term. TD would incorporate in v0.4 anything that AS considered relevant. AS will check and advise additions.

272.5 DB to prepare a 'credibility gap' document. Done, and circulated.

272.6 SL & TD to discuss the MoU during this week and provide a draft version by next week to go to the Tier-1 and Tier-2, prior to submission to OC. Done, item closed.

272.8 TD to supply a letter of support for GridTalk, on behalf of GridPP. This to go to SP. [Done following the meeting].

273.1 JG to send a suitable phrase regarding WLCG to TD for inclusion in the MoU. [Done following the meeting].

273.2 JG to send suitable wording to TD regarding Monitoring of Hardware Resources at sites (MoU). Done, item closed.

273.3 NM and TD to confer regarding the wording of Hardware Support Staff (MoU). Ongoing.

273.4 NM to assist with wording of Appendices (MoU Annexes). This will be done outwith the PMB. Done, item closed.

273.5 JG to provide updates regarding Operations (MoU). Done, item closed.

273.6 TD to circulate finished version of MoU mid-week prior to the OC.

273.7 Action from Ambleside: regarding site availability, SL to plot his data and JC to highlight it via wiki. Ongoing.

273.8 DK to clarify the status of Security Policy documents. Done, item closed.

273.9 TD to draft letter to Cambridge regarding Condor deployment problems and proposed resolutions. Ongoing.

ACTIONS AS AT 26.09.07
======================

250.4 RJ, DN, GP, TD to meet to integrate experiment requirements of Tier-2s going to Tier-1 - sites are aware of requirements but discussion still has to take place. It was noted that this issue is not high priority. A meeting is to take place with Barney Garrett.

252.3 RM has now received inputs for his one-page summary regarding the transition of each of the existing Middleware areas from GridPP2 to GridPP2+ to GridPP3 - this to go to DB. This was to be done by Friday 8th June but is still ongoing. This is now urgent.

261.4 DB to look through the input in detail in relation to GGUS problems. DB currently working on grants issues and quarterly reporting - this would be dealt with as soon as possible.

263.2 JG to further investigate the lack of ability to pass job requirements to the batch system and report-back (Tier-2 review issue). JG will raise this through the GDB.

267.3 SP to begin organising metrics for GridPP3, beginning with update and review of existing milestones and metrics, plus review of WLCG requirements. SP to co-ordinate with DB, AS and JC. It was agreed that the high-level view should be prepared for the OC relating to what has been agreed, and how we are working towards this - SP to present a few slides.

268.1 RJ to prepare a one-page table for ATLAS (regarding Tier-3 resources) that could be used as a template for all the Experiments. Following this, action on GP, RJ, and DN to come up with a short proposal. It was noted that RJ had drafted something but this was not yet completed.

271.2 Re CERN-RAL OPN link breakage, RJ to provide an analysis of what the consequences would be to Experiments for a one-day break, a three-day break, a five-day break, etc. The outcomes of these need to be assessed for disaster scenario planning.

272.4 AS to check the current Tier-1 disaster recovery plan and circulate the existing version to the PMB. It was reported that this document does not exist, but it was planned to have one in the longer term. TD would incorporate in v0.4 anything that AS considered relevant. AS will check and advise additions.

273.3 NM and TD to confer regarding the wording of Hardware Support Staff (MoU).

273.6 TD to circulate finished version of MoU mid-week prior to the OC.

273.7 Action from Ambleside: regarding site availability, SL to plot his data and JC to highlight it via wiki.

273.9 TD to draft letter to Cambridge regarding Condor deployment problems and proposed resolutions.

274.1 DB to update the contacts list with the tags added and circulate to the PMB.

274.2 DK to update and upload Security Documents to a suitable site, to be cross-referenced. The sites would be the Deployment Board/Tier-1/Tier-2 Board.

INACTIVE CATEGORY
=================

247.2 RJ to get further information from ATLAS regarding use of Grid for testing of PANDA, and report-back.

251.1 TD to raise the issue of memory vs CPU cost at the MB [in order to work out what the requirement was between 1GB and 2GB memory per core].

253.1 AS has commenced work on the report on data integrity at Tier-1, in relation to implementation of checksums. Ongoing, AS hopes to complete this by end August.

261.5 JC and dTeam to carry out a survey on sites' experiences of GGUS, when possible to organise. This was pending but would be addressed after the holiday period. It was noted that a Questionnaire was required.

271.1 PMB to examine the issue of fibre breakage and outages, CERN-RAL OPN link, in one year's time, when actual data on breakages is available. Due date would be September '08.

271.3 Re CERN-RAL OPN link breakage and backup generally, PC to oversee the issue and collate info so that the PMB have something to revisit in one year's time. Due date September '08.

The PMB would meet next Monday 1st October as usual. It was noted that some would be attending EGEE07 in Budapest.

TD reminded the meeting that all document versions were required by Friday (28th) for review, followed by preparation of the Executive Summary on Monday.
