UKHEPGRID@JISCMAIL.AC.UK


Subject:

Minutes of the 447th and 448th GridPP PMB meeting

From:

David Britton <[log in to unmask]>

Reply-To:

David Britton <[log in to unmask]>

Date:

Mon, 23 Jan 2012 12:31:39 +0000

Content-Type:

multipart/mixed

Parts/Attachments:

text/plain (33 lines), 111219.txt (300 lines), 120109.txt (493 lines)

Dear All,


Please find attached the GridPP Project Management Board
Meeting minutes for the 447th and 448th meetings.

       The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

GridPP PMB Minutes 447 (19.12.11)
=================================

Present: Dave Britton (Chair), John Gordon, Jeremy Coles, Andrew Sansum, Dave Colling, Dave Kelsey, Tony Doyle, Tony Cass, Glenn Patrick, Roger Jones (Suzanne Scott - Minutes)

Apologies: Pete Gronbech, Steve Lloyd, Robin Middleton, Pete Clarke, Neil Geddes

1. DRI Status
==============

DB reported that all was ok, there seemed to be a last minute hitch but it was resolved. Things were on track as far as he knew.

AS advised that they had checked with Trish Mullins regarding flexibility - there wasn't any so they would probably be placing orders this week. There was a price drop due after Christmas. The outstanding issue they had was the £40k resource cost - this would be for maintenance on switches and they needed to resolve it. They may be able to profile it across 3 years.

DB advised that the end date of all University grants was 31st March 2012. Tony Medland had confirmed that we can spend up to that point but not after. This would be followed by the usual 3-month Final Claim period. Final Claims would have to be submitted by end of June 2012.

2. RAL Network Issues
======================

Some time ago, DB had raised the issue of network outages at the Tier-1. AS confirmed that he had just circulated a document regarding this.

AS advised that Gareth and he had discussed network issues and gone through all of them over the past 12-month period. They did this at the end of October. Over the year there had been a couple of major issues at sites, also scheduled interventions. There had been around 2% of lost time over the period. At the Tier-1 the issue had been the firewall router and the access router. The main challenge was that site network was not resilient - they needed to take the site down for interventions. In future, availability should be better.

JG noted that they didn't have an underlying serious hardware problem - human error had been the main cause.
There had been a few recent incidents but the stats over the year showed a small effect only, at 2%.

DB asked regarding the investment of DRI money - which areas would this address? AS advised that operationally, the DRI money would allow resilience for doing interventions.

DB thought that next year we should perform a Tier-1 review.

3. RAL Database Issues
=======================

DB asked about the SRM database? AS noted that the situation followed-on from the Thursday network breaks. The problem was they couldn't unstick the ATLAS SRM database, the Resource Manager was not releasing resources. The team tried to resolve this but got repetition of the difficulty. Post Mortem detail was awaited.

DB advised that we needed to understand whether anything else was vulnerable. If things start going wrong, what would they do? AS noted they would move over the databases, and they were prepped and ready to go as it had already been discussed. More worrying was the fact that the 10G maintenance had expired - this came to light last week. They would hold a meeting soon to discuss the strategy of 11G upgrade.

TC advised that the licences came from CERN and the maintenances should be paid for ok, extended maintenance was paid. AS would pass this info onto Rich to get in touch with TC if necessary.

4. AOCB
========

DB noted that he had circulated the Minutes from NG re the NGS/GridPP meeting. Could the PMB get back to him if they had any comments/questions.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported that the Viglen deliveries had a delivery date of w/c 24/01/2012 - this was for Viglen disk and CPU. Clustervision had all of the parts and would probably deliver earlier than Viglen (Clustervision disk). For DELL there was no date as yet. They had received all of the tape media and the tape drives were in small orders and were being received. DELL may be running into problems with disk.
Regarding staff, there were start dates for two fabric Sysadmins, and there had been a new start on the CASTOR Team recently. They should have four starts in total, and the forecast timeline was ok.

SI-2 Production Manager's Report
---------------------------------

JC reported as follows:

1) Almost all sites now have CVMFS installed.

2) The latest version of CREAM was planned for release in UMD 1.4.0 (due out today) as SGE sites need it. But a serious bug was found. Sites are looking at the EMI version but this is unfortunate given the deadline to remove LCG-CEs. Sites upgrading at this stage before the holiday period is anyway unwise.

3) The UK CA DN update has caused problems for VOMRS renewals for several UK users (affecting all the LHC VOs, dteam and others) and this has become increasingly evident. An operation on the backend database at CERN has attempted to work around the problem of manual interventions being needed. This was done for CMS on Wednesday last week and will be undertaken for all other VOs too if no unexpected issues arise. The intervention is expected to be transparent for users. We have not directly informed users to avoid confusion, but a technical explanation has been posted by Jens for those who do query what has happened: http://nationalgridservice.blogspot.com/2011/12/ca-stuff.html. Once the workaround is established as successful and other VO membership entries are updated we will send an update to GridPP-Users explaining what has happened.

DB asked why this had happened. DK noted it was a new CA, not a VOMS issue. JC advised that the workaround seemed to be working. He would put a note round the GridPP user list.

For information:

A) There is a DPM community event being proposed for February/March. The storage group (and ops team) are discussing whether we might consider hosting a DPM workshop next year.

SI-3 ATLAS weekly review & plans
---------------------------------

RJ noted they had been disrupted by networking issues at RAL; they would be working flat-out over Christmas and there was a call for extra capacity. Apart from that, there were no issues to report.

SI-4 CMS weekly review & plans
-------------------------------

DC was absent.

SI-5 LHCb weekly review & plans
--------------------------------

GP advised that MC11 production was now ongoing. There were no issues to report.

SI-6 User Co-ordination Issues
-------------------------------

None to report.

SI-7 LCG Management Board Report
---------------------------------

There had been no MB.

SI-8 Dissemination Report
--------------------------

SL was absent.

REVIEW OF ACTIONS
=================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GRIDPP4. Ongoing.
438.2 PC to provide feedback and guidance about the data management plan following the CAP meeting on 4th October 2011. [PC will circulate something to the PMB before submission to STFC - RJ and PC are dealing with this.] Ongoing.
438.8 TC to advise when it is a good time to move to vidyo - early adopters were possible. Ongoing.
438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why. Ongoing.
439.1 AS to put together a summary of network issues recently experienced at the Tier-1. Done, item closed.
446.1 PG to contact each PI individually re the DRI grants to ensure they understood they had to order/commit an equipment spend on their Institutes' systems before 31st March 2012. Ongoing.
446.2 Re the DRI grants: PG to follow-up with PIs regarding evolution of plans and quotes in order to monitor spend progress. Ongoing.
446.3 JG to put together a plan for the next joint GridPP/NGS management meeting in January (which would be followed by the first NGI technical meeting the next day).
446.4 DB to contact Tony Medland and clarify the DRI 'spend by' date. Done, item closed.
446.5 JC to inform Lancaster that they should fund the backup Nagios server from the recent large grants awarded to it. Done, item closed.

ACTIONS AS OF 19.12.11
======================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GRIDPP4.
438.2 PC to provide feedback and guidance about the data management plan following the CAP meeting on 4th October 2011. [PC will circulate something to the PMB before submission to STFC - RJ and PC are dealing with this.]
438.8 TC to advise when it is a good time to move to vidyo - early adopters were possible.
438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why.
446.1 PG to contact each PI individually re the DRI grants to ensure they understood they had to order/commit an equipment spend on their Institutes' systems before 31st March 2012.
446.2 Re the DRI grants: PG to follow-up with PIs regarding evolution of plans and quotes in order to monitor spend progress.
446.3 JG to put together a plan for the next joint GridPP/NGS management meeting in January (which would be followed by the first NGI technical meeting the next day).

The next PMB would take place on Monday 9th January 2012 at 12:55 pm.

GridPP PMB Minutes 448 (09.01.2012)
===================================

Present: Dave Britton (Chair), John Gordon, Jeremy Coles, Andrew Sansum, Dave Colling, Dave Kelsey, Tony Doyle, Tony Cass, Glenn Patrick, Roger Jones, Pete Gronbech, Steve Lloyd, Robin Middleton, Pete Clarke (Suzanne Scott - Minutes)

Apologies: Neil Geddes

1. DRI Status
==============

PG reported no change to the release of the grants - he had checked with Malcolm Booy and Trish Mullins, and they had said they were trying to sort it out. The profile of the grant had to match what we needed, and this apparently was not easy to effect on the new system. PG advised that, where Universities agree, PIs could start ordering. This was unlikely however, as Universities could not spend on credit. All DRI bids were now submitted, but were not yet approved.

PC advised that STFC could transfer money for the grant onto an account in advance - they could probably do that now - which would allow PIs to begin ordering equipment. DB asked PG to discuss possible options with STFC - could they perhaps advance half of the funding? PC noted that time was critical now and orders might not be done and delivered within timescale - he agreed that we should check with STFC as to whether they could do an advance. DB thought that the issue was they were having trouble with profiling on their new system. PG would check.

AS noted that he had received the new DELL pricing for the Force 10 kit, and it was about half the price compared with before Christmas.

ACTION 448.1 PG to contact STFC again and discuss any possibilities regarding release of part of the funding, in order to allow procurements to commence at institutes, and also to check current approval status.

2. CHEP Travel Guidelines
==========================

DB advised that people had been in contact to note that submissions had largely been accepted as posters. What was the policy for funding posters?
RM advised that it was currently 50%, but people if funded needed to stand at the stand and engage with the public - one's name on the poster wasn't enough. DB asked about a group of posters per one person? RM thought that we weren't usually that prescriptive, but the cost would be in the next FY.

PG noted that we needed a list of the people who wanted to go. DB asked PG to follow this up and co-ordinate it. PG would check and confirm current status for next week's PMB.

ACTION 448.2 PG to check with all those who had submitted a paper to CHEP, who had been awarded a poster instead, and ascertain who actually wanted to go. The PMB would decide once they saw the list of people and sites.

RM advised that the wLCG workshop was immediately before CHEP and that we usually funded this at 100%, including subsistence, for that event. The support went down to 50% for CHEP.

ACTION 448.3 PG to establish who, out of the list of those wanting to go to CHEP, also intended to go to the wLCG workshop immediately before CHEP.

3. User Co-ordinator Position
==============================

DB advised that GP had been in this position for some time now. He would be leaving RAL at the end of May 2012, therefore there was a need to consider the future role of his post. DB considered that it was an opportunity to think about the scope of the role and who might be the best person to take over. DB invited the PMB to think this over and contact him directly with thoughts/suggestions. We also needed to consider how GridPP positions itself beyond GridPP4. The role had also evolved over the years so we needed to look ahead at this point. DB noted that we could bring in new people, but it would need someone with a broad view and also time and interest to take it on.

PG asked about RAL involvement - was the role at RAL mandatory/expected? DB noted no, this was not a RAL position, it was a GridPP position and was potentially possible at any institute.
DB noted he needed inputs from everyone, ideas were required at this point, and it would be good to discuss this at a F2F meeting.

ACTION 448.4 ALL to send thoughts/suggestions to DB regarding the replacement for GP in the User Co-ordinator position (not necessarily based at RAL).

4. AOCB
========

DB noted that he had re-shuffled the Standing Items due to both constraints of meeting attendance for PMB members and the logical progression of reporting prior to the Tier-2 report.

AHM Paper: DC requested a draft asap please, from RJ, GP, AS, JC, (and himself).

Alice: Regarding Alice, DC had emailed Lee - was he still working with Alice? AS advised that he had called into the last meeting.

STANDING ITEMS
==============

SI-1 Dissemination Report
--------------------------

SL had circulated an email report from Neasan O'Neill:

1) Website revamp will be done by end of the month/start of Feb.

2) Neasan should be attending the e-ScienceTalk Face2Face, next week. DB asked if Neasan could do a presentation report on this at the Manchester GridPP meeting?

3) Neasan had some news items to chase/check on but should be one up this week (as long as the ENROLLER work got done over the holidays)

SI-2 ATLAS weekly review & plans
---------------------------------

RJ advised that on the RAL side there had been network interruptions and an SRM problem last week; sites were slow to respond, but three would need to account for it at the next ADC meeting. The sites were: Durham, UCL, and Birmingham was problematic. For the other Tier-2s, they were switching to CVMFS - this was an issue for CMS in relation to the way that the cluster was configured, but they were trying a workaround. RJ noted there had been a lot of jobs processed over Christmas; scheduled downtime was happening soon and they would contact sites to advise.

SI-3 CMS weekly review & plans
-------------------------------

DC reported a network outage over Christmas; there had been an Oracle issue on 16th December, but all else was ok. The Tier-2s were doing fairly well. Bristol however was at 30%.

SI-4 LHCb weekly review & plans
--------------------------------

GP noted that things had been steady over Christmas; one disk server at the Tier-1 had been out for one day, then there had been scheduled interventions on Castor. There was steady MC production at present.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

The Christmas and New Year periods passed without a major incident. CERN reopened last Thursday 5th January.

For global operations ATLAS reported that the grid ran smoothly with occasional Tier-2 problems not significantly impacting global production. There was a VOMS issue at BNL that affected PanDA jobs on 3rd January. RAL and some UK T2s were affected by a change in Stratum0/1 configurations at CERN in December that led to an issue with latest software versions not being installed. Next Monday an ATLAS database migration to 11g means there will be no grid activity during 16th and 17th January.

CMS did not report any major problems. Overall 60 tickets were submitted (globally) and over half were closed promptly. Analysis levels dipped over the Christmas period but Tier-2 availability remained good.

LHCb also reported a good service over the holiday period though some issues with Monte Carlo merging jobs were seen at RAL around 1st January (possibly a failing disk server).

New ROD tickets created over the period were set to expire on 4th January. Some hosts at Brunel were down from 27th December due to certificate expiry. Birmingham was affected by missing ATLAS DB release files from 23rd Dec. Lancaster was affected by excessive data transfers by T2K on 23rd December (Liverpool reported seeing an issue too so this needs following up) which consumed a lot of resources.
Bristol experienced CE issues and was down from 24th to 28th December. Some sites declared themselves at risk over the period: Brunel (23rd-5th); Glasgow (23rd-5th); RALPP (23rd-3rd) and ECDF (24th-4th).

There was a root vulnerability announced over the Christmas period (announced 24th December; EGI advisory sent on 26th December). The service affected was not found running on any CEs (on the expected port) and therefore did not require urgent attention, though there is an ongoing assessment of other potential impacts.

For information:

A) There is a GDB this week: http://indico.cern.ch/conferenceDisplay.py?confId=155064. Remote participation is to be via Vidyo.

B) Tier-2 quarterly reports have been requested.

SI-6 Tier-1 Manager's weekly report
------------------------------------

AS reported as follows:

FABRIC:

1) FY11 procurements
   - Disk deliveries expected w/b 12th January (TBC) and 24th January
   - CPU deliveries expected w/b 16th January and 24th January.
   - T10KC media all received
   - Tape drives received

2) A number of incidents on the site network leading up to Christmas (two independent problems) but appear to be resolved and no further issues over the holiday period.

3) Site DNS upgrade went very smoothly (two servers remain to be changed - at our request). We expect the servers more critical to us to be upgraded Tuesday 10th - we do not expect any problems.

4) Repacking of ATLAS data to T10KC has been completed. We expect to keep LHCB and GEN on T10KA as long as possible - possibly right through 2012 depending on demand for the A/B series tapes.

5) Fabric team busy last week moving racks in the machine room in order to accommodate incoming deliveries.

6) We expect a lot of work in the machine room in February/March - hardware installations, electrical and cooling work and cold aisle installation. Some increased risk of incidents.

SERVICE:

1) Summary of operations for the week leading up to Christmas is at:
   https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-12-21

2) Holiday period operations were very smooth. Scheduled routine checks were carried out and on-call team made a number of interventions but generally no problems. Fabric team (Kash) attended on-site once (on the 2th) to resolve a number of hardware problems.
   https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2012-01-04

3) CASTOR
   The CASTOR ORACLE database servers were moved to temporary hardware as part of the planned migration to a new ORACLE configuration. The upgrade started late (after a fallen tree blocked the Wantage road early in the morning) but still just completed within the scheduled downtime. Generally the upgrade went very well, but there have been performance issues on the logging volume (which were not seen under load test). However this configuration is scheduled to be in place for just 3 weeks until phase 2 completes and moves ORACLE on to its final hardware configuration.

STAFF:

1) Grid team leader post ongoing.

2) Recruitments underway
   * Two system admins for Fabric team - both expected to start this month.
   * One CASTOR admin - started (Rob Appleyard)
   * One Grid Team member - expected to start in next few weeks.
   * Keir Hawker's (Database team lead) last week this week (leave of absence). Richard Sinclair will be acting team lead. DB admin post advertised.

SI-7 User Co-ordination Issues
-------------------------------

Ulrich had spoken to GP about the Ganga Development Day at Birmingham, organised by Mark Slater. He was looking for funding for an Italian Developer to attend and contribute to the day (re SuperB) and had asked for around £120. DB thought this sum was fine as it was minimal, and it would be nice to encourage SuperB. DB would respond to him.

ACTION 448.5 DB to respond to Ulrich about the Italian Developer attending the Ganga Development Day at Birmingham - £120 funding had been authorised.

AS noted that he had been trying to get a response from D0 re their old file system but had received nothing back. They would be dropping the file system soon if D0 didn't respond - no authoritative response had been received, yet they had been trying to contact D0 for 9 months now. GP advised that in the past his contact would have been Gavin Davies, and he suggested that AS try and contact him. GP could do it, if it would help. DB asked if GP could follow this up? GP noted yes. AS would forward the email thread.

ACTION 448.6 GP to try and contact Gavin Davies, on behalf of AS, to try and get a response regarding the imminent drop of the D0 file system.

SI-8 LCG Management Board Report
---------------------------------

It was noted that there was an MB taking place tomorrow. JG advised that he had been talking with Ian Bird prior to Christmas and that he would be giving up the position of Chair. They would be seeking a new Chair to replace JG in the Spring. Countries would be asked to nominate a Chair. DB commented that presumably there would be a bias against the UK following JG's tenure? JG agreed probably yes. PC asked if anyone were suitable or were there constraints? It couldn't be someone from a site. JG noted that PG/JC would be eligible to apply. JG advised that he would be on the search committee.

REVIEW OF ACTIONS
=================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GRIDPP4. Ongoing.
438.2 PC to provide feedback and guidance about the data management plan following the CAP meeting on 4th October 2011. [PC will circulate something to the PMB before submission to STFC - RJ and PC are dealing with this.] PC had finished the CAP document now, which provides info to STFC re data policy. Inputs from RJ were still awaited.
It was agreed that this action would be closed and a new action opened in its place: 448.7 RJ/PC to draw-up GridPP guidelines in relation to a Data Management Policy.
438.8 TC to advise when it is a good time to move to vidyo - early adopters were possible. No further info available at present.
438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why. Ongoing.
446.1 PG to contact each PI individually re the DRI grants to ensure they understood they had to order/commit an equipment spend on their Institutes' systems before 31st March 2012. Done, item closed.
446.2 Re the DRI grants: PG to follow-up with PIs regarding evolution of plans and quotes in order to monitor spend progress. Done, item closed.
446.3 JG to put together a plan for the next joint GridPP/NGS management meeting in January (which would be followed by the first NGI technical meeting the next day). In progress, being done.

ACTIONS AS OF 09.01.2012
========================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GRIDPP4.
438.8 TC to advise when it is a good time to move to vidyo - early adopters were possible.
438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why.
446.3 JG to put together a plan for the next joint GridPP/NGS management meeting in January (which would be followed by the first NGI technical meeting the next day).
448.1 PG to contact STFC again and discuss any possibilities regarding release of part of the funding, in order to allow procurements to commence at institutes, and also to check current approval status.
448.2 PG to check with all those who had submitted a paper to CHEP, who had been awarded a poster instead, and check who actually wanted to go. The PMB would decide once they saw the list of people and sites.
448.3 PG to establish who, out of the list of those wanting to go to CHEP, also intended to go to the wLCG workshop immediately before CHEP.
448.4 ALL to send thoughts/suggestions to DB regarding the replacement for GP in the User Co-ordinator position (not necessarily based at RAL).
448.5 DB to respond to Ulrich about the Italian Developer attending the Ganga Development Day at Birmingham - £120 funding had been authorised.
448.6 GP to try and contact Gavin Davies, on behalf of AS, to try and get a response regarding the imminent drop of the D0 file system.
448.7 RJ/PC to draw-up GridPP guidelines in relation to a Data Management Policy.

The next PMB would take place on Monday 16 January at 12:55 pm.
