GridPP PMB Minutes 448 (09.01.2012)
===================================

Present: Dave Britton (Chair), John Gordon, Jeremy Coles, Andrew Sansum, Dave Colling, Dave Kelsey, Tony Doyle, Tony Cass, Glenn Patrick, Roger Jones, Pete Gronbech, Steve Lloyd, Robin Middleton, Pete Clarke (Suzanne Scott - Minutes)

Apologies: Neil Geddes

1. DRI Status
==============

PG reported no change to the release of the grants - he had checked with Malcolm Booy and Trish Mullins, and they had said they were trying to sort it out. The profile of the grant had to match what we needed, and this apparently was not easy to effect on the new system. PG advised that, where Universities agree, PIs could start ordering. This was unlikely however, as Universities could not spend on credit. All DRI bids were now submitted, but were not yet approved.

PC advised that STFC could transfer money for the grant onto an account in advance - they could probably do that now - which would allow PIs to begin ordering equipment. DB asked PG to discuss possible options with STFC - could they perhaps advance half of the funding? PC noted that time was critical now and orders might not be done and delivered within timescale - he agreed that we should check with STFC as to whether they could do an advance. DB thought that the issue was they were having trouble with profiling on their new system. PG would check.

AS noted that he had received the new DELL pricing for the Force 10 kit, and it was about half the price compared with before Christmas.

ACTION 448.1 PG to contact STFC again and discuss any possibilities regarding release of part of the funding, in order to allow procurements to commence at institutes, and also to check current approval status.

2. CHEP Travel Guidelines
==========================

DB advised that people had been in contact to note that submissions had largely been accepted as posters. What was the policy for funding posters?
RM advised that it was currently 50%, but people, if funded, needed to stand at the stand and engage with the public - one's name on the poster wasn't enough. DB asked about having one person cover a group of posters. RM thought that we weren't usually that prescriptive, but the cost would be in the next FY. PG noted that we needed a list of the people who wanted to go. DB asked PG to follow this up and co-ordinate it. PG would check and confirm current status for next week's PMB.

ACTION 448.2 PG to check with all those who had submitted a paper to CHEP, who had been awarded a poster instead, and ascertain who actually wanted to go. The PMB would decide once they saw the list of people and sites.

RM advised that the wLCG workshop was immediately before CHEP and that we usually funded this at 100%, including subsistence, for that event. The support went down to 50% for CHEP.

ACTION 448.3 PG to establish who, out of the list of those wanting to go to CHEP, also intended to go to the wLCG workshop immediately before CHEP.

3. User Co-ordinator Position
==============================

DB advised that GP had been in this position for some time now. He would be leaving RAL at the end of May 2012, therefore there was a need to consider the future role of his post. DB considered that it was an opportunity to think about the scope of the role and who might be the best person to take over. DB invited the PMB to think this over and contact him directly with thoughts/suggestions. We also needed to consider how GridPP positions itself beyond GridPP4. The role had also evolved over the years so we needed to look ahead at this point. DB noted that we could bring in new people, but it would need someone with a broad view and also time and interest to take it on.

PG asked about RAL involvement - was the role at RAL mandatory/expected? DB noted no, this was not a RAL position, it was a GridPP position and was potentially possible at any institute.
DB noted he needed inputs from everyone, ideas were required at this point, and it would be good to discuss this at a F2F meeting.

ACTION 448.4 ALL to send thoughts/suggestions to DB regarding the replacement for GP in the User Co-ordinator position (not necessarily based at RAL).

4. AOCB
========

DB noted that he had re-shuffled the Standing Items due to both constraints of meeting attendance for PMB members and the logical progression of reporting prior to the Tier-2 report.

AHM Paper: DC requested a draft asap please, from RJ, GP, AS, JC, (and himself).

Alice: Regarding Alice, DC had emailed Lee - was he still working with Alice? AS advised that he had called into the last meeting.

STANDING ITEMS
==============

SI-1 Dissemination Report
--------------------------

SL had circulated an email report from Neasan O'Neill:

1) Website revamp will be done by end of the month/start of Feb.
2) Neasan should be attending the e-ScienceTalk Face2Face, next week. DB asked if Neasan could do a presentation report on this at the Manchester GridPP meeting?
3) Neasan had some news items to chase/check on but there should be one up this week (as long as the ENROLLER work got done over the holidays)

SI-2 ATLAS weekly review & plans
---------------------------------

RJ advised that on the RAL side there had been network interruptions and an SRM problem last week; sites were slow to respond, but three would need to account for it at the next ADC meeting. The sites were: Durham, UCL, and Birmingham was problematic. For the other Tier-2s, they were switching to CVMFS - this was an issue for CMS in relation to the way that the cluster was configured, but they were trying a workaround. RJ noted there had been a lot of jobs processed over Christmas; scheduled downtime was happening soon and they would contact sites to advise.
SI-3 CMS weekly review & plans
-------------------------------

DC reported a network outage over Christmas; there had been an Oracle issue on 16th December, but all else was ok. The Tier-2s were doing fairly well. Bristol however was at 30%.

SI-4 LHCb weekly review & plans
--------------------------------

GP noted that things had been steady over Christmas; one disk server at the Tier-1 had been out for one day, then there had been scheduled interventions on Castor. There was steady MC production at present.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

The Christmas and New Year periods passed without a major incident. CERN reopened last Thursday 5th January.

For global operations ATLAS reported that the grid ran smoothly with occasional Tier-2 problems not significantly impacting global production. There was a VOMS issue at BNL that affected PanDA jobs on 3rd January. RAL and some UK T2s were affected by a change in Stratum0/1 configurations at CERN in December that led to an issue with latest software versions not being installed. Next Monday an ATLAS database migration to 11g means there will be no grid activity during 16th and 17th January.

CMS did not report any major problems. Overall 60 tickets were submitted (globally) and over half were closed promptly. Analysis levels dipped over the Christmas period but Tier-2 availability remained good.

LHCb also reported a good service over the holiday period though some issues with Monte Carlo merging jobs were seen at RAL around 1st January (possibly a failing disk server).

New ROD tickets created over the period were set to expire on 4th January. Some hosts at Brunel were down from 27th December due to certificate expiry. Birmingham was affected by missing ATLAS DB release files from 23rd Dec. Lancaster was affected by excessive data transfers by T2K on 23rd December, which consumed a lot of resources (Liverpool reported seeing an issue too so this needs following up).
Bristol experienced CE issues and was down from 24th to 28th December. Some sites declared themselves at risk over the period: Brunel (23rd-5th); Glasgow (23rd-5th); RALPP (23rd-3rd) and ECDF (24th-4th).

There was a root vulnerability announced over the Christmas period (announced 24th December; EGI advisory sent on 26th December). The service affected was not found running on any CEs (on the expected port) and therefore did not require urgent attention, though there is an ongoing assessment of other potential impacts.

For information:

A) There is a GDB this week: http://indico.cern.ch/conferenceDisplay.py?confId=155064. Remote participation is to be via Vidyo.
B) Tier-2 quarterly reports have been requested.

SI-6 Tier-1 Manager's weekly report
------------------------------------

AS reported as follows:

FABRIC:

1) FY11 procurements
   - Disk deliveries expected w/b 12th January (TBC) and 24th January
   - CPU deliveries expected w/b 16th January and 24th January.
   - T10KC media all received
   - Tape drives received
2) There were a number of incidents on the site network leading up to Christmas (two independent problems), but these appear to be resolved and there were no further issues over the holiday period.
3) Site DNS upgrade went very smoothly (two servers remain to be changed - at our request). We expect the servers more critical to us to be upgraded Tuesday 10th - we do not expect any problems.
4) Repacking of ATLAS data to T10KC has been completed. We expect to keep LHCb and GEN on T10KA as long as possible - possibly right through 2012 depending on demand for the A/B series tapes.
5) Fabric team busy last week moving racks in the machine room in order to accommodate incoming deliveries.
6) We expect a lot of work in the machine room in February/March - hardware installations, electrical and cooling work and cold aisle installation. Some increased risk of incidents.
SERVICE:

1) Summary of operations for the week leading up to Christmas is at: https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-12-21
2) Holiday period operations were very smooth. Scheduled routine checks were carried out and the on-call team made a number of interventions but generally no problems. Fabric team (Kash) attended on-site once (on the 2th) to resolve a number of hardware problems. https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2012-01-04
3) CASTOR: The CASTOR ORACLE database servers were moved to temporary hardware as part of the planned migration to a new ORACLE configuration. The upgrade started late (after a fallen tree blocked the Wantage road early in the morning) but still just completed within the scheduled downtime. Generally the upgrade went very well, but there have been performance issues on the logging volume (which were not seen under load test). However this configuration is scheduled to be in place for just 3 weeks until phase 2 completes and moves ORACLE on to its final hardware configuration.

STAFF:

1) Grid team leader post ongoing.
2) Recruitments underway
   * Two system admins for Fabric team - both expected to start this month.
   * One CASTOR admin - started (Rob Appleyard)
   * One Grid Team member - expected to start in next few weeks.
   * This week is the last week for Keir Hawker (Database team lead) (leave of absence). Richard Sinclair will be acting team lead. DB admin post advertised.

SI-7 User Co-ordination Issues
-------------------------------

Ulrich had spoken to GP about the Ganga Development Day at Birmingham, organised by Mark Slater. He was looking for funding for an Italian Developer to attend and contribute to the day (re SuperB) and had asked for around £120. DB thought this sum was fine as it was minimal, and it would be nice to encourage SuperB. DB would respond to him.

ACTION 448.5 DB to respond to Ulrich about the Italian Developer attending the Ganga Development Day at Birmingham - £120 funding had been authorised.
AS noted that he had been trying to get a response from D0 re their old file system but had received nothing back. They would be dropping the file system soon if D0 didn't respond - no authoritative response had been received, yet they had been trying to contact D0 for 9 months now. GP advised that in the past his contact would have been Gavin Davies, and he suggested that AS try and contact him. GP could do it, if it would help. DB asked if GP could follow this up? GP noted yes. AS would forward the email thread.

ACTION 448.6 GP to try and contact Gavin Davies, on behalf of AS, to try and get a response regarding the imminent drop of the D0 file system.

SI-8 LCG Management Board Report
---------------------------------

It was noted that there was an MB taking place tomorrow. JG advised that he had been talking with Ian Bird prior to Christmas and that he would be giving up the position of Chair. They would be seeking a new Chair to replace JG in the Spring. Countries would be asked to nominate a Chair. DB commented that presumably there would be a bias against the UK following JG's tenure? JG agreed probably yes. PC asked if anyone were suitable or were there constraints? It couldn't be someone from a site. JG noted that PG/JC would be eligible to apply. JG advised that he would be on the search committee.

REVIEW OF ACTIONS
=================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GRIDPP4. Ongoing.

438.2 PC to provide feedback and guidance about the data management plan following the CAP meeting on 4th October 2011. [PC will circulate something to the PMB before submission to STFC - RJ and PC are dealing with this.] PC had finished the CAP document now, which provides info to STFC re data policy. Inputs from RJ were still awaited. It was agreed that this action would be closed and a new action opened in its place:

448.7 RJ/PC to draw-up GridPP guidelines in relation to a Data Management Policy.
438.8 TC to advise when it is a good time to move to Vidyo - early adopters were possible. No further info available at present.

438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why. Ongoing.

446.1 PG to contact each PI individually re the DRI grants to ensure they understood they had to order/commit an equipment spend on their Institutes' systems before 31st March 2012. Done, item closed.

446.2 Re the DRI grants: PG to follow-up with PIs regarding evolution of plans and quotes in order to monitor spend progress. Done, item closed.

446.3 JG to put together a plan for the next joint GridPP/NGS management meeting in January (which would be followed by the first NGI technical meeting the next day). In progress, being done.

ACTIONS AS OF 09.01.2012
========================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GRIDPP4.

438.8 TC to advise when it is a good time to move to Vidyo - early adopters were possible.

438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why.

446.3 JG to put together a plan for the next joint GridPP/NGS management meeting in January (which would be followed by the first NGI technical meeting the next day).

448.1 PG to contact STFC again and discuss any possibilities regarding release of part of the funding, in order to allow procurements to commence at institutes, and also to check current approval status.

448.2 PG to check with all those who had submitted a paper to CHEP, who had been awarded a poster instead, and check who actually wanted to go. The PMB would decide once they saw the list of people and sites.
448.3 PG to establish who, out of the list of those wanting to go to CHEP, also intended to go to the wLCG workshop immediately before CHEP.

448.4 ALL to send thoughts/suggestions to DB regarding the replacement for GP in the User Co-ordinator position (not necessarily based at RAL).

448.5 DB to respond to Ulrich about the Italian Developer attending the Ganga Development Day at Birmingham - £120 funding had been authorised.

448.6 GP to try and contact Gavin Davies, on behalf of AS, to try and get a response regarding the imminent drop of the D0 file system.

448.7 RJ/PC to draw-up GridPP guidelines in relation to a Data Management Policy.

The next PMB would take place on Monday 16 January at 12:55 pm.