GridPP PMB Meeting 512 (16.12.13)
===============================
Present: Dave Britton (Chair), Pete Gronbech (Minutes), Andrew Sansum, Jeremy Coles, Steve Lloyd, Tony Doyle, Dave Colling, Tony Cass, Dave Kelsey, Roger Jones
Apologies: Pete Clarke, Claire Devereux

1. The Collaboration Board (CB) meeting
========================================
It was considered unnecessary to present the full talk that was given at the recent PMB F2F. The CB would be asked to endorse the decision that, if we were forced to choose, then we would only provide a Tier-2 service in the UK. The CB decision had to be based on correct reasoning - the location(s) of the Tier-2 service was a separate issue.

DC asked: if STFC cut GridPP by 50% and we could not provide both Tier-1 and Tier-2, what would break? We would lose the leverage from the universities if we relocated or removed the Tier-2 service. It was felt that this was a second order issue. LHCb relied on the Tier-1 and would not support this decision. As they could get plenty of CPU from other sites, losing the T1 would be a disaster. There was a whole range of impacts related to losing a T1; ATLAS and CMS would also not be happy. RJ agreed.

Were we agreed that DB would ask the CB to endorse this as the strategy if we were only to be funded at the £3.5M/year level? PC considered that it would be wrong to ask them to endorse this without a paper being submitted, or a pre-warning. Perhaps a second meeting would be required. Support from the CB would be required in order to go ahead and draft a plan on that basis. It was also clearly difficult to ask for approval of a plan where a vested interest was involved. PC advised that the advice received from experiment reps indicated that they were likely to choose the Tier-2 solution, mainly due to ATLAS and CMS support. We would require another CB meeting or email exchange early in the New Year. STFC would present it to Science Board to decide. How would we cope with a £3.5M funding level?
We trusted the institute leaders to have the experiments' best interests at heart. SL read out the definition of the CB. PC noted that the GridPP Project's job was to understand and provide what the experiments required, not to provide jobs at all sites. It was not about the institutes, rather it was about the functionality. DB did not intend to send out the slides. PC considered that once Swindon knew about the choice between the T1 or T2 they would ask for a minimum viable cost to include both. DB already had this – which is why we knew that a £3.5M/yr budget would mean either one or the other. Even at that level, it was not enough for a robust Tier-1 service. There was a programme for LHC physics and there was a required amount for computing - if they voted for the physics then they would have to provide the computing. The £3.5M/yr budget was not a feasible option.

STANDING ITEMS
==============

SI-0 Report from Development Group
-----------------------------------
DC advised that no report had been prepared - there was lots going on, and he would email round a report. There was activity in CMS, ATLAS and LHCb; they had upgraded the client to Havana. A cloud resource could be made to look like a grid resource. This was a new setup (AKA Stealth Cloud). DB had sent a link to a paper on cloud provision at Tier-2 centres to DC on 11.12.13. This could be a useful input to our proposals.

SI-1 Dissemination Report
--------------------------
SL presented the Report from Tom Whyntie:

> News Item - Big Data on the BBC
A recent BBC Radio 4 documentary "Data, Data Everywhere..." featured the LHC and the huge computing effort required to find the Higgs boson. The news item may be found here: http://www.gridpp.ac.uk/news/?p=3108

> A Collaborator's Guide to GridPP on the GridPP website
Alex Efimov has produced a PDF guide for potential collaborators.
This can be found at: https://www.gridpp.ac.uk/wider/

> TW to Guest Curate "Science Showoff", 15th April 2014
TW has been invited to guest curate a "Science Showoff" event on the 15th April 2014, with a working title "Big Data, Big Deal". Science Showoff is a charity event where scientists are encouraged to give a 9 minute talk - in any format - about their research. If anyone wants a slot, let me know - TW is going with the Data Exploration/citizen science theme. Further information about Science Showoff may be found at http://scienceshowoff.org/

> CERN@school and CVMFS
Catalin Condurache (RAL) has now enabled CVMFS for cernatschool.org at RAL and a test tarball has been created for deployment of a simple ROOT-based executable. Further tests to follow this week.

- Regarding the project with Alex, Tom was trying to find out about the bit-splitting work. They had finally got it working, but it was difficult to get it working on Linux - this would require a week’s solid coding from Simon. Did we want Simon to spend this much time on this when it was not our top priority? Why should we be doing this? DB agreed. DC to email Alex and cc Tom.

ACTION 512.1 DB to email Alex and cc Tom Whyntie regarding Simon's time on coding for the bit-splitting work on Linux.

DC advised that he still had not been paid for the journal. It had happened before he could get college to pay for it. The amount was £1680. Apparently we said DC would try to get funds but otherwise it would come out of travel funds. Could DB try to get funding from his libraries? Money went to 23 universities about a year ago but may have been spent now. The publication was in 2012. The Royal Society were looking for payment. It was agreed DC should pay this and claim it back.

- Tom went to see a 3-man SME, ‘python anywhere’, which was ideal for schools as it was in a web browser. They rented time on Amazon. Tom wondered if we could offer resources to help this?
It would need an EC2 interface (which DC had), but how would we do due diligence on what was being run? So long as it had a limit it would be acceptable. It was potentially interesting work.

- PG had given a GridPP talk to IATUL.

SI-2 ATLAS weekly report & plans
---------------------------------
RJ noted not much to report. There had been an incident concerning inappropriate use of resources, which had been dealt with rapidly; the user had admitted to it and had been admonished. The Panda systems alerted them to odd behaviour. The person’s certificate had been revoked. From the RAL end, the ATLAS response had been excellent, reflecting well on site security. They did report to higher management but nothing had come of it. RAL was now scanning logs to see if any other inappropriate workloads had been run. DK reported that the incident was closed. Operationally things went very well and AS thanked ATLAS.

SI-3 CMS weekly report & plans
-------------------------------
DC noted not much to report. They were planning an exercise next year. The Ops SAM tests had changed. Fair share policies were affecting new SAM tests. (Previously, SAM tests were based on OPS, for which sites tended to make a small reservation to ensure the tests did not get blocked.) WLCG was not entirely ready for SHA2 certificates. This had been postponed to January. The DPM collaboration workshop hosted in Edinburgh last week had gone very well. For November Tier-2 availability, there had been four sites below target: UCL (downtime due to SE/WN upgrade); Durham; Birmingham; Sussex. RALPPD SAM jobs had been stuck due to fair share issues. The site would be 'at risk' over the Xmas break as normal.

SI-4 LHCb weekly review & plans
--------------------------------
There was nothing to report.

SI-5 Production Manager's Report
---------------------------------
JC reported as follows:

1) There was a GDB last Wednesday (http://indico.cern.ch/conferenceDisplay.py?confId=251192).
The most discussed item related to the move to using experiment SAM tests for WLCG site availability/reliability reporting and issues seen whereby the test jobs get ‘stuck’ due to fairshare policies. SHA-2 readiness was also reviewed – the infrastructure overall is not yet completely ready; the French CA (at least) is likely to issue SHA-2 certificates by default from this week. In the UK we postponed the switch to next year – possibly now March.

2) The GridPP-hosted DPM collaboration workshop took place in Edinburgh last Friday – thanks to Wahid Bhimji who organised it. The event was well attended and received. The efforts of the collaboration members mean that the DPM product remains a core component at many sites and its future looks more certain now with a good selection of new interfaces in development or test. GridPP is making a solid contribution to the work - there is also increased participation from other countries compared to when the collaboration started.

3) The November WLCG Tier-2 availability/reliability report is now final: http://indico.cern.ch/conferenceDisplay.py?confId=251192. GridPP sites under the targets were:
UCL (28%:54%): Downtime/impacts related to SE and WN upgrades.
Durham (71%:71%): Downtime due to campus wide power maintenance.
Birmingham (89%:89%): Submissions stopped due to a CE issue requiring the node to be rebooted and the situation covered a weekend.
Sussex (58%:58%): Issues remained following the SL6 upgrade.
RALPPD reported one issue with the experiment test results (currently run in parallel to ops) that is being pursued in a ticket: https://ggus.eu/ws/ticket_info.php?ticket=99319.

4) Experiment plans for running over the Christmas period have been mentioned in a number of forums including the WLCG ops coordination planning meeting: http://tinyurl.com/mx3oq8y. All the experiments understand (and are grateful) that support during this period will be on a 'best efforts' basis.

SI-6 Tier-1 Manager's Report
-----------------------------
AS reported as follows:

Fabric:
1) First CPU delivery just arrived this morning. Second CPU delivery and two disk deliveries scheduled for January.
2) Uplifted tape media order placed. The cost for tape media, £40k (for T2K), was raised to £180k; the order had gone out and should be delivered in January. The price on the new framework was ~15% better than before.
3) We are having to consider the rapid disposal of part of the 2007 generation of hardware owing to constraints on machine room floor space. Will email separately.
4) A generator load test was carried out successfully last week. We are discussing with estates what the appropriate test interval is. Hopefully it will revert to 3-monthly intervals.

Service:
1) Reports covering last week available at: https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2013-12-11
2) CASTOR
a) Work continues on CASTOR 2.1.14 testing. A number of issues have been identified and updates received from CERN.
b) ATLAS file renaming is so far throwing up 0.004% missing files. We don't have enough logging information to understand what the cause of the file loss is. Not necessarily a local data retention problem.

Staff:
1) We are beginning recruitment of a 1-year contractor position to work on cloud infrastructure.

SI-7 LCG Management Board Report
---------------------------------
There had been no meeting.

REVIEW OF ACTIONS
=================
496.2 PC to update the network forward-look. This was close to being started, but was waiting for input from RJ and DC. DC said that the position is evolving in CMS. RJ hopes to have a look at this later this week. PC does need half a page from each expt to set the scene. Remote access to data is scaling well and, dependent on how well this works, the bandwidth required will change. PC asked RJ/DC to note down what they expect but say it may change and they will inform JANET if so.
If there is a Tier-2 site that requires a better connection to JANET the experiments must say that.

511.1 AS/DK to do the outturn forecast, look at the possible spend on tape media and advise Tony Medland about the profile for next year. A realistic outturn forecast for travel was also required. Action closed and replaced.

511.2 CD to discuss GridPP's input with the UK NGI concerning interest in the Distributed Competence Centre. JC says CD did raise the issue and the UK are on the list. So probably complete.

ACTIONS AS OF 16.12.13
======================
496.2 PC to update the network forward-look.

512.1 DB to email Alex and cc Tom Whyntie regarding Simon's time on coding for the bit-splitting work on Linux.

512.2 Regarding the outturn forecast and the possible spend on tape media, travel etc, DB/PG to work out what was left and ask Tony Medland for re-profiling.

Next PMB: Monday 13th January @ 12.55pm