UKHEPGRID Archives


UKHEPGRID@JISCMAIL.AC.UK


UKHEPGRID  October 2013

Subject:

Minutes of the 504th - 506th GridPP PMB meeting

From:

David Britton <[log in to unmask]>

Reply-To:

David Britton <[log in to unmask]>

Date:

Sat, 19 Oct 2013 14:25:55 +0100

Content-Type:

multipart/mixed

Parts/Attachments:

text/plain (53 lines) , 130909.txt (1 lines) , 130923.txt (1 lines) , 130930.txt (1 lines)

Dear All,


Please find attached the GridPP Project Management Board
Meeting minutes for the 504th - 506th meetings.

The latest minutes can be found at:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

GridPP PMB Minutes 504 (09.09.2013) =================================== Present: Dave Britton (Chair), Andrew Sansum, Roger Jones, Tony Cass, Jeremy Coles, Pete Gronbech, Pete Clarke, Dave Colling (Minutes - Suzanne Scott) Apologies: Dave Kelsey, Tony Doyle, Claire Devereux, Steve Lloyd 1. GridPP32 ============ Possible places for GridPP32 were mooted. It was noted that DELL were very keen to sponsor. 2. Quarterly Reports (Q2) ========================== PG reported that these had all been received and he would circulate a written report. Overall Q2 had been a good quarter with many green metrics. The Tier-1 had a good quarter but was below on a couple of targets and a lowered staff count still. AS advised that this should increase soon. DB noted that it was important not to underspend in this part of the budget. PG reported that at the Tier-1, preparation for upgrades was proceeding and the generator testing was now stable. For ATLAS/CMS it had been a quiet quarter with not much to report. There had been a few problems with ATLAS at some sites and there had been issues at Glasgow. For LHCb all metrics were green and it was generally smooth-running. For 'Other' Experiments, there had been a healthy usage of the Tier-2 at 14.5%. Manchester had to alter their fairshares to reduce usage. For DataGroup all was green, there had been a good workshop at Imperial; they were upgrading to SL6. Support for Storm and dCache was ongoing. AS asked whether we had tagged the delay of the SHA2 deadlines? JC replied that the EGI request to EUGridPMA had been mentioned to site admins at the ops meeting last week but a final decision was not expected until later this week. CD had provided a report - all was green on the NGI front. Work from the APEL and GOCDB groups was reported. There had been progress on cloud accounting. Support for the ARC-CE was ongoing. The EGI Community Forum at Manchester had been a success. 
There was lots of work going on re the GOCDB and the new release of v5. For Experiment Support, all support was looking green. Re Outreach, Neasan O'Neill had left and they were currently recruiting. In the 'Execution' section we were around 5 members of staff down at the moment due to losses at sites. PG reminded everyone that the next Quarterly Reports (for Q3) were due soon. Could these please be on time! 3. GridPP31 ============ DC reported that he had updated the Agenda pages on the website. - the main meeting was being held in lecture theatre 2 - refreshments would be served in the Foyer of level 2 - rooms had been booked in the Hotel Copthorne Tara, details were on the Agenda page. This was around a 15-minute walk from Physics, a map was given. - details of the indian restaurant were on the webpage - 3 people were attending from Viglen - Eduroam and guest accounts would be available DB asked if anyone would be doing a 'welcome' speech? DC would welcome everyone or get the Head of Department to do it. DB advised that in the Opening Session he would do a talk about challenges and status. He had asked Ian Bird to speak but this was unlikely. Regarding the Discussion Sessions, experiment requirements needed to be clarified and also how we would meet them. DC noted he would contact RJ and PC. DB asked DC to add a description onto the Agenda page. PC advised that such a discussion would need to take account of the new computing model update document from wLCG, which was currently being circulated. TC advised that this was only in circulation among the experiment co-ordinators at present. DC noted that the document hadn't been written by the experiments. PC disagreed, saying yes, it had been - LHCb and ALICE had written theirs and the document was awaiting inputs from CMS. DB noted that this document should shape the headings for the discussion sessions. RJ advised that the document he had been sent was 'for comment'. 
TC agreed, noting that it would be sent to the MB following experiment feedback. DB advised that after Session 3 there should be a 'logistics' slot regarding the Dinner, and also a slot for the Sponsor. Session 4 would comprise a forward-look regarding the Tier-1 and CASTOR. AS considered that the challenge was to know what the forward-going requirements were and how the Tier-1 would structure itself. DB thought maybe 2 x 0.5hr talks and half an hour discussion would work? AS said he was working on it and could discuss the batch system upgrade. PG considered that it would be worth mentioning the results and conclusions of any Tier-1 testing, in order to disseminate this information to the collaboration. This was relevant to what sites were doing on the ground at the moment, especially in relation to the batch system. DC thought it would be preferable to separate any such update from the GridPP5 forward-planning. DB thought this kind of report-back might be better in the joint PMB/Ops Team meeting? Or it could be in Session 1. AS would finalise his proposal by Wednesday afternoon. Session 5 was a discussion session - SL had ideas for that. DK was doing Session 6. Session 7 was not required. DC asked about DELL attending - it was agreed that there was no problem about DELL being there during the meeting to listen, but they should respect Viglen as the Sponsor. 4. XiPi ======== AS reported that JG had contacted him about the XiPi portal. AS had asked around for any information on this company - he had access to the Tier-1 entry on it. He could update the Tier-1 entry description and then publish. The company had some sort of agenda regarding resources but AS wasn't sure, it was all a bit of a mystery at the moment. DC asked if there was any advantage to us? AS noted that other user communities using our resources would be the level - it may be a small funded project, he would have to wait and see. 5. 
Tier-1 procurement & resource update ======================================== AS advised that this had been incorporated into the tender re STFC funding - extra capacity had been added-in. The tender closed on 6th September but the supplier needed more time to benchmark, therefore the deadline had been extended to 27th September. This had to go to BIS - DB had written to BIS explaining about the funding for GridPP and how it was managed, also how the hardware procurement worked at RAL. BIS had decided that it was none of their business therefore we didn't need to submit the tender for BIS approval - this was good news. AS had written to SSC - they were therefore going to proceed with the procurements as originally planned. 6. BDII problems at RAL ======================== PG had asked AS to advise on the BDII problems. AS reported that the BDIIs were problematic but this was not showing up. A few weeks ago while it was updating, it stopped serving to clients occasionally and we lost BDIIs on a few occasions. The frequency of database updates happens every 20 minutes now and they had not seen a recurrence of the difficulties. DC noted that query times had risen significantly recently. AS noted there was a workaround in place at present and as yet, no solution. DC asked if this was a common problem? JC noted yes, reports of problems from across EGI were being submitted following a recent top-BDII update. Imperial also saw problems and followed up in https://ggus.eu/ws/ticket_info.php?ticket=96667. 7. DELL sponsorship ==================== DC advised that DELL were keen to sponsor the next Collaboration Meeting and had hinted that they would be prepared to be quite generous. DC had given them cost estimates. DELL hoped to re-introduce the purchasing portal as before. A common configuration would be required. RJ advised that the special pricing still existed, it was just that the portal didn't. DC noted that the people he spoke to said that it did. 
RJ disagreed, he would talk with the individual who had negotiated it in the first place, and also talk to the Head of UK Sales. DELL central made the deal but they needed to liaise with DELL UK. RJ/DC would take this forward. ACTION 504.1 RJ/DC to contact DELL and establish facts about special pricing and the portal availability. SI-0 Development (Cloud) Group Monthly Report ---------------------------------------------- DC noted there had been no meetings in August. The next one was due next Friday. He would report after that. SI-1 Dissemination Report -------------------------- DB reported on behalf of SL, as follows: they were currently recruiting for a replacement for Neasan O'Neill. They had interviewed and made an offer to an internal candidate; however, this fell through and the post was now being externally advertised. There were two promising applications on the system so far. There had been contact with Alex Efimov on KE and a meeting would take place between SL/AE and someone else from the Cambridge/Kazakhstan Development Fund. They wished to test some technology and it was agreed to try this at Imperial first. PG could not find the real company on the web, so it was recognised there was a risk here. DC thought it might work technically, but they would look at it and see. DB suggested keeping the project relatively confidential for now. GridPP had been mentioned in a 'Nature' article; a further news article was being written. SL had suggested updating the GridPP posters - any comments please let him know. PG noted that Chris Walker had sent him a link regarding Grids in The Telegraph, he would circulate this. SI-2 ATLAS weekly review & plans --------------------------------- RJ reported that there had been a big Tier-1 to Tier-2 disk cleanup as they were running out of space. They were still finishing-off on the renaming of files at RAL due to the new management system. 
SI-3 CMS weekly review & plans ------------------------------- DC noted things were quiet, they were continuing the move to separate tape and disk at the Tier-1. SI-4 LHCb weekly review & plans -------------------------------- PC was absent. SI-5 Production Manager's Report --------------------------------- JC reported as follows: 1) The UK CA is ready for SHA-2. We have also seen good progress with services being updated across most GridPP sites, however, delays elsewhere in EGI (and late releases for dCache and Storm) mean that a proposal to move the default SHA-2 usage to the end of December is to be put forward. The GridPP website is not currently compliant but will be updated in the coming weeks. 2) ATLAS no longer use ATLASHOTDISK and the spacetoken is being decommissioned across sites and the space used for ATLASDATADISK. 3) A major update to GOCDB (v5) was due for 2nd September but has been postponed until late September due to development needs and updates of associated hardware. 4) A DPM Community workshop is being planned to take place in Edinburgh in early December (the PMB approved funding via email since our last meeting). 5) Sites are not moving to SL6 (deadline 31st October) as quickly as expected. Some upgrades are being planned to coincide with EMI-3 upgrades in October (for the overall plans see https://twiki.cern.ch/twiki/bin/view/LCG/SL6DeploymentSites#UnitedKingdom). Likewise for glexec enablement for which the tentative deadline is also October. 6) Sites are in the process of upgrading to a new perfSONAR release. This new release fixes problems with the WLCG mesh (e.g. traceroute and pingER). A new dashboard is also expected shortly. 7) Our Nagios monitoring has had periodic glitches that seem to be traced to problems related to top-level BDII and WMS updates. 8) We are likely to see the escalation of some UK Regional Operator on Duty (ROD) tickets to COD. The outcome of this being a message that the UK ROD is not performing correctly. 
This is a direct consequence of the agreed use across EGI of marking SHA-2 tests “critical” in the monitoring even though we knew that many sites would not be addressing the issue until upgrades after 28 days (and the update not actually being “critical”). Our ROD is performing well! 9) I have only received a couple of requests to attend the WLCG workshop in November (https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=251191). Similarly for the HEPiX meeting in late October. Of interest: A) There is a GDB at CERN this week. There is a rather short agenda (http://indico.cern.ch/conferenceDisplay.py?confId=251189) covering experiment news and operations coordination group updates. SI-6 Tier-1 Manager's Report ----------------------------- AS reported as follows: Fabric ------ 1) Closing date for disk and CPU tenders has been extended to 27th September. BIS have advised us that no approval from BIS/Cabinet Office is required for these tenders. Service ------- 1) Reports covering last 3 weeks available at: http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2013-09-04 http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2013-08-28 2) CASTOR a) Occasional low level rate of transfer failures from CASTOR in recent weeks. Team investigating b) ATLAS instance provided poor availability over the weekend (77%/90%) and we failed ATLAS transfers and were ticketed. We are still investigating cause but impeded by lack of logging over critical period (cause unknown) 3) Batch farm Good progress on testing of HTCondor/ARC CE. New farm has run a considerable amount of work successfully. Have to provide CREAM CE for ALICE but others using ARC. We have agreed we will migrate to HTCondor and ARC but we may not be fully migrated to HTCondor by end of October when SL6 upgrade is required. Therefore are also preparing a change to upgrade existing farm to SL6/Torque MAUI. 
This will allow us to have a fallback plan if HTCondor is completely unsuccessful and will allow us to upgrade residual farm to SL6 if migration not completed. Agreed today to increase capacity in HTCondor farm ASAP this week to exceed 50% of job slot count at RAL. This will build experience at larger scale. Will review deployment plan again on Friday in light of this week’s operational experience. 4) FTS testing continues after recent patches Recruitment ----------- 1) In process of recruiting a Year in Industry student to replace Kashif (our hardware expert) who successfully applied against the fabric team system admin posts 2) New CASTOR Post holder started today. SI-7 LCG Management Board Report --------------------------------- There had been no meeting. REVIEW OF ACTIONS ================= 496.2 PC to update the network forward-look. 500.3 AS to send details to DB regarding the RAL Tier-1 kit available for retirement, to enable DB to write to institutes, following which we would decide how to proceed. Ongoing. 500.4 CD to do a cost/benefit analysis for the services GridPP currently provides (in the context of a possible bid to continue some services post-EGI). Ongoing. 502.4 DC to provide a paragraph of text to PG regarding the 2012 year report: experiment reporting milestone. Ongoing. 503.1 DB, as PI, to send an email to SSC explaining the GridPP grant situation regarding procurement expenditure and that we deliver to a cash limit; this was not STFC expenditure therefore we should be exempted from the SSC formal procedure. Done, item closed. 503.2 DB to add all of the comments received from RJ/PG on the GridPP booklet, plus amendment of the 2% figure to more generic text, and respond to Alex Efimov. Done, item closed. 503.3 AS to thank Pete Oliver for agreeing to help re the HAG Chair (but we needed someone accountable to the PMB for the role). Ongoing. 503.4 DB to ask DK if he would Chair the HAG. Ongoing. 
ACTIONS AS OF 09.09.13 ====================== 496.2 PC to update the network forward-look. 500.3 AS to send details to DB regarding the RAL Tier-1 kit available for retirement, to enable DB to write to institutes, following which we would decide how to proceed. 500.4 CD to do a cost/benefit analysis for the services GridPP currently provides (in the context of a possible bid to continue some services post-EGI). 502.4 DC to provide a paragraph of text to PG regarding the 2012 year report: experiment reporting milestone. 503.3 AS to thank Pete Oliver for agreeing to help re the HAG Chair (but we needed someone accountable to the PMB for the role). 503.4 DB to ask DK if he would Chair the HAG. 504.1 RJ/DC to contact DELL and establish facts about special pricing and the portal availability. Next PMB ======== PMB F2F - Monday 23 September @ GridPP31
GridPP PMB Minutes 505 F2F (23.09.2013) ======================================= Present: Dave Britton (Chair), Andrew Sansum, Roger Jones, Jeremy Coles, Pete Gronbech, Dave Colling, Claire Devereux, Tony Doyle, Dave Kelsey, Pete Clarke (Minutes - Suzanne Scott) Apologies: Steve Lloyd, Tony Cass 1. Tier-1 Staffing levels ========================== AS reported that he had been checking with the teams and they may not need the effort outlined in the GridPP4 plan - they were unlikely to recruit any more people. AS recommended adjusting the plan, and presented slides. - some databases were lower priority now, 1.5FTE covered it (down from the expected 2.2FTE) - CASTOR was easier to run than it used to be, the facilities' work had borne fruit - overall the CASTOR effort was 3.5 but GridPP only needed 2.8FTE - the original target was 19.5 for GridPP4 but they probably only needed 18.1 now - SCD was prepared to support GridPP5 at a similar level to GridPP4 PG asked what this slide meant? AS responded that this was the STFC Scientific Computing Department matching effort. AS proposed we should adjust the metrics down - what about released effort? DB noted the issue of underspend at RAL which had been raised as an issue by Tony Medland. The Oversight Committee (OC) wanted a Resource Report to confirm the view that there had been a significant underspend at STFC - it was not clear whether the underspend was good or bad and we had to know what we needed. AS advised that he could look for a 1-year 1FTE post in order to inject effort, or could release this for GridPP5. DB thought it was a balancing act, an underspend was not good at higher-level strategy - we needed to ask STFC at the OC what they wanted us to do - perhaps pay for the STFC effort? AS thought that if we wanted to justify the staffing shortfall we could do so with CASTOR effort over time. 
DB considered there could be a spike in future if we replace CASTOR - we needed to expose the issue at the OC and seek advice as to what they wanted us to do. AS advised that the bulk of the shortfall was in the CASTOR database and he could explain this; SCD contribution had a historical basis, giving 0.5FTE to Robin Tasker for site networking support. AS noted they were also doing electrical pricing at the moment. PG considered that we needed a legitimate reason to give staffing away that we might need in GridPP5. PG would get a draft of the document to work on. 2. Tier-1 Hardware Resourcing ============================== PG reported that in total we received £370k re purchase of hardware in the next round. The case had been made for upgrade-related hardware for ATLAS and CMS; some hardware for the other experiments (driven by T2K requirements); and a boost for LHCb Tier-1 hardware to bring it on to par with the ATLAS and CMS fractions. DB reported that TM had raised the issue of LHCb disk, and he wasn't very convinced about it. There was the warning that times ahead would be challenging and this was not the right trajectory. DB advised that if we got a 50% cut in funding, we simply couldn't afford to do it. DB expected the concern to be raised at the OC. PC agreed, however he considered that TM's reservations were unfounded. LHCb had survived thus far simply because they had to and binned copies of data in order to do so. PC confirmed that the requested percentage would be filled - they had put the numbers into CRSG for 14PB in 2014, rising in 2015. In 2014 they were doing reprocessing back, however they could delay this. LHCb were still fighting the inconsistency problem in the UK and lack of coherence in the pledging system. This wouldn't be ratified by the RRB until October. 
DB considered that we could choose to provide the UK fraction of the 14PB at the Tier-1 if we wished, or by way of 11.5 at the Tier-1 and 2.5 at the Tier-2, however countries could provide all at the Tier-1 if they wished. Things might be different in GridPP5 however and it was noted that LHCb would like Manchester and RAL to be Tier-2Ds. DB thought it wasn't helpful to show the OC the table as presented - we needed to discuss the percentage of local pledges with the OC, which was more useful. Regarding GridPP4 we might be able to put some of this at RALPPD. AS considered that we needed to finalise all plans first, but there was certainly no complication at RALPPD. DB thought that Manchester might be ok too - this was something we might be able to manage. DB noted we had to wait regarding the funding issues before we could decide mapping issues, and these had to agree with the REBUS fractions - we couldn't stray from the algorithm. A provisional figure of 14 (11.5/2.5) stood for the moment. PC noted that the likely request for 2015 was that 11.5 would rise to 12.5, and 14 rise to 15 - they could stage what they did across the 2014-15 fraction - overall, 31.5% of the Tier-1 fraction request and 21.5% of the Tier-2 request (excluding CERN). These were the M&O numbers from June. PG noted that these percentages had implications for Tier-1 hardware planning, also for the CRRB. DB reminded that the pledges were due at the end of September. 3. Tier-1 Spend Plan ===================== AS had circulated the final plan as adapted after the official capacity was added, showing CPU capacity; disk; tape (including extra tape, networking, infrastructure, maintenance/misc); non-capacity hardware. AS needed to see the outturn forecast. DB asked if there was headroom? Was 1.62 still ok? AS noted yes, he could adjust the balance once he knew the pricing. PG asked about the HAG Chair? DK had agreed ok, so the open action was done. 
PG asked if, once we got the prices in, did we need a HAG meeting? AS considered that it depended on how we dealt with the timescales - we would get the figures, make the estimates, the HAG often agreed afterwards as all was on a 'best guess' basis. DB advised that the HAG ensured that if choosing between 2-3TB, the right technology choice was made. AS noted he was hoping to order by the end of October. The meeting broke for lunch. 4. EGI Council Report ====================== CD reported on the recent EGI Council meeting. - Governance Structure Steve Newhouse had resigned and was leaving at the end of October. This would mean that the EGI Inspire Project Director's role would be transferred. Leadership of the project would pass to Tiziana Ferrari (current Chief Operating Officer at EGI Inspire) and management of the organisation would be passed to Catherine Gater (current Deputy Director at EGI.eu) as interim Director until the EGI Council made a permanent appointment. The Search Committee would process the vacancy until 7th December and report-back with a shortlist to the 17th December meeting. The post would be advertised for one month and was a fixed-term one-year appointment in the first instance. They would use the next few months to reflect on these roles, whether they will be split in future and also whether they wanted a 'visionary' or 'managerial' candidate. - Budget The budget was adopted for the next financial year; there would be a reduction in staffing from 27 to 22FTE, and in the longer term beyond 2014 a further reduction to 9 core (EGI.eu) staff. Voting would be deferred until the next meeting in October - Germany had refused to pay their Fee. At present the UK was undergoing a review regarding the Fee, instigated by JISC, but there was no progress as yet. Most other countries would be paying the 2014 Fee, except Spain and Italy who were reviewing this at the moment. The participation fee was 1.6 million Euros across all countries. 
This was used to fund Core Tasks and the central organisation of EGI.eu (there would be 9FTE at Head Office). The EGI grant was due to finish at the end of April next year and there would be at least a 6-month funding gap to Horizon 2020. There may be two closing dates. EGI could support Core Tasks using the participation fee for 6-8 months only. Regarding the budget, voting was deferred, and it was hoped that Spain and Italy would discuss offline. DB noted that the review of UK involvement run by JISC would likely be that EGI benefits STFC. CD advised that, during budget discussions, it had been suggested that the current solution be discarded, and a one-country one-vote structure be instigated - which would mean that all pay the same fee. This would be easier all round. Currently, the UK gets 70 votes and a veto, whilst small countries only get 4 votes. Small countries might find it difficult to pay an increased flat fee. Large countries like a larger vote as they can use their veto. CD noted that we should have a view on this as the governance was changing. DB noted that we would be in favour of a significantly reduced fee - having less say wasn't so important and we'd favour that model. CD reported that the notice period for Germany to withdraw had been reduced from 3 months to 2 months. The Core EGI Services bids were postponed until the next meeting as the budget was not yet determined. Re the APEL Repository - there was 12 person months per year of effort (including 2nd line support) and the bid was for 2 years. For the GOCDB this was 6 person months per year. For Security the co-ordination was for 6 person months per year. The 'Grand Vision' document had been endorsed. The EGI Science case had been written by the projects and the community re the 'grand challenges' of the future - could the experiments contribute? PC noted that the wLCG Computing Upgrade document could be utilised. DB agreed, noting this could be used for EGI to apply to Horizon 2020? 
CD noted yes. A small document was required, around 5 pages, and it was in our interests to provide inputs. DB suggested we did this, we already had the information available to re-package. PC asked what the format was? It was split into sections: - scientific case/challenge - usage & activity - future e-infrastructure - challenge & plans It was agreed that the experiments would do this (PC/RJ/DC) - CD would feed that back. 5. Preparation for GridPP Oversight Committee (1st November) ============================================================= TM had called DB to discuss this. The possibility of a rather dire funding situation had been warned in advance of the OC. Another OC would take place in February 2014 which would advise on the GridPP5 proposal. They were not expecting to receive a detailed outline of GridPP5 on the 1st of November meeting, rather they had requested: - Status Report of GridPP4 - Resource Report (TM had noted the underspend at RAL) - Forward look document There were therefore a maximum of three papers to present. PG would do the Resource Report. The Status Report was partly the Project Map (PG) and based on what we had provided in the past. Regarding GridPP5, we would be able to pull-out the conclusions of the upcoming Collaboration Meeting, GridPP31. The OC meeting would be useful at this stage - Jonathan Flynn also chaired the C-RSG. It would be an opportunity to correct TM's perception of the LHCb situation. The latest version of the wLCG draft document was 1.7? Regarding the Tier-2 Hardware funding grants, had STFC given guidance? PG noted that the overall plan was to fund the Tier-2 hardware grants within the next FY. Three sites had however expressed a preference to have the funding this year: RHUL, Brunel, Birmingham. If we were to do that, we needed to move quickly to close the accounting and do the allocation/spend before the next FY. The remainder of the sites would get grants next year. 
DB noted that the funding this year was £300k - we could stop the accounting as at 1st October, work out what they get, then run it on and the rest of the sites would receive an allocation from the integrated percentage across the period. TM was happy to spend £300k this year for three institutes. PG noted that if we wanted to do this, we needed to do it quickly. DB agreed, noting that we would decide on 1st October and do an evaluation about the fraction. JeS forms were underway and should be co-ordinated between the PIs and STFC. We needed to ensure that the institutes really could spend the money this FY. It was agreed that no bad month should make any difference to the allocation, it was over a sufficiently long period. There ensued a discussion about matched university funding for hardware grants. TD noted that the algorithm was based on what you get from STFC, institute funding was extra. DB observed that if the STFC award was as capital then we had to spend it as such, not treat it as 'resource' as well. JC arrived for the remainder of the meeting. 6. GridPP5 Planning ==================== PC asked if there was any sense of the level of award this time round? DB advised that at top level in the past we had a sense of what was possible and we had managed to submit an appropriate proposal, following which we received a request to de-scope it. DB considered that the situation was different this time, as the landscape had shifted - the Higgs had been 'found' and the trajectory was now LHC exploitation. STFC might be considering an annual award. We definitely could not de-scope at the 30% level the way we had done previously. DB believed there was a real possibility of significantly less money, however this could mean a 'flat cash' scenario. Something may have arisen in the Programmatic Review and a large part of the GridPP budget related to staff costs. DC thought it might just be a timing issue? 
DB noted there was an STFC Town Meeting scheduled for 14th October for all people involved in the Programmatic Review. It appeared that this meeting had been re-purposed and it might be that there was no news as yet to publish. TM had informed him that the OC would give advice about the revised schedule at the OC meeting on 1st November. DB considered that we should proceed optimistically but be realistic about current economic constraints. STFC were unlikely to publish the results of the Programmatic Review until December/January.

DC asked what we could usefully do then about GridPP5 planning? DB noted that GridPP31 would focus on experiment requirements on Day One, Day Two was the base case for the Tier-1. The Tier-2 discussion was the most complicated part and we needed a mandate from the community to carry on as we are and to continue with a distributed computing model. The discussion had to revolve around - if we had to change things, what would we do?

JC asked whether there was a danger in saying we would focus on development? DB agreed, noting that if funding was tight we needed to focus on running an operational infrastructure. DK noted however that it was the development work that got us good people. PG asked about new sites that had just come online like Sussex, with a small but keen group? DB considered that we needed to make the case in the bid and that resources needed to cover 'greenfield' and evolutionary development, however the further away we got from our core business, the harder it was to argue for things like developing DPM etc. DK noted that the success of GridPP was in conjunction with experiments and that operational work has until now gone hand-in-hand with development. DB emphasised that we needed to do what the experiments wanted us to do - it was a partnership.

RJ advised that they couldn't make any realistic estimates for resource or anything else for GridPP5 - we were too far away at the moment. By 2019 they could be running stuff on ARM processors. PC agreed, noting that it was very hard to write down stuff that we don't yet know - but we do understand review panels and could position ourselves accordingly. PG noted that we had been lucky to get hardware at competitive prices in the past, but in the future, PCs might go and x86 hardware could be too expensive - could we build an ARM-based CPU currently? DC thought we needed to build things in parallel - have different levels of memory. DB advised that this issue was too technical at present, but in our proposal we did need to write something very complete and see how it went.

7. GridPP32 Planning
=====================

DB had circulated a proposal regarding Pitlochry as a possible venue. The quote we had been given was under £10k from the hotel, with a very good accommodation rate. There ensued a discussion about travel arrangements and other possibilities, also the timing of the programme. It was agreed to book the Hotel in Pitlochry for 24-26 March 2014. DB would advise the community tomorrow.
GridPP PMB Minutes 506 (30.09.2013)
===================================

Present: Dave Britton (Chair), Andrew Sansum, Roger Jones, Tony Cass, Jeremy Coles, Pete Gronbech, Dave Colling, Claire Devereux, Steve Lloyd (Minutes - Suzanne Scott)

Apologies: Tony Doyle, Dave Kelsey, Pete Clarke

1. Planning for Oversight Committee (OC)
=========================================

DB advised that working backwards from the date of 1st November for the OC meeting, it meant we would need to submit the documents on 25th October, final draft would therefore be required by 21st October, and the initial complete draft ready by 14th October, which was in two weeks' time. The work on this had to be divided up as follows:

- Project Status document (DB)
  The last one was in 2011. An introduction was required including what has happened since 2011.

- GridPP4 Status (PG)
  This would be on the Project Map, Metrics and the Risk Register, including pressure points (as requested by Tony Medland).

- section on wLCG & EGI context (CD/DB)

- one-page on the Tier-1 (AS)
  AS should work from what was done last time - it should be high-level only including success and concerns.

- Deployment at the Tier-2 (JC/SL)
  This should be two pages of high-level overview.

- User Reports (the platform to clients) (RJ/DC/PC) and for 'other' VOs (Chris Walker)
  This should be a paragraph from each, using material from GridPP31.

- Impact (SL)
  This should be an update noting any recent activities. Tony Medland had asked to know what the contributions were from the various groups (Tiers?) around the country, presumably this would include the data and storage groups? RJ thought he meant the standard question re the Institutes?

- Resource Report (PG)
  DB would review this after PG had produced a draft. TM asked for a separation between RAL and the Tier-2s, and also to separate capital from resource. PG would look at the old documents as a starting point.
- GridPP5
  A document would be needed about GridPP5 - this would include outlining the success of GridPP4 (including the GridPP4 structure) and could include our best guess at hardware costings using REBUS. It could also include an outline of the Tier-1 and Tier-2 cases. AS noted that we had the draft TDR from the experiments - could this be used as a basis for the document? DB agreed yes, it was a good place to start. RJ advised starting off with simple statements, assuming a flat cash scenario and outlining more diverse use of Tier-2 functionality. DB would speak to TM again regarding content and extent of the text submission.

Should we also submit a document on the DRI funding and how this was used? JC thought it showed added-value. DB agreed yes, we could look at that and see if the document was worth including.

DB advised that all documents were required asap, but certainly a good draft of all sections was required by 14th October, in two weeks' time. There would be a short PMB on 14th October (no PMB on 7th due to unavailability of members). October 21st was the joint NDGF meeting at RAL and this would be in lieu of a PMB. DB would get back in touch with everyone once he had spoken to TM.

ACTION
506.1 ALL: to produce draft documents before 14th October, as described, for the GridPP5 proposal submission.

2. Nordic DataGrid Facility (NDGF) meeting
===========================================

It was reported that there would be an NDGF meeting on 21st October at RAL. DB had just heard that they wanted to meet from 9.00 am to 3.00 pm. Could we arrange a car from Heathrow - there would be 4 visitors? AS noted yes. There could be a tour for the first hour which would give people time to arrive, the meeting could start at 10.00 am. AS/DB would discuss offline.
The proposed Agenda comprised the following:

- organisational overview
- funding overview
- technical overview & service delivery
- software choices & directions/trajectory
- networking

It was noted that Robin Tasker had intended to be there but it seemed he would be on leave at the time. Could we ask him anyway? AS would. There would also be discussion on future collaborative work, directions, risk analysis etc. DB noted that the meeting would be quite high-level and it should be a two-sided exchange with two slots per topic within the timeframe of 10am-3pm. There would be time for discussion. It was not mandatory that all of the PMB attended - AS, JC, SL, and experiment reps if possible. AS would identify any people from the Tier-1 to attend.

ACTIONS
506.2 DB would contact the NDGF group and agree the timings for the meeting.
506.3 AS would book a suitable room for the NDGF meeting and do a doodle poll, also arrange lunch.

3. Pledges
===========

It was noted that the Pledges were due on October 11th. SL advised that he would need to figure out the Tier-2s, who didn't need to break-up with the Institute, he would check the Tier-2 were happy for the PMB to sign on their behalf. DB noted that he would extract the data from REBUS and work out what it should be, check this with SL and then SL would check with the Tier-2s that all was ok.

ACTION
506.4 DB to extract the pledges data from REBUS and work out what it should be, check this with SL and then SL would check with the Tier-2s that all was ok.

PG asked whether, for the Tier-1, we would meet experiment requirements from REBUS? Yes. DB advised that we had to be bound by the numbers in REBUS and this needs to be done within the next week.

4. GridPP31
============

DB wished to record thanks to DC for pulling this meeting together - it had been very well run and very successful, also the discussion sessions had worked well throughout and community engagement was good. Comments?
AS thought it had been a good meeting, there were no surprising inputs which indicated that we had a good understanding of where we were and what we needed to focus on next.

STANDING ITEMS
==============

SI-0 Monthly report from Development Group
-------------------------------------------

DC reported that there had been a meeting on the Friday prior to the Collaboration Meeting - there was no new activity over the summer and everything was now re-starting. RJ advised that the report from the Cloud SIG meeting was reaching its final stage so he would circulate this to the PMB.

SI-1 Dissemination Report
--------------------------

- SL reported that DB/SL would be interviewing on Thursday for a replacement for Neasan O'Neill's post.

- Alex Efimov had been in touch regarding the proposal re file-splitting and had asked some questions. We had decided that this should be done at Imperial. DC confirmed he had contacted Alex and asked him for his technical contact but had received no reply.

- SL had written a letter of support for the QMUL people using the Grid.

SI-2 ATLAS weekly review & plans
---------------------------------

RJ noted not much to report - there had been blacklisting of data-transfer end-points in the UK over the weekend; they had discovered a few problems with CVMFS and issues with configuration, they were currently working on this. JC advised that there had been problems with pilot factories and FTS3 for ATLAS. RJ added that there was a known CVMFS bug which was not yet fixed, and that there were niggles in the system generally at the moment.

SI-3 CMS weekly review & plans
-------------------------------

DC reported a transfer problem at RAL and job failures, there was nothing else to report. The Tier-2 were all ok and above 80%; all other tests with RAL appeared to be fine.

SI-4 LHCb weekly review & plans
--------------------------------

PC was absent.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

1. Our SL6 WN migration plans are recorded in https://twiki.cern.ch/twiki/bin/view/LCG/SL6DeploymentSites. A check just over a week ago (http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html) indicated that about half of GridPP sites had migrated or set up test queues ahead of a migration. The transition deadline remains 31st October and currently all our sites aim to meet this date.

2. There have only been a few expressions of interest for attending the WLCG workshop (November 11th-12th) in Copenhagen. The agenda is focused on experiment plans and general e-Infrastructure moves (https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=251191).

3. Last week we certified 100percentIT under the UK NGI. This is the first commercial company to join the EGI federated cloud.

4. A potential outage of a switch at Manchester (2 weeks ago) highlighted that our VOMS backup solution is still not fully in production. Progress has been slower than anticipated due to various operations tools not supporting multiple VOs setup across multiple VOMS and the need to tightly control the migration (so avoided during the summer) - CEs/SEs have to support the new configuration first and then UIs and finally the ops portal and users need to be informed. The transition is now being pushed with status updates appearing at https://www.gridpp.ac.uk/wiki/Adoption_of_Backup_GridPP_Voms_Servers.

5. ATLAS pilot factory and also FTS3 issues led to a drop in ATLAS work across the UK cloud last week.

6. The delayed deployment of GOCDB v5 is now scheduled for release on the morning of Wednesday 2nd October. This is a major release.

7. The DPM workshop mentioned at a PMB a few weeks ago is now scheduled to take place in Edinburgh on 13th December.

SI-6 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric
------

1) Disk and CPU tenders were scheduled to close last Friday (27 September). We have not yet seen the responses.
2) T10KD tape drives are now available. We have a quote consistent with our spend plan and will place an order shortly.

3) RAL will move its main site connection to JANET 6 on Tuesday 1st October. We will also move the backup OPN link that day.

4) On Tuesday 8th October we will move the primary OPN link. We expect both network changes to be almost transparent.

5) Following a problem with an LHCb disk server where data recovery was difficult we will be reviewing the operational status of the SL10 disk server generation. This is a precautionary review and doesn't necessarily imply an urgent drive to phase out, however should we conclude it is necessary then it should be noted that the SL10 batch still features in the 2014 capacity plan.

Service
-------

1) Reports covering last 2 weeks available at:
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2013-09-11
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2013-09-18

2) CASTOR
   a) Occasional low level rate of transfer failures from CASTOR. Almost no SAM test failures in the last 10 days but underlying CGSI-GSoap errors remain at 1% level. Still hunting the problem - CERN see it too.

3) Batch farm
   a) The HTCondor farm is now at 50% of capacity. It will be marked as production in the GOC today (Monday 30 September).
   b) The remaining Torque/Maui farm (50% capacity) will be upgraded to SL6 on Thursday 3rd October. The farm will commence draining on Wednesday (announcements will be made today).

4) FTS3 testing continues. Performance problems last week impacted the ATLAS cloud but since a further update to FTS, service has been good.

5) Further Top level BDII problems occurred on 11-12th September. These were resolved by applying a workaround provided by the BDII team. Good stability since but now chasing a further issue in the Glue publishing.

6) Unexpectedly CMS informed us last week that the xrootd fall-back test would become a critical test on the 1st October. At present xrootd fall-back is disabled at RAL (Tier-1 and Tier-2) owing to the impact it caused on the site firewall. We are considering how best to respond.

Recruitment
-----------

1) We expect a new starter (YiNi) to start work in the Fabric team tomorrow to work on hardware repair in the first instance.

SI-7 LCG Management Board Report
---------------------------------

There was no report; the next meeting had been cancelled.

AOB
===

PG reminded everyone that the Quarterly Reports were now due. Could these be submitted in a timely fashion please!

REVIEW OF ACTIONS
=================

496.2 PC to update the network forward-look. Ongoing.
500.3 AS to send details to DB regarding the RAL Tier-1 kit available for retirement, to enable DB to write to institutes, following which we would decide how to proceed. Ongoing.
500.4 CD to do a cost/benefit analysis for the services GridPP currently provides (in the context of a possible bid to continue some services post-EGI). Ongoing.
502.4 DC to provide a paragraph of text to PG regarding the 2012 year report: experiment reporting milestone. Done, item closed.
503.3 AS to thank Pete Oliver for agreeing to help re the HAG Chair (but we needed someone accountable to the PMB for the role). Done, item closed.
503.4 DB to ask DK if he would Chair the HAG. Done, item closed.
504.1 RJ/DC to contact DELL and establish facts about special pricing and the portal availability. Ongoing.

ACTIONS AS AT 30/09/2013
========================

496.2 PC to update the network forward-look.
500.3 AS to send details to DB regarding the RAL Tier-1 kit available for retirement, to enable DB to write to institutes, following which we would decide how to proceed.
500.4 CD to do a cost/benefit analysis for the services GridPP currently provides (in the context of a possible bid to continue some services post-EGI).
504.1 RJ/DC to contact DELL and establish facts about special pricing and the portal availability.
506.1 ALL: to produce draft documents before 14th October, as described, for the GridPP5 proposal submission.
506.2 DB would contact the NDGF group and agree the timings for the meeting on 21st October.
506.3 AS would book a suitable room for the NDGF meeting and do a doodle poll, also arrange lunch.
506.4 DB/SL: DB to extract the pledges data from REBUS and work out what it should be, check this with SL and then SL would check with the Tier-2s that all was ok.

Next PMBs:
==========

- NO PMB on 7th October
- short PMB on 14th October to consider draft OC documents
- NDGF meeting 21st October - NO PMB
