UKHEPGRID@JISCMAIL.AC.UK

Subject: Registration Closing Today and Minutes of the 453rd GridPP PMB meeting
From: David Britton <[log in to unmask]>
Reply-To: David Britton <[log in to unmask]>
Date: Mon, 12 Mar 2012 12:04:13 +0000
Content-Type: multipart/mixed
Parts/Attachments: text/plain (41 lines), 120227.txt (681 lines)

Dear All,

_-_-_-_-_-_-_-_-_-_-_-_-_-REMINDER_-_-_-_-_-_-_-_-_-_-_-_-_-_

Registration for GridPP28 at http://www.gridpp.ac.uk/gridpp28/
closes today.
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

Please find attached the GridPP Project Management Board
Meeting minutes for the 453rd meeting.

The latest minutes can be found each week at:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.
GridPP PMB Minutes 453 (27.02.2012)
===================================

Present: Dave Britton (Chair), Jeremy Coles, Pete Gronbech, Steve Lloyd, Pete Clarke, Tony Cass, Robin Middleton, Andrew Sansum (Suzanne Scott - Minutes)

Apologies: Tony Doyle, Roger Jones, Dave Kelsey, John Gordon, Glenn Patrick, Dave Colling, Neil Geddes

1. Summary of Quarterly Report Issues
======================================

PG reported that all of the Reports had been received except for CMS. DC was waiting on information from RJ.

Red Metrics were as follows:

- the Tier-1 staffing situation was still an issue during Q4, but 4 new staff had started recently and it was expected that this metric would move to Amber for 12Q1.
- the Tier-1 ranking for ATLAS was red. AS advised that the current round of upgrades should make a difference. PG noted that disk procurement was underway, modulo the floods and the reduced cost. The DRI funding had helped.
- for ATLAS there was one red metric: data availability went from 98% to 92%; the reason was not yet known. AS advised this was probably due to CASTOR issues and other minor things. There had been Oracle database problems.

PG advised that the AHM paper would suffice for the milestone report. LHCb was pretty good; there had been a couple of amber 'below target' metrics, but the target may need to be modified due to new practices. DB advised that we needed to re-visit the Risk Register and that this should be put on the Agenda for the F2F meeting.

ACTION 453.1 PG to add the Risk Register to the Agenda for the upcoming F2F meeting at Manchester.

PG continued - for 'Other Experiments' the report was not too bad; there were a couple of amber metrics which showed a drop to 63.3% (below the target of 75%). AS reported that overall job efficiency was excellent and, in context, only a small volume of data was involved. DB noted that he did not want to drop the target - small VOs should try to increase their efficiency.

ACTION 453.2 AS to get the issue of small VO efficiency, which should be increased, onto the Agenda at the experiment liaison meeting.

DB noted that we needed to examine Tier-2 disk usage by non-LHC VOs, and return to this issue later this year. PG to retain this issue on his list. AS noted that there had been a lot of demand from LHC VOs, which meant that spare capacity was down.

PG continued - re Deployment and Ops, there were some amber metrics but most were close to target. DB noted that the Durham issue remained, and queried why Edinburgh and Lancaster did not use such a large amount. JC advised that 2/3 of the Lancaster numbers were missing in the accounting. DB thought that the problem was now resolved. PG noted that the calculation had come from the ScotGrid report. For accounting we used the lower figure. JC would check whether the number had been propagated through the accounting.

ACTION 453.3 JC to check the ScotGrid quarterly report to see whether or not the incorrect number had propagated through the accounting system.

PG continued - all sites now had CREAM CE installed; however, overall reliability and availability did fall. Oxford were now supporting ALICE. The UK CA had caused some problems.

- For Data Group there was one red metric: the blog posts were low; one milestone was overdue; Argus deployment was pending.
- For Security there had been no incidents, but there was a red milestone - the security framework from EGI was still under development.
- For NGI, work was continuing on APEL and the GOCDB; Durham had been marked red by EGI.
- For Execution, manpower was low.

PG queried the 'Year 1 review of service to the experiments'. AS noted we were keeping a note of this annually - both points of view were required: what the experiments received; and what we provided. The questionnaire was no good and the issue required serious attention - a 5-minute response was not good. PG noted the metric had been there originally but he was not sure of its origin.

DB asked PG to give his conclusions and recommendations after speaking to sources. DB noted we could discuss this at the F2F. PG would provide a bullet list for the PMB to address during the meeting. DB noted that CPUs and disk provided were already covered by the metrics, so we did not need to ask again - we needed to identify the high-level problems. PG advised that this could be summarised from issues arising during the year, and could be delivered via a couple of slides at the meeting, in order to meet the milestone.

ACTION 453.4 PG to provide at the F2F in Manchester a bulleted list of summarised issues which had arisen during the year and were noted in the Quarterly Reports. This would meet the milestone required.

PG continued - regarding Outreach, the website was failing to meet the target; there had been no KE meeting and no press releases. DB advised that the onus was on us to help Neasan O'Neill to meet the targets - we should give him more of a platform at GridPP29. We needed to help Neasan reach his objectives. DB noted that in this category it was no problem to have aspirations even if they were noted as amber or red - we had been under-funded on Dissemination by STFC.

2. Tier-2 Disk
===============

DB advised that the accounting policy discourages Tier-2 sites from allocating disk to non-LHC VOs. We need to adjust the accounting algorithm soon to rectify this. SL noted we needed to do the count first. PG advised that he had already asked the Tier-2 Co-ordinators to provide that information. DB asked how the algorithm should be weighted: if we want sites to deploy 3%, then it needs to be weighted the same whether the disk goes to T2K or to ATLAS. PG noted that the non-LHC VOs could then be added in to the ATLAS sites. SL would think about this. DB noted it would be good to resolve this before Manchester.

ACTION 453.5 SL to help resolve the issue of weighting for non-LHC VOs at the Tier-2s.

3. AOCB
========

- GridPP29

DB had circulated suggestions. Possibly Oxford next time?
PG had spoken to Sue Geddes and there was a lecture theatre available. The date could be 10-12 September 2012. Were there any clashes? The AHM was 10-12 September. ATLAS week was the first week of October. LHCb week was 3-7 September. DB thought that the end of that week, 13 and 14 September, might then be possible. Could PG check those dates instead? We would have the PMB on the 12th. [Note Added: GridPP29 dates now converging on week of 24th Sep 2012]

ACTION 453.6 PG to check lecture theatre availability for week of 24th September for GridPP29.

- EMI-2 early adopters

JC would meet with Daniela Bauer and Duncan Rand this week. Brunel had volunteered.

- FP7 Data Preservation Project

DB noted that DC/PC/RJ were all interested in this. This was a CSA, therefore matching funding was not required. Should we get involved? PC had mixed feelings, as the deadline was very close. It was peripheral to taking and analysing the data - it was on the borderline as to whether we got involved. Wearing a UK hat, however, PC thought that the work had to be done. DB agreed on the money side; it was a lot of work for not much return, but it was an area in which it was better if we were involved. DC and RJ were too busy, so who could express interest? DB would express some interest and lay out the constraints.

ACTION 453.7 DB to 'express interest' in the FP7 Data Preservation Project and would contact Jamie to check the scope and what was required. PC, RJ and DC were interested on behalf of the experiments.

- ATLAS UK tutorial

RM advised that the cost of ~£2k for travel for this seemed fine. DB proposed PMB support. This was agreed, subject to further detailed information from RJ.

STANDING ITEMS
==============

SI-1 Dissemination Report
--------------------------

SL reported on behalf of Neasan O'Neill:

1) The website was almost done; NO was working on getting it all live. Some of the updates could be seen already, like the "Docs" page (http://www.gridpp.ac.uk/docs/)
2) NO was in Taipei this week, so would mostly be out of contact
3) Masterclasses had been confirmed: Daresbury, Oxford, UCL and QMUL
4) There would be no UK NGI/GridPP/NGS stand at Munich as NGS could not confirm funds. By the time they had confirmed, there were no booths left, but we were on the waiting list
5) NO was attending a meeting in Glasgow on the 10th of March about TuringFest and GridPP's involvement - they wanted to do an entire session on the Higgs. Jamie Colman had contacted Mark Mitchell, who noted to him that we should involve Neasan O'Neill. We should identify some Grid speakers. It was a good event with a technical audience.

SI-2 ATLAS weekly review & plans
---------------------------------

RJ was absent.

SI-3 CMS weekly review & plans
-------------------------------

DC was absent.

SI-4 LHCb weekly review & plans
---------------------------------

GP was absent.

SI-5 User Co-ordination issues
-------------------------------

GP was absent.

SI-6 Production Manager's Report
---------------------------------

JC reported as follows:

1) We received a COD escalation last week for a site reportedly exceeding a month in downtime. Upon investigation it was found that the RAL node was not marked in-production and was in a testing state for ATLAS. The issue now requires follow-up by the dashboard developers - it is not possible to issue tickets to sites that are not in production, so those sites need to have a special status in the dashboard.
2) A few sites are beginning to look at perfSONAR installs. GridMon nodes are also arriving, but for now the recommendation is to install the node but leave it off, pending configuration details and work on the policy.
3) Manchester is facing a period of poor availability, as reported by the ops tests, due to a problem (reported back to WLCG and the DPM developers on several previous occasions) with the SE put tests, which are marked critical and fail even though there is sufficient space for the files. The problem occurs because a DPM bug means that when their “other VOs disk server” is down or marked read-only, other disk is not counted. The site view:

“… there is more than enough disk space available to cover those 22TB that are down due to a dodgy raid card that needs replacing. I'd like to underline that that is one of the small file systems not even one of the big ones. The system doesn't see that because the space is reserved by space tokens and anything is subtracted by the common space first and then in a weird way from the space tokens.”

Several sites work around this monitoring problem by creating a pool specifically for ops/sgmops or allowing ops to write into other reserved areas. The argument against setting up separate pools is that ops is then not testing the most important filesystem(s). Similarly, sites set up CPU queues for ops to ensure the tests can run. Some sysadmins argue that these tests then create an environment that is not optimal for “real” work in order to return good results for operations grid metrics. I raise this (once again) here so that the PMB is fully aware of the situation as it relates to use of the ops VO for testing. The tests are useful and spot problems, though in some ways they are clearly not ideal. For the SE tests there may be some scope to adjust what is critical, and this can be fed back to EGI. In the meantime Manchester has responded that they “can remove space from atlas so that these tests can run if the PMB doesn't remove them from the accounting”. The status of Manchester is about to trigger a non-performance ticket to the COD, as the ROD ticket cannot be extended beyond 30 days.

DB advised that we should not make a policy for Manchester that was different from that for other sites. JC noted that the situation would need to escalate to a ROD ticket. DB noted that if other sites had found appropriate workarounds then Manchester should do likewise.

4) The matter of disk space for other VOs (3%) is being checked, but the PMB needs to consider how making this disk available is wrapped into the T2 metrics. As Steve (Lloyd) has pointed out, at the moment “it's better for sites to have empty ATLAS disk than full T2K etc.”

SI-7 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) FY11 procurements
   - D12 CPU delivery expected to deploy to production in next 24-48 hours
   - V12 CPU delivery under RAL proving test. Expect to deploy around 9th March
   - CV12 Disk: some minor problems, vendor acceptance. Expect to complete acceptance tests 9th March
   - V12 disk: expect to complete acceptance tests 9th March
2) Power failure on 14th February external to RAL led to 3 racks of disk servers losing power. This couldn't have happened at a better time, as the service was in scheduled downtime for the CASTOR nameserver upgrade.
3) Work on essential supply board on UPS supply on Tuesday and Thursday this week. Increased risk of loss of power or cooling to UPS room equipment.
4) Hardware intervention was successful on the core C300 switch on 8th February.

Service:

Many interventions scheduled - most transparent or with minimal disruption, but a number of major interventions planned during February, two of which (CASTOR and Batch Farm) carry high risk. ATLAS had a very difficult time in January owing to poor SRM availability.
1) Summary of operational issues and scheduled interventions is at:
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2012-02-15
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2012-02-22
2) CASTOR
   a) Upgrade to CASTOR 2.1.11-8 completed successfully for ATLAS and CMS (last week); LHCb and Gen this week.
   b) Ongoing SRM problems after the SRM upgrade have required an aggressive restarter cron job. Work to identify the underlying cause will commence once we have completed the CASTOR upgrades this week.
   c) We expect to move the CASTOR database servers to their final hardware configuration in 1-2 weeks' time. This will require a 1-2 hour outage on all instances.
3) The upgrade to the batch server was completed. It was mainly successful; however, publishing problems led to LHCb being unable to submit work in the later part of last week.
4) Problems with the CREAM CE (zombie jobs) prevented LHCb from using the WMS for job submission. We have a workaround, which will go into operation later today or tomorrow, to delete zombie jobs, but do not yet have a solution. Also seen at LAL.
5) The move of the ATLAS LFC to CERN took place last week. The RAL LFC is no longer critical for the ATLAS UK cloud.
6) We expect to schedule an upgrade of the FTS to EMI FTS 2.2.8 w/b 5th March.
7) The MyProxy upgrade on 9th February was reverted after problems were encountered. This is now suspected to be a hardware problem on the new target hardware rather than a problem with the upgrade.
8) The CIP (CASTOR information provider) upgrade on the 9th February was reverted after problems were encountered. Currently CIP is providing inaccurate disk capacity data. We are reviewing how the Tier-1 supports and maintains the CIP.

Staff:

1) Grid team leader post: Ian Collier will lead the team. We will backfill Ian's post by recruiting a new system admin for the Fabric team.
2) Recruitments
   * Database post - post offered and verbally accepted.

SI-8 LCG Management Board Report
---------------------------------

There had been no MB.

AOB
===

Re the DRI situation, PG to send an email to all PIs, reminding them that there were 5 weeks left on the DRI spend.

ACTION 453.8 PG to send an email to all PIs, reminding them that there were 5 weeks left on the DRI spend.

REVIEW OF ACTIONS
=================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GridPP4.
438.8 TC to advise when it is a good time to move to Vidyo - early adopters were possible.
438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why. Ongoing.
448.4 ALL to send thoughts/suggestions to DB regarding the replacement for GP in the User Co-ordinator position (not necessarily based at RAL).
448.7 RJ/PC to draw up GridPP guidelines in relation to a Data Management Policy: RJ/PC to keep abreast of Policy and inform GridPP as this develops.
449.1 AS to document the recent network incidents at RAL. Ongoing.
450.1 DC to send the CMS spreadsheet accounting numbers to December, to SL.
450.2 Re SL6, JC to come back to the PMB with regard to plans & schedules. Ongoing.
451.1 DB to respond to Gillian re the EGI Community Forum, noting GridPP's willingness to lend support and to be involved, and to have a presence on the Organising Committee. DB to see if we could co-locate a GridPP or NGI event. Done, item closed.
451.2 JG to respond to Tiziana Ferrari re the RC Forum and note that GridPP would like to be involved. JG to consider how we contribute and report back. Done, item closed.
451.3 PG/JC to look at non-LHC VO storage use at the Tier-2s and report back. Done, item closed.

ACTIONS AS OF 27.02.12
======================

436.12 DB to produce a financial proposal for adjustments to the Tier-2 staffing profile over the term of GridPP4.
438.8 TC to advise when it is a good time to move to Vidyo - early adopters were possible.
438.9 AS to contact relevant site managers to ask whether or not they would be interested in having retired Tier-1 hardware - if a site were interested then they should submit a proposal as to what they want and why.
448.4 ALL to send thoughts/suggestions to DB regarding the replacement for GP in the User Co-ordinator position (not necessarily based at RAL).
448.7 RJ/PC to draw up GridPP guidelines in relation to a Data Management Policy: RJ/PC to keep abreast of Policy and inform GridPP as this develops.
449.1 AS to document the recent network incidents at RAL.
450.1 DC to send the CMS spreadsheet accounting numbers to December, to SL.
450.2 Re SL6, JC to come back to the PMB with regard to plans & schedules.
453.1 PG to add the Risk Register to the Agenda for the upcoming F2F meeting at Manchester.
453.2 AS to get the issue of small VO efficiency, which should be increased, onto the Agenda at the experiment liaison meeting.
453.3 JC to check the ScotGrid quarterly report to see whether or not the incorrect number had propagated through the accounting system.
453.4 PG to provide at the F2F in Manchester a bulleted list of summarised issues which had arisen during the year and were noted in the Quarterly Reports. This would meet the milestone required.
453.5 SL to help resolve the issue of weighting for non-LHC VOs at the Tier-2s.
453.6 PG to check lecture theatre availability for week of 24th September for GridPP29.
453.7 DB to 'express interest' in the FP7 Data Preservation Project and would contact Jamie to check the scope and what was required. PC, RJ and DC were interested on behalf of the experiments.
453.8 PG to send an email to all PIs, reminding them that there were 5 weeks left on the DRI spend.

The next meeting would take place on Monday 5 March at 12:55 pm.