UKHEPGRID Archives

UKHEPGRID@JISCMAIL.AC.UK


Subject: GridPP4 Registration Deadline and Minutes of the 381st GridPP PMB meeting
From: David Britton <[log in to unmask]>
Reply-To: David Britton <[log in to unmask]>
Date: Wed, 17 Mar 2010 14:55:09 +0000
Content-Type: multipart/mixed
Parts/Attachments: text/plain (51 lines), 100315.txt (352 lines)

Dear All,

Registration for GridPP24 (http://www.gridpp.ac.uk/gridpp24/) closes a
week today.

Please find attached the GridPP Project Management Board
Meeting minutes for the 381st meeting. The latest minutes can
be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

-- 
________________________________________________________________________
Prof. David Britton                          GridPP Project Leader
Rm 480, Kelvin Building                      Telephone: +44 141 330 5454
Dept of Physics and Astronomy                Telefax: +44-141-330 5881
University of Glasgow                 EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________

GridPP PMB Minutes 381 (15.03.10)
=================================

Present: David Britton (Chair), Steve Lloyd, Sarah Pearce, Andrew Sansum, Tony Doyle, Dave Colling, Robin Middleton, Pete Clarke, Roger Jones, Tony Cass, Jeremy Coles, Glenn Patrick, Neil Geddes (Suzanne Scott, Minutes)

Apologies: David Kelsey, John Gordon

1. Week's Notes
================

a) Tier-2 Investments
---------------------

SL reported that he had received inputs from various people, which he was currently working on. The inputs related to construction and building costs etc. A few more had been promised, but he could make a start on the paper and figures now. SL had also received unfunded effort info from SP and had some electricity estimates.

b) Experiment speakers for RHUL
-------------------------------

RJ and DC reported that either they, or someone else, would speak at RHUL. Both would advise DB by Wednesday if possible.

c) OPN link status
-------------------

AS reported that he had circulated info on the OPN, and was still in discussion with Robin Tasker. DB advised that he needed the info within the next 2 weeks. DB noted that if ATLAS were going to use 9 Gb/s then we needed the backup link for load balancing. AS noted that we could decide what we provide.

d) CERN Hardware paper
-----------------------

Email information had been circulated and an email discussion had followed. Bernd Panzer had produced a CERN summary paper on hardware costings. Price comparisons had been made. Iterations had taken place regarding available capacity. AS reported that although CERN prices appeared higher by 60%, CERN were using a RAID 1 system with two hot spares plus two system disks. Our (RAL) configuration was different, therefore there were different overheads. Out of 24 drives, CERN had 10 data drives, RAL had 20. AS advised that trying to pin down the remaining differences was very difficult. In the final analysis AS thought that we were within about 5% for the hardware cost.
DB noted that the paper which AS was preparing would be used to support hardware costings for GridPP4, if required.

2. EGI/NGI paper
=================

RM had circulated a spreadsheet. NG was still working on the governance area text. RM reported that the spreadsheet had three worksheets giving detail. The first worksheet gave columns for a likely hopeful outcome scenario, compared with a minimum UK NGI requirement and a desirable requirement. DB noted that the idea was to make some slides for the PPRP to address the three scenarios (default; no-EGI; and no-NGS) including risks and opportunities. The second worksheet related to no EGI - the GridPP contingency of 1 FTE had been added. The final worksheet related to no NGS4 at all.

It was noted that JISC had committed to EGI so it was unlikely that no funding would be forthcoming. JISC had signed up to the legal entity EGI.eu statute. RM noted that it was difficult to make choices at the moment until the situation was firmed up. GridPP could itself become an NGI, or it could simply relate to WLCG and there would be no NGI at all.

PC noted he was nervous that we would take on something that JISC wouldn't fund, as it wouldn't help us, or help to get particle physics out of CERN. What difference did it actually make being at the 'top table'? NG commented that it would depend on WLCG commitment to achieving goals through EGI. DB thought it was really a longer-term question. PC noted that for GridPP to carry the responsibility for the UK when the UK were not interested would be a difficult task. NG advised that we would still have to deliver certain elements of an NGI - we couldn't deliver the whole of the NGI requirement, only the bits that we were concerned with. DB thought that it was a grey area - if we do the things we need, who do we deliver them to? We would still need to do security/accounting/regional support - we would be doing these anyway.
DB advised that, so that we could answer the PPRP, various scenarios needed to be considered. It was agreed that RM should embed the numbers into the document and get input from NG. DB would use the document as a basis for backup slides for the PPRP. DB noted that this document should be PMB-internal.

3. Travel
==========

a) RM reported that he had received requests to go to HEPiX in Europe (Lisbon), but the cost per person was 1500, which equated to an overall rate of 200+ per day. How much was GridPP prepared to fund? He had received 2 requests so far. It was proposed to fund 1 person per Tier-2, but this was expensive. TC advised that CERN funded 220 CHF per day. RJ thought we should specify a lower rate; DB agreed, noting actuals with a limit on the hotel. It was decided to take the CERN rate and divide by ~1.6. It was agreed that a maximum daily rate of 130 should apply.

b) RM reported that the issue would be the same for CHEP, which was taking place in Taiwan (Taipei). RM advised that 15-20 had gone in previous years. DB advised that only those giving talks or organising sessions should go. There would be a bidding process and GridPP would only pay 50% of costs. SP advised that Neasan O'Neil was waiting for an announcement re the stands. CHEP was most useful re the stands, as we got a lot of attention. RM suggested funding only Neasan at full cost. Any others doing papers and manning the stand would be part-funded. It was agreed that, as a requirement of funding, delegates would work on the stand. DB also asked if we could ensure that it was not the same talk being given by the same people all the time. The same talk didn't warrant repeated funding.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) FY09 procurements:
   - All disk and CPU has been delivered.
   - We expect to be able to start acceptance tests on one lot of disk and CPU today.
2) FY10 procurements:
   - We have started the process of updating the procurement documentation for FY10 procurements.
3) We have agreed the change request to move CMS to T10KB drives and are working on implementation. Initial testing is underway.
4) We have placed an order for a second C300 core network switch (not funded by GridPP). This is to act initially as a cold standby switch in the event of a major failure of the main core network switch. Eventually the second switch may be deployed in parallel with the existing switch to offer greater operational resilience.

Service:

1) SAM test availability for the ops VO was 100%.
2) We are working on an upgrade strategy for CASTOR from 2.1.7 to 2.1.8 or 2.1.9. We expect to discuss it with the UK VO representatives next week and then at the PMB.
3) We are starting the LHCb drain of problematic RAID 5 disk servers as agreed with LHCb. The aggressive draining led to failed SAM tests (ops VO); however, LHCb VO tests remained OK owing to the longer timeout, and LHCb were satisfied with the process agreed.
4) LHCb 3D database streaming had problems last week [probably fixed now but no authoritative update available this morning].
5) FTS will be upgraded to version 2.2.3 on Wednesday in order to meet WLCG baseline versions and provide checksumming functionality.

SI-2 ATLAS weekly review & plans
---------------------------------

RJ reported that there had been an issue at RAL yesterday. Cambridge had a broken install which was now fixed - production load and real data were expected. Re the use of CREAM and SCAS, ATLAS had done testing with CREAM and had encountered problems in relation to tokens - CREAM was not much use to them until it was fixed. RJ advised that the ATLAS position was that the GDB, not ATLAS, were pushing CREAM, and ATLAS were not happy to have it deployed at present. Re glexec, the concerns were coming from Security, not ATLAS.
RJ confirmed there was no push from ATLAS about either of these, certainly not until they were fixed and problems had been ironed out. DB commented that Graeme Stewart was managing this at present, but he agreed that operationally it wasn't good. DB asked JC whether it was deployed and testable; JC confirmed that it was, at 3 sites. The GDB and the MB were pushing this to ensure testing. DB advised that the MB had agreed to suspend security policy until April 1st - the exemption could be extended, or there would be a move to using glexec to satisfy Security policy - were we ready for either of these scenarios? JC advised no. DB observed this could be messy. JC noted it was deployed at a number of sites but not all, and bugs remained. DB asked JC to keep abreast of this, especially as he attends the GDB. JC confirmed that what had been deployed up until now had not been tested. DB noted that whilst not all sites want to do this, they should not be surprised at a short-timescale request to move. There has been plenty of warning.

DB asked whether, given that three sites were deploying SCAS and glexec, CMS had used them. These were deployed at Glasgow, Lancaster, Oxford and Manchester. DC reported that CMS occasionally use Oxford and Manchester - he would check whether they had been used. He confirmed that CMS would use a site if SCAS and glexec were installed there.

SI-3 CMS weekly review & plans
-------------------------------

DC reported that there wasn't much happening; they were ticking along, starting Monte Carlo production. There had been a problem at RAL PPD but all in all they weren't in bad shape.

SI-4 LHCb weekly review & plans
--------------------------------

GP reported as follows:

1. Problem with "resolv.conf" on a T1 diskserver preventing access to data on the diskserver by user jobs and interactive use. Fixed by Shaun on 8 March.
2. Various problems over the week with jobs failing to access data at RAL.
The data was on RAID5 servers on the lhcbDst space token, which was already being drained by Brian. To finish the draining in a reasonable time, RAL-DST (lhcbDst) was banned within LHCb on Thursday night and intensive drains were started on Friday morning. The LHCb lhcbDst RAID 5 diskservers were all drained on Saturday and the space token has been put back in production today.
3. Problem with LFC at RAL. A new record was created at RAL on 2 March on the LHCb LFC - this should not have happened as RAL was read-only. The Read-Only user in the Oracle DB had been created with write permissions (fixed now). The user was setting up Nagios tests based at Oxford and it is not clear why there should have been a request to create a new record sent to RAL. Investigations ongoing.
4. Problem with uploading data out of some UK Tier-2 sites ongoing. It is usually a very small problem on most sites, but Glasgow is particularly hit by it and has been banned within LHCb. Other sites are within the LHCb mask and accepting jobs.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

1) Registration for the storage workshop being run with GridPP24 was now open: http://www.gridpp.ac.uk/gridpp24/StorageWorkshopRegistration.html . The funding was agreed at 15 places and was being allocated on a first-come first-served basis.
2) There is going to be a joint NGS-GridPP operations meeting in April to better understand the functions of, and directions required for, NGI operations work. Please let me know any particular points the PMB wish considered.
3) Some concerns have arisen with ATLAS users (at Liverpool) being able to submit an arbitrary number of Ganga jobs (direct and avoiding the pilot system) with data access performed by rfio, which very quickly degrades the performance of DPM.
It may be that the queue being used can be disabled, but if not this raises a few questions, as ATLAS suggest the site should optimise rfio and the site points out that there is no one good optimisation for all cases.
4) The EGEE league table for February has been published: https://edms.cern.ch/document/963325 . Three GridPP sites are mentioned as not hitting the availability/reliability targets:
   UKI-LT2-UCL-CENTRAL: Scheduled downtime during 10 days due to 'lfsck'ing Lustre.
   UKI-SOUTHGRID-BRIS-HEP: Scheduled downtime during the whole month due to DPM retirement & lcgce04's WN configuration.
   UKI-SOUTHGRID-RALPP: Unscheduled downtime due to problems with air conditioning.
   We are currently looking into the Bristol case, as although some components were in downtime the site remained fully functional and ran jobs.
5) The status for the CREAM CE remains as last week. No further news on releases for SGE or Condor has appeared. Both ATLAS and CMS have reported problems using CREAM.

Additional:

(A) There is a GDB in Amsterdam next Wednesday 24th March: http://indico.cern.ch/conferenceDisplay.py?confId=84636 . A discussion of the pilot jobs status is expected to take place.
(B) Regional Nagios monitoring had been using dteam but has now returned to ops.

SI-7 Dissemination Report
--------------------------

SP reported that Neasan was drafting a press release re the collisions event at CERN. He will probably ask for a quote from us (with his ATLAS hat on). There was nothing new to report on the R89 opening. A news item on KE was being drafted. GridTalk had been approved for funding; QMUL and Imperial were involved.

AOB
===

NG noted that due to CERN rearranging 'first beam' for March 30th, there might be fewer people at the Tier1/R89 Opening at RAL. Invitations were extended to the PMB. It was noted that unfortunately we would be at RHUL at that time. NG advised that we could encourage others to attend the NGS event, as spaces would be available.

REVIEW OF ACTIONS
=================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area. It needs writing up and an implementation plan. JC to progress. Ongoing.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. DB advised this related to long-term plans and power capacity. Physical footprint space? Alternatives? AS had sent tech questions round the team and would forward inputs when available. AS noted that alternative further costings were required. AS to progress. Ongoing.

379.1 Re GridPP4 proposal and forthcoming PPRP meeting: SP to begin work on 'background' financial planning. Done, item closed.

379.2 Re GridPP4 proposal and forthcoming PPRP meeting: AS to look at the CERN hardware paper and work on the CPU and disk costings. Done, item closed.

379.3 Re GridPP4 proposal and forthcoming PPRP meeting: SP to add more detailed information to the WBS. Ongoing.

379.5 RM/SP to assimilate the information in DB's paper on NGI within the GridPP4 Proposal, and circulate a new updated paper before next week's PMB. This would be a transition document addressing the possibility that: 1. There would be no NGI; 2. There would be no future funding for NGS. Done, item closed.

379.7 JC to follow-up the issue of merging VO lists and ILDG VO. Done, item closed.

380.1 SL to circulate an Agenda for the Deployment Board meeting at RHUL. Ongoing.

380.2 ALL: to send SL information on infrastructure investments at their respective institutes. Done, item closed.

380.3 AS to send SL assumptions re electricity (in relation to investments in infrastructure). Done, item closed.

380.4 SP to send SL historical numbers on unfunded effort (in relation to investments in infrastructure). Done, item closed.

380.5 RM/SP to make changes to the EGI/NGI paper as discussed and bring back a revised version to next week's PMB. Ongoing.

380.6 ALL: to feedback comments on the EGI/NGI paper to DB, RM or SP before next week's PMB. Done, item closed.

380.7 Re the OPN backup link: AS to find out: 1. When the link is supposed to be operational; 2. More detail about how and when the link will be tested. If possible AS should delay Invoice payment until more information was forthcoming. Ongoing.

380.8 RJ/DC to advise us of what the experiment plans are in the UK in relation to SCAS and glexec. Done, item closed.

380.9 RJ/DC to send info to DB regarding resource estimates for the upcoming period, as this info will be needed after the PPRP. Ongoing.

ACTIONS AS AT 15.03.10
======================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area. It needs writing up and an implementation plan. JC to progress.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. DB advised this related to long-term plans and power capacity. Physical footprint space? Alternatives? AS had sent tech questions round the team and would forward inputs when available. AS noted that alternative further costings were required. AS to progress.

379.3 Re GridPP4 proposal and forthcoming PPRP meeting: SP to add more detailed information to the WBS.

380.1 SL to circulate an Agenda for the Deployment Board meeting at RHUL.

380.5 RM/SP to make changes to the EGI/NGI paper as discussed and bring back a revised version to next week's PMB.

380.7 Re the OPN backup link: AS to find out: 1. When the link is supposed to be operational; 2. More detail about how and when the link will be tested.
If possible AS should delay Invoice payment until more information was forthcoming.

380.9 RJ/DC to send info to DB regarding resource estimates for the upcoming period, as this info will be needed after the PPRP.

INACTIVE CATEGORY
=================

359.4 JC to follow up dTeam actions from the DB, as follows:
---------------------------
05.02 dTeam to try and sort out CPU shares and priority resources, at Glasgow first (perhaps by raising the job priority in Panda).
---------------------------
JC would check the situation with Graeme Stewart (who was currently on annual leave). JC followed up with Graeme and the other experiments. A test was started but this area has been deemed low priority and further progress is not expected for some time. ATLAS see no issues with contention. LHCb are not intending to pursue anything in this area. A CMS discussion has started but again it does not appear to be urgent. If the experiments are not pushing this internally then there is nothing for the deployment team to follow up! It was noted there was no priority in ATLAS at present; this will be pending for a while. Move to inactive as it is a long-term action.

---------------------

The next PMB would take place on Monday 22nd March at 12:55 pm.
