UKHEPGRID@JISCMAIL.AC.UK


Subject: Minutes of the 287th GridPP PMB meeting
From: Tony Doyle <[log in to unmask]>
Reply-To: Tony Doyle <[log in to unmask]>
Date: Wed, 16 Jan 2008 14:31:25 +0000
Content-Type: MULTIPART/MIXED
Parts/Attachments: TEXT/PLAIN (20 lines), 080114.txt (1 lines)

Dear All,

     Please find attached the latest weekly GridPP Project Management 
Board Meeting minutes. The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Tony
________________________________________________________________________
Prof. A T Doyle, FInstP FRSE                       GridPP Project Leader
Rm 478, Kelvin Building                      Telephone: +44-141-330 5899
Dept of Physics and Astronomy                  Telefax: +44-141-330 5881
University of Glasgow                   EMail: [log in to unmask]
G12 8QQ, UK                 Web: http://ppewww.physics.gla.ac.uk/~doyle/
________________________________________________________________________


GridPP PMB Minutes 287 - 14th January 2008
==========================================

Present: Tony Doyle, Sarah Pearce, Roger Jones, David Britton, Steve Lloyd, Robin Middleton, John Gordon, Jeremy Coles, Peter Clarke, Glenn Patrick, Andrew Sansum, Dave Colling, Suzanne Scott (Minutes)

Apologies: Stephen Burke, David Kelsey, Tony Cass, Neil Geddes

1. ALICE priority
==================

AS reported that at present Alice have zero disk allocation and have not yet had their CASTOR disk space set up. In order to take part in February's CCRC they (Alice central rather than UK) have requested 1.1 TB but it was likely that the requirement would be at least 1 disk server, which implies 4-6TB depending on exactly what can be made free. When we get the disk space we have to install the xrootd interfaces. It is probably not much work to install xrootd but if it gives any problems it will be in competition with higher priority work for ATLAS/CMS and LHCb in prep for CCRC08. Setting up the Alice CASTOR endpoints (on our shared server) is less than half-a-day's effort.

It was noted that if this work does not start early next week there will be no chance of getting Alice ready for CCRC08. Even if the effort is invested next week, the chances of success are not great given the untried interfaces (at RAL), lack of priority, and time to resolve problems.

On Tuesday the MB will require to know how we stand WRT the endpoint setup for all 4 experiments. How does the PMB wish to proceed? The disk space issue was discussed before but now our position of zero allocation will become very clear to the WLCG and is inconsistent with our MoU commitments. In the event that they want us to proceed, how should we prioritise Alice WRT the other LHC experiments and even Minos and Babar over the next 6 weeks or so?

GP noted that the problem was lack of input from Alice, and the fact that their disk allocation had been used elsewhere due to lack of uptake and lack of engagement.
GP had been given a technical contact at CERN but the Alice request (which GP had estimated) had not been confirmed. GP advised that minimal storage would be fine, but the priority would need to be set at 'low'.

TD asked if the PMB felt it reasonable to require a response from Alice-UK prior to setting-up of support - the agreement was yes, engagement is required. TD and GP would iterate, draft an email and contact the individual involved - engagement was required along with estimates of requirements, otherwise no priority could be afforded Alice. AS advised that input was required before Wednesday at 10:30 am, which was the next CASTOR Team Meeting. TD noted a deadline of Tuesday evening for a response from Alice.

2. Tape Access
===============

TD reported that there was a major issue w.r.t. tape use at CERN raised at last Tuesday's MB - in current operation it was clear that tape access was ~10MB/s (or less) rather than 50MB/s. The agenda link is here:

http://indico.cern.ch/conferenceDisplay.py?confId=22194 -> Storage Efficiency

TD advised that slides had been provided regarding rates at CERN for all experiments. The discussion at the MB related to tests of the tape system being incorporated into planning, but it was noted that there had been problems accessing tape. RJ advised that CERN were not providing D1T0 but were backing up to tape. There was a discussion regarding the processing and reading of tape. AS advised that there were performance issues as well, relating to concurrent writing to disk and reading from disk, and multiple streams.

TD noted that CCRC was meant to address simultaneous contention, a week should be designated for ATLAS, CMS and LHCb re file access alongside user analysis. GP advised that all CASTOR sites were banned at LHCb at present for other reasons, therefore no efficiency figures were available. TD asked if a week was possible for large sequential access tests? AS advised that no week was yet designated except for CCRC.
GP noted that migration to CASTOR has to happen for all experiments first. DB asked if extra tape drives were required at the moment. TD noted no, not yet - types of rate were required along with figures from tests, which would give realistic throughput to determine accurate disk/tape balance. JG suggested we go with the plan for February '08 then determine access rates in May. AS would contact Tim Folkes to order six tape drives as per the original plan.

3. GridPP20 Agenda
===================

TD asked whether there were any user-based talks? Did GP, RJ, or DC have any speakers relating to hands-on experience of experiments? TD advised that the registration listing was currently being used to determine possible speakers but Chairs had not yet been finalised. Were there any updates to the main Agenda? This was ongoing.

4. AOCB
========

None.

STANDING ITEMS
==============

SI-1 Dissemination Officer's Report
------------------------------------

SP reported that a rejection had been received from the Royal Society Summer Exhibition - SP would pursue feedback regarding this rejection. However, STFC had an LHC stand accepted and have said they will aim to include something about Grid on this.

SP expressed thanks to DB for passing on a couple of suggestions about news items. SP had contacted UKQCD about news items on their biomed mini-PIPSS award and a demo of integrating 5 regional Grids shown at a recent conference. SP was also currently working on something about GANGA, and Mike Kenyon would forward information on ELSSI.

SP reported that Neasan O'Neill would attend the EGEE All Activities meeting in Bulgaria next week at the request of EGEE NA2, to take part in a meeting discussing Grid communication strategies.

The second phase of the bid for an STFC Science in Society large award, to fund someone for LHC@home, was currently being worked on. This was due at the end of this month.

SI-2 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

1) Tenders:
   a) Disk tender - delivery is scheduled for Thursday this week - if all goes to schedule, acceptance will be complete by the end of February.
   b) CPU tender - the order had been placed and scheduled for delivery 28 February.
   c) Tape drive purchase - the purchase plan was being finalised. If the order is placed in the next couple of days we may be able to get the equipment on the ground in time for February's CCRC08.
2) Memory upgrades are all completed. Closed.
3) Work on the power supply is proceeding - so far with no disruption to service. Measurements indicate that we have (just) sufficient power to operate with one transformer out of service. This will continue to be the case until late February (when the next CPU delivery will push us over the limit). As it is likely that transformer work will be completed before the CPU delivery, it is likely that e-Science will not have to reduce electrical load.
4) The RAL PPD disk space loan (approx 80TB) is available.

Service
-------

1) SAM availability for last week was 99%.
2) CASTOR:
   a) Problems with the ATLAS CASTOR instance were traced to queries overflowing the Oracle query cache. This was increased and ATLAS production restarted on Wednesday.
   b) LHCB have encountered problems (also at CNAF) where rfio requests leave files open after the end of the IO job. This gradually leads to a degradation in performance as all IO job slots become occupied. Investigations are still underway.
3) SL4 Migration - The SL4 UI is configured and is being tested.
4) The LHCB ORACLE based LFC is operating well - Item closed.

Progress to Grid Only Access - This standing item documents the status of work towards achieving GRIDPP milestone 0.18 "Access to Tier-1 resources by Grid Interfaces Only"

1) qsub access was scheduled to terminate last Friday but we have a few details to finalise and will finally switch off qsub by Wednesday.

SI-3 Production Manager's Report
---------------------------------

JC reported as follows:

1) There have been several requests for improvement/changes to the EGEE broadcast system.
2) A new process has been introduced whereby a ticket is not closed but goes into the "verify" state.
3) A bug in the service availability algorithm in Gridview (so that the calculation considers services with no critical tests as up and available) will be corrected from today.
4) Manchester has ~9GB of space occupied by CMS and ALICE software. Considering the policies of these experiments the site wants to know how to deal with this software (extra space on the software servers would be useful).
5) Over the Christmas period the old gridpp VOMS certificate expired. The resultant site reaction indicated that the changeover was not widely known.
6) Ops test performance over the Christmas and New Year period has been stable for most sites. Several sites were 100% available. The worst performing sites over the period are similar to those during November/early December. Overall Q4 saw an average availability of 86% vs 85% for Q3.
7) The most significant problem over the last few weeks (as already discussed) was for ATLAS due to CASTOR. This has led to reduced use of UK Tier-2s.

There was a discussion regarding enabling and supporting VOs and the space available to them that sites are responsible for. It was agreed that 9GB was not felt to be excessive for a software area and that a bigger area was appropriate if required. TD noted that VOs should be supported on a site basis and any plans to drop individual VO support should be after discussion with the Region and ultimately with the VO concerned.

It was reported that ECDF at Edinburgh was now a new site with a shared cluster.

Meetings:

A) There was a CCRC'08 planning meeting on 10th Jan: http://indico.cern.ch/conferenceDisplay.py?confId=24844
B) There was a GDB last week: http://indico.cern.ch/conferenceDisplay.py?confId=20225
The focus was benchmarking; data management; worker node issues and security policies.

SI-4 LCG Management Board Report
---------------------------------

It was noted that experiment requirements were still awaited in response to MB questions. RJ, GP, DC would be sent a url relating to CCRC08 with planning meeting details, so that the summary of experiment requirements can be checked to ensure no major mismatch [done during meeting]. TD reported that the tape issue had already been covered and that CCRC planning would be reviewed again next time.

SI-5 Documentation Officer's Report
------------------------------------

SB was not present.

REVIEW OF ACTIONS
=================

272.4 AS to check the current Tier-1 disaster recovery plan and circulate the existing version to the PMB. It was reported that this document does not exist, but it was planned to have one in the longer term. TD would incorporate in v0.4 anything that AS considered relevant. AS will check and advise additions. Ongoing.
277.2 DC to provide an update and re-evaluation of CMS/CASTOR deliverables. TD advised that there was a CMS/CASTOR document on deliverables which should be revised in light of the December '07 tests. DC to take the token for this now and iterate with DN. Ongoing.
277.5 Disaster Recovery 'Team B': SB, JC, TD, SP, DB to analyse the wider issues of disaster planning, mapped to the experiments' lists, and this work would include Project Management. A Recovery Plan was required. It was agreed that JC was in charge of this and the experiment input relating to subsets of the disaster plan. SB/JC to progress. It was noted that the AFS Service was also linked to this. Ongoing.
277.8 User Experience 'Team C': SB, SP, SL, with input from JC to deal with the issue of user experience and design of an easily-found lookup facility for grid error messages. SL reported that he had started the ATLAS wiki page and would circulate the url. Ongoing.
280.6 JG brought up the issue of the biomed VO and 'sieving' at the ROC Manager's meeting - a broadcast is to go out from EGEE which will be helpful in underlining acceptable use of Grid resources and would act as a reminder to VOs about the policy they have signed-up to in relation to their users. JC had now emailed the Chair to have this discussed. JG reported that a new VO was now set up but there were few resources allocated to it as yet, although the home Institute may be giving funds. Pending further info from JC. EGEE broadcast action ongoing - JG will bring-up the broadcast action at the ROC VO meeting tomorrow (Tue 15). Ongoing.
280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set-up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email. JC reported that he had received a request from OMII to set-up a GridPP VO - it was preferable for this to be done through NGS. Ongoing.
280.8 JG to investigate the UKI ROC website - any change/progress, and report-back. Ongoing.
282.2 SP to progress the Project Map using the T1 service areas and input from the meeting. Ongoing.
282.6 JC and SB to progress existing 'disaster planning' template for next F2F meeting on 1st Feb. Involve experiments as necessary. This was a follow-up from the last F2F, and was to be distinguished from 277.5 action which is a longer-term one relating to the OC.
283.1 TD to arrange a phone connection at TC Dublin for RJ to join the GridPP20 PMB meeting remotely. Ongoing.
283.3 RM/TD to prepare use cases appropriate for the UK community, [relating to item 278.10 EGEEIII -> EGI].
RM reported that he would be attending a workshop at the end of January at CERN (by EGI design study project) and would report-back at that time. RM reported that use case and functions parts of the EGI website were now publicly visible. RM would circulate the url for the use cases - a template was available to be completed. All: to provide inputs to RM in the template format provided via the url. Done, action closed.
286.1 RJ to call a NorthGrid meeting to decide hardship funding allocations to Institutes. RJ reported that a meeting had been held this morning. Information would be sent to SL. RJ summarised that the largest figure would go to Sheffield: 12k, with 6k each to Liverpool, Lancaster, and Manchester.
286.2 SL and DB to iterate regarding clause associated with the issuing of Tier-2 hardware grants. SL had sent DB an email with suggestions. Ongoing.
286.3 AS to formally apologise to ATLAS on behalf of GridPP for the outage problems over the Christmas period. AS reported that he had sent a formal email apology to Kors. The identified cause had now been resolved and ATLAS production re-started ok. Done, item closed.
286.4 GP to advise the UB that the special cases for non-Grid access to the UK Tier-1 were approved. Done, item closed.
286.5 AS to organise a service message at login relating to non-Grid access being withdrawn. Ongoing.
286.6 JC and SB to incorporate the AFS Service into the disaster planning document. This was added to the list. Done, item closed.

ACTIONS AS AT 14.01.08
======================

272.4 AS to check the current Tier-1 disaster recovery plan and circulate the existing version to the PMB. It was reported that this document does not exist, but it was planned to have one in the longer term. TD would incorporate in v0.4 anything that AS considered relevant. AS will check and advise additions.
277.2 DN to provide an update and re-evaluation of CMS/CASTOR deliverables.
TD advised that there was a CMS/CASTOR document on deliverables which should be revised in light of the December '07 tests. DC to take the token for this now and iterate with DN.
277.5 Disaster Recovery 'Team B': SB, JC, TD, SP, DB to analyse the wider issues of disaster planning, mapped to the experiments' lists, and this work would include Project Management. A Recovery Plan was required. It was agreed that JC was in charge of this and the experiment input relating to subsets of the disaster plan. SB/JC to progress.
277.8 User Experience 'Team C': SB, SP, SL, with input from JC to deal with the issue of user experience and design of an easily-found lookup facility for grid error messages. SL reported that he had started the ATLAS wiki page and would circulate the url.
280.6 JG brought up the issue of the biomed VO and 'sieving' at the ROC Manager's meeting - a broadcast is to go out from EGEE which will be helpful in underlining acceptable use of Grid resources and would act as a reminder to VOs about the policy they have signed-up to in relation to their users. JC had now emailed the Chair to have this discussed. JG reported that a new VO was now set up but there were few resources allocated to it as yet, although the home Institute may be giving funds. Pending further info from JC. EGEE broadcast action ongoing - JG will bring-up the broadcast action at the ROC VO meeting tomorrow (Tue 15).
280.7 JC to mention the issues (when approached by a VO with regard to joining) of the 'standard' 6-month introduction period, following which the VO must set-up something specific to them, if appropriate. This was discussed at DTeam. JC to email GridPP VO members if possible - ongoing. This was a standing action - JC had discussed it with the Tier-2 Co-ordinators in relation to VO members. JC to send email.
280.8 JG to investigate the UKI ROC website - any change/progress, and report-back.
282.2 SP to progress the Project Map using the T1 service areas and input from the meeting.
282.6 JC and SB to progress existing 'disaster planning' template for next F2F meeting on 1st Feb. Involve experiments as necessary. This was a follow-up from the last F2F, and was to be distinguished from 277.5 action which is a longer-term one relating to the OC.
283.1 TD to arrange a phone connection at TC Dublin for RJ to join the GridPP20 meeting remotely.
286.1 RJ to call a NorthGrid meeting to decide hardship funding allocations to Institutes. RJ reported that a meeting was scheduled for this morning. Information would be sent to SL. RJ summarised that the largest figure would go to Sheffield: 12k, with 6k each to Liverpool, Lancaster, and Manchester.
286.2 SL and DB to iterate regarding clause associated with the issuing of Tier-2 hardware grants. Ongoing.
286.5 AS to organise a service message at login relating to non-Grid access being withdrawn.
287.1 TD and GP to iterate, draft an email, contact the Alice representative (technical) at CERN and request inputs regarding estimates of requirements for disk allocation - deadline for response from Alice was Tue evening (15 Jan).
287.2 AS to contact Tim Folkes to order six tape drives as per original plan.
287.3 All: to provide inputs to RM in the template format provided via the circulated url - re EGEEIII -> EGI and use cases.

INACTIVE CATEGORY
=================

271.1 PMB to examine the issue of fibre breakage and outages, CERN-RAL OPN link, in one year's time, when actual data on breakages is available. Due date would be September '08.
271.3 Re CERN-RAL OPN link breakage and backup generally, PC to oversee the issue and collate info so that the PMB have something to revisit in one year's time. Due date September '08. It was noted that PC would circulate a revised document after discussion with ATLAS (RJ/PC/DN to iterate).
282.8 RM to monitor how R-GMA and networking issues impact on GridPP as matters progress.
RM advised that this item should be moved to the 'inactive' category as it will develop over the coming months. RM discussed the issue with Steve Fisher and advised that support of R-GMA is required whilst APEL is dependent on it. RM reported that he has spoken to SF and there is currently no change to the R-GMA situation - process ongoing.

The meeting closed at 2:30 pm. The next PMB would take place on Monday 21 January at 1:00 pm.
