UKHEPGRID@JISCMAIL.AC.UK

Subject: Minutes of the 430th GridPP PMB meeting
From: David Britton <[log in to unmask]>
Reply-To: David Britton <[log in to unmask]>
Date: Thu, 30 Jun 2011 10:24:43 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (91 lines), 110627.txt (1033 lines)

Dear All,

****REMINDER****
Abstract submission deadline for ACAT is tomorrow
http://acat2011.cern.ch/


Please find attached the GridPP Project Management Board Meeting minutes
for the 430th meeting.

   The latest minutes can be found each week in:

http://www.gridpp.ac.uk/php/pmb/minutes.php?latest

as well as being listed with other minutes at:

http://www.gridpp.ac.uk/php/pmb/minutes.php

Cheers, Dave.

-- 
________________________________________________________________________
Prof. David Britton                          GridPP Project Leader
Rm 480, Kelvin Building                      Telephone: +44 141 330 5454
School of Physics and Astronomy              Telefax: +44-141-330 5881
University of Glasgow                 EMail: [log in to unmask]
G12 8QQ, UK
________________________________________________________________________

GridPP PMB Minutes 430 (27.06.11)
=================================

Present: Dave Britton (Chair), Jeremy Coles, Pete Gronbech, Dave Kelsey, Steve Lloyd, John Gordon, Roger Jones, Andrew Sansum, Tony Cass, Neil Geddes (Suzanne Scott - Minutes)

Apologies: Tony Doyle, Robin Middleton, Pete Clarke, Glenn Patrick, Dave Colling

1. Input to 'Future of Research' draft
=======================================

DB asked for inputs to the discussion on the UK Research Computing Ecosystem document which Peter Coveney had produced. DB had circulated various drafts of the GridPP response. It was noted that the document described neither HEP nor GridPP in the UK. It had been written from an HPC perspective. DB noted an implicit danger as the document appeared to apply to the whole of UK Research Computing, and in the long run this would be problematic if high-level discussions did not include what the HEP community had achieved and what it currently did. DB noted that it was possible to be helpful to them, as they did have a problem to solve. DB asked if we wanted to be included in or excluded from this paper. Comments?

NG advised that the document had grown out of a number of different themes which were running, partly due to Malcolm Atkinson resigning. At the e-Science Directors' meetings, Edinburgh had been keen to be involved, and the UK was not in PRACE, therefore the community meetings had proposed this course of action for a Town Meeting to discuss UK e-Science. In parallel, Peter Coveney, who was involved in Collaborative Computational Projects (CCP), had acted to produce a strategy document which had been discussed at UCL, and several people had been tasked with writing different sections of this document. It was felt that they had a stronger case if there was community-wide buy-in. NG noted that HEP references had been included in the first draft but had been removed in subsequent drafts. The Recommendations hadn't been discussed at all.

JG thought that some of the Recommendations were non-starters, especially the funding idea of a 'central pot'. DB agreed, noting that if this were propagated through the system it could affect our funding. NG noted yes, it could affect research, especially people on the boundaries of different research projects which were funded by different research councils. DB noted that he had moved through various drafts of his response letter, but that we should be supportive if we could. NG thought that the document had to be inclusive in order to be successful.

SL considered that we should hold ourselves up as an example of how things do work. JG thought this should include NGS. SL disagreed, noting that we should answer this from a GridPP point of view. DB agreed, advising that we should not make things too diffuse. Their document needed to be clear it wasn't talking about HEP.

DB asked if the PMB were comfortable with both the direction and tone of his response. Yes. SL advised that we could have an Annexe document that summarised GridPP, and possibly add the List of Roles into that. DB suggested putting the Appendices into a separate document. SL thought it didn't matter very much. DB considered that the background information gave the strength and breadth of GridPP. DB would do a final draft today and send it to Peter Coveney. Any more comments were welcome.

2. Accounting - new metrics from Manchester
============================================

SL reported that he had discussed the Accounting with Mike, who had been going to suggest using different metrics. SL had explained everything to him in detail, from the beginning, explaining why the current method being used was not the best. Mike had seemed reasonably happy and had understood why we were doing what we were doing. SL had subsequently received an email from him saying that they were not against our methods but that they were looking at the ATLAS numbers for consistency.
They believed that they could come up with a better metric. SL noted that if they were to do this, he needed it urgently. There ensued a discussion on normalised CPU. DB advised that we would consider a suggestion from Manchester but that it would be needed before the end of the month.

PG thought that there should be a heavier weighting on analysis and production work actually done at sites rather than on what CPU was available. DB noted that we had already discussed these points and it was ultimately an ATLAS choice as to how and why they distributed the funds - it wasn't something that the PMB should be involved in. PG would speak to RJ offline.

DB asked SL to contact Mike and ask for any input from them by the end of June (this Thursday). SL advised that he was only proposing to change the HEPSPEC of used CPU, not that advertised.

3. SNO+ Resource Request
=========================

DB reported that there had been an email request for resources for SNO+. DB didn't think it looked too unreasonable. AS noted he was re-doing the tape planning anyway - we had 1-1.5 PB of tape for 'other' experiments for that period, therefore the 300TB SNO+ request seemed manageable.

In general, SL advised that there were two models we currently used:

(a) we asked explicitly for support for 'others' in each GridPP proposal, using whatever numbers the 'others' come up with, and then they live within this (currently 10%);
(b) other communities request funds from PPRP for computing, which GridPP then administers and gives them a guaranteed share.

DB advised that SNO+ should request the computing they wanted in their grant application, then we could include a resource request line for that experiment. This was the best model, cf UKQCD/LHC 'others'. SNO+ could simply be a new line item.

ACTION 430.1 Re the request for resources from SNO+, DB to draft something for GP to respond and feed in.

4. wLCG Technology Evolution Group
===================================

DB reported that at the last GDB it had been agreed to start a working group to understand the technical evolution of wLCG. The suggested format was a forum on Tuesdays, before the GDBs, where detailed technical discussions could take place. JG advised that some discussions were too big and complex for the GDB, e.g. the multi-user pilot jobs framework. There was a need for another forum which was a smaller group with site representation. JG was not convinced that it would work in the model proposed, that of a core of people discussing all issues. The Tier-1 could decide for themselves regarding their representative; for the operations part perhaps JC, and another for the site delegate? JG noted he wasn't on this group.

DB agreed that it probably wouldn't work, but if it did, for the UK we needed one person there as it would be good to have someone in the room. JG asked if we could get one person from the Tier-1/Tier-2. DB thought that Romain was a good candidate for security, but DK was also required for policy issues. DB suggested nominating JC for operations and/or PG for the Tier-2. It was agreed to nominate both.

ACTION 430.2 DB to nominate both JC and PG for membership of the wLCG technical evolution working group, to ensure UK representation.

5. AOCB
========

a) DB reported that, regarding GridPP28 in 2012, EGI had chosen the same week for their meeting. What constraints were there from our side regarding different dates? 19-23 March was out. The IoP was the week after 26th March, then it was Easter. DB thought it would need to be either 11-13 April, or the following week, 16-20 April. RJ noted that 23rd was term time and he would likely be teaching.

ACTION 430.3 DB to contact Mike Seymour at Manchester and find out what dates were possible from their point of view - possibly w/c 16 or 23 April 2012.

b) Re Capital Expenditure for FY10 - the message was that we can bill the tape drive infrastructure to FY10 (GridPP3). This was £200k, but the maintenance could not be accrued, therefore for the drives only it was £184k. AS advised that this was a 'done deal' now unless the auditors rejected it. The one complication was that the credit would not show at project level, only at cost centre level, which meant that it wasn't visible. We would need a letter that documents this. DB noted that this was potentially good news: we could change our accounting to register the credit and we would tell STFC that we spent on that budget. AS asked if we would spend the credit in this financial year. DB noted probably not, but it might be required at some point in the future. DK noted it gave us potential flexibility.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) FY11 procurements
   - EU tender for disk framework PQQ evaluation complete and supplier shortlist agreed. Expect ITT to go out late this week or early next week.
   - CPU framework PQQ ready to go out.
2) SL08 considered deployable. Plan to redeploy as required into T1D0 service classes. There ensued a discussion on tape buffer and LHCb requirements.
3) FY10 Tape drive purchase - update on delivery and financial profile available.
4) Probable intervention on OPN router on 5th July 8-10am (TBC) is likely to cause a break in connectivity from the WAN to our disk servers.

Service:

1) Summary of operational issues is at:
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-06-15
   http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-06-22
2) CASTOR
   * CASTOR outage (two periods of about 6 hours) over the weekend owing to database problems. Still under investigation but probably caused by database problems on the Neptune RAC.
   * High load on tape recalls for LHCb coupled with a number of issues (size of service class, disk server read/write contention/performance, migration policy, poor localisation of data on tape) has led to delayed tape access for LHCb. We are working on a number of these issues.
   * Expect to upgrade CASTOR tape servers to 2.1.10-1 to enable T10KC - expected 5th July. Will need downtime (probably co-scheduled with the network intervention).
   * Preparing T10KC migration plan. Most of the pieces are already in place and we now need to agree which VOs we will migrate and when.

Staff:

1) Grid team leader post internal recruitment unsuccessful (late last week). Considering alternatives.
2) Paperwork for four other vacancies has been approved! Expect to submit to SSC in the next day.
   * Two system admins for Fabric team
   * One CASTOR admin
   * One Grid Team member

SI-2 Production Manager's Report
---------------------------------

JC reported as follows:

1) There are now 11 GridPP sites with glexec enabled and passing the ops VO tests on at least one cluster (RHUL; Birmingham; Brunel; Bristol?; Liverpool; RALPP; RAL Tier-1; Glasgow; Oxford and Sheffield). A couple of sites are still enabling it and may be ready this week. 6 sites are waiting for a form of relocatable installation (we have not yet got any specific dates back on this, but if it looks too far away we will look again at building from source).
2) There have been some problems with APEL publishing for most sites during the last week. This now looks to be resolving and may have been due to the Spanish Tier-1 republishing a lot of data, leading to timeouts for others trying to upload data.
3) Grid Ireland has finished the process of creating NGI_IE. This should mean that we begin the move to “NGI_UK” very soon.
4) Some sites have been setting up iperf servers to help understand issues being found with the perf-sonar tests: http://tinyurl.com/6a7dshg.
   Some WLCG Tier-1s have agreed to provide a service too but with mixed feelings. There was a discussion on the difficulty of eliciting details on the GridMon setup to enable operation at Glasgow.
5) An Authentication Bypass Vulnerability in torque that, if exploited, allows unauthorized users to submit jobs has required some sites to update their torque configuration settings and revise their firewall rules.
6) Pete Gronbech observed a problem with the REBUS updater that meant site CPU values were not updated correctly. This has now been fixed. This was noticed because the GridPP accounting table did not update the site available CPU resources after additional nodes were put online.
7) WLCG has now released an updated version of the monthly availability and reliability figures for Tier-2 sites with CREAM now correctly accounted. This update shows some improvements in the GridPP site figures but does not introduce new site issues to discuss today (see the explanations given at the last PMB).

A) The summer HEPSYSMAN meeting takes place later this week at RAL http://hepwww.rl.ac.uk/sysman/June2011/agenda.html. In addition to site updates and a security workshop on the last day, those in the ops team will try to fit in discussions about the (individual) ops team tasks.
B) There will be a Lustre workshop at QMUL on 14th July http://www.lustreusergroup.org/.

SI-3 ATLAS weekly review & plans
---------------------------------

RJ reported that they were doing network testing; there was an issue of load going through the Tier-1 which they were investigating; ATLAS production worldwide crashed on Friday morning last, queues still existed (this was not a UK issue, it was global).

SI-4 CMS weekly review & plans
-------------------------------

DC was not present.

SI-5 LHCb weekly review & plans
--------------------------------

In absentia, GP reported:

1) LHCb has had a few problems with “input data resolution” failures. Usually, this is due to input data not found on SE. Also, a rise in the number of jobs with “Watchdog identified job as stalled” - usually due to problems accessing/streaming data at the worker node. Some problems also with DIRAC staging and SRM unresponsiveness.
2) From the RAL Tier-1 side, a number of problems with staging data (stuck tapes, daemons, etc). Also, some long delays between staging and being able to access data. CASTOR then went down due to database issues over the weekend (I think this only affected LHCb and ATLAS).

SI-6 User Co-ordination issues
-------------------------------

GP was not present. Please see agenda item 3 for discussion of SNO+ resources.

SI-7 LCG Management Board Report
---------------------------------

There had been no MB.

SI-8 Dissemination
-------------------

SL reported that he had started putting the weekly minutes onto the GridPP website in docs/Minutes. This would give an idea of issues currently being covered. DB advised that this was a reminder that the document page should be re-organised - a higher-level front page was required to facilitate ease of access to the various documents. This was on Neasan's 'to do' list.

AOB
===

PG reminded the meeting about the Quarterly Reports. He would send out template reports to the different groups, but he needed target values for metrics. Users to reply please. RJ noted he could work on this on 1st July.

RJ reported on issues with the CREAM CE and Condor, which were currently being investigated.

REVIEW OF ACTIONS
=================

400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL. Ongoing.
424.3 DB to contact ALICE-UK about Tier-2 resources. Ongoing.
425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestion of theme/topics for GridPP27 at CERN. Ongoing.
425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27. Ongoing.
428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT. Ongoing.
428.3 JC to compile an info list relating to sub-nets at sites. Ongoing.
428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1. Ongoing.

ACTIONS AS AT 27.06.11
======================

400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL.
424.3 DB to contact ALICE-UK about Tier-2 resources.
425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestion of theme/topics for GridPP27 at CERN.
425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27.
428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT.
428.3 JC to compile an info list relating to sub-nets at sites.
428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1.
430.1 Re the request for resources from SNO+, DB to draft something for GP to respond and feed in.
430.2 DB to nominate both JC and PG for membership of the wLCG technical evolution working group, to ensure UK representation.
430.3 Re GridPP28, DB to contact Mike Seymour at Manchester and find out what dates were possible from their point of view - possibly w/c 16 or 23 April 2012.
Forthcoming PMB meetings would take place on the following dates: **** Fri July 15th **** Mon July 25th Mon Aug 8th Mon Aug 22nd Mon Sep 5th Tue Sep 13th F2F@CERN Mon Sep 26th GridPP PMB Minutes 430 (27.06.11) ================================= Present: Dave Britton (Chair), Jeremy Coles, Pete Gronbech, Dave Kelsey, Steve Lloyd, John Gordon, Roger Jones, Andrew Sansum, Tony Cass, Neil Geddes (Suzanne Scott - Minutes) Apologies: Tony Doyle, Robin Middleton, Pete Clarke, Glenn Patrick, Dave Colling 1. Input to 'Future of Research' draft ======================================= DB asked for inputs to the discussion on the UK Research Computing Ecosystem document which Peter Coveney had produced. DB had circulated various drafts of the GridPP response. It was noted that the document described neither HEP nor GridPP in the UK. It had been written from an HPC perspective. DB noted an implicit danger as the document appeared to apply to the whole of UK Research Computing, and in the long run this would be problematic if high-level discussions did not include what the HEP community had achieved and what it currently did. DB noted that it was possible to be helpful to them, as they did have a problem to solve. DB asked if we wanted to be included or excluded from this paper. Comments? NG advised that the document had grown out of a number of different themes which were running, partly due to Malcolm Atkinson resigning. At the e-Science Directors' meetings, Edinburgh had been keen to be involved, and the UK was not in PRACE, therefore the community meetings had proposed this course of action for a Town Meeting to discuss UK e-Science. In parallel, involved in Collaborative Computational Projects (CCP), Peter Coveney had acted to produce a strategy document which had been discussed at UCL, and several people had been tasked with writing different sections of this document. It was felt that they had a stronger case if there was community-wide buy-in. 
NG noted that HEP references had been included in the first draft but had been removed in subsequent drafts. The Recommendations hadn't been discussed at all. JG thought that some of the Recommendations were non-starters, especially the funding idea of a 'central pot'. DB agreed, noting that if this were propagated through they system it could affect our funding. NG noted yes, it could affect research, especially people on the boundaries of different research projects which were funded by different research councils. DB noted that he had moved through various drafts of his response letter, but that we should be supportive if we could. NG thought that the document had to be inclusive in order to be successful. SL considered that we should hold ourselves up as an example of how things do work. JG thought this should include NGS. SL disagreed, noting that we should answer this from a GridPP point of view. DB agreed, advising that we should not make things too diffuse. Their document needed to be clear it wasn't talking about HEP. DB asked if the PMB were comfortable with both direction and tone of his response? Yes. SL advised that we could have an Annexe document that summarised GridPP, and possibly add the List of Roles into that? DB suggested putting the Appendices into a separate document? SL thought it didn't matter very much. DB considered that the background information gave the strength and breadth of GridPP. DB would do a final draft today and send it to Peter Coveney. Any more comments were welcome. 2. Accounting - new metrics from Manchester ============================================ SL reported that he had discussed the Accounting with Mike, who had been going to suggest using different metrics. SL had explained everything to him in detail, from the beginning, explaining why the current method being used was not the best. Mike had seemed reasonably happy and had understood why we were doing what we were doing. 
SL had subsequently received an email from him saying that they were not against our methods but that they were looking at the ATLAS numbers for consistency. They believed that they could come up with a better metric. SL noted that if they were to do this, he needed it urgently. There ensued a discussion on normalised CPU. DB advised that we would consider a suggestion from Manchester but that it would be needed before the end of the month. PG thought that there should be a heavier weighting on analysis and production work actually done at sites rather than what CPU was available. DB noted that we had already discussed these points and it was ultimately an ATLAS choice as to how and why they distributed the funds - it wasn't something that the PMB should be involved in. PG would speak to RJ offline. DB asked SL to contact Mike and ask for any input from them by the end of June (this Thursday). SL advised that he was only proposing to change the HEPSPEC of used CPU not that advertised. 3. SNO+ Resource Request ========================= DB reported that there had been an email request for resources for SNO+. DB didn't think it looked too unreasonable. AS noted he was re-doing the tape planning anyway - we had 1-1.5 PB of tape for 'other' experiments for that period therefore the 300TB SNO request seemed manageable. In general, SL advised that there were two models we currently used: (a) we asked explicitly for support for 'others' in each GridPP proposal, using whatever numbers the 'others' come up with and then they live within this (currently 10%); (b) other communities request funds from PPRP for computing which GridPP then administers and gives them a guaranteed share. DB advised that SNO should request the computing they wanted in their grant application, then we could include a resource request line for that experiment. This was the best model, cf UKQCD/LHC 'others'. SNO+ could simply be a new line item. 
ACTION 430.1 Re the request for resources from SNO+, DB to draft something for GP to respond and feed- in. 4. wLCG Technology Evolution Group =================================== DB reported that at the last GDB it had been agreed to start a working group to understand the technical evolution of wLCG. The suggested format was a forum on Tuesdays, before the GDBs, where detailed technical discussions could take place. JG advised that some discussions were too big and complex for the GDB, eg: multi-user pilot jobs framework. There was a need for another forum which was a smaller group with site representation. JG was not convinced that it would work in the model proposed, that of a core of people discussing all issues. The Tier-1 could decide for themselves regarding their representative; for the operations part perhaps JC and another for the site delegate? JG noted he wasn't on this group. DB agreed that it probably wouldn't work, but if it did, for the UK we needed one person there as it would be good to have someone in the room. JG asked if we could get one person from the Tier-1/Tier-2? DB thought that Romain was a good candidate for security, but DK was also required for policy issues. DB suggested nominating JC for operations and/or PG for the Tier-2? It was agreed to nominate both. ACTION 430.2 DB to nominate both JC and PG for membership of the wLCG technical evolution working group, to ensure UK representation. 5. AOCB ======== a) DB reported that, regarding GridPP28 in 2012, EGI had chosen the same week for their meeting. What constraints were there from our side regarding different dates? 19-23 March was out. The IoP was the week after 26th March, then it was Easter. DB thought it would need to be either 11-13 April, or the following week, 16-20 April. RJ noted that 23rd was term time and he would likely be teaching. 
ACTION 430.3 DB to contact Mike Seymour at Manchester and find out what dates were possible from their point of view - possibly w/c 16 or 23 April 2012. b) Re Capital Expenditure for FY10 - the message was that we can bill the tape drive infrastructure to FY10 (GridPP3). This was £200k, but the maintenance could not be accrued, therefore for the drives only it was £184k. AS advised that this was a 'done deal' now unless the auditors rejected it. The one complication was that the credit would not show at project level, only at cost centre level, which meant that it wasn't visible. We would need a letter that documents this. DB noted that this was potentially good news, we could change our accounting to register the credit and we would tell STFC that we spent on that budget. AS asked if we would spend the credit in this financial year? DB noted probably not, but it might be required at some point in the future. DK noted it gave us potential flexibility. STANDING ITEMS ============== SI-1 Tier-1 Manager's Report ----------------------------- AS reported as follows: Fabric: 1) FY11 procurements - EU tender for disk framework PQQ evaluation complete and supplier shortlist agreed. Expect ITT to go out late this week or early next week. - CPU framework PQQ ready to go out.     2) SL08 considered deployable. Plan to redeploy as required into T1D0 service classes. There ensued a discussion on tape buffer and LHCb requirements. 3) FY10 Tape drive purchase - update on delivery and financial profile available. 4) Probable intervention on OPN router on 5th July 8-10am (TBC) is likely to cause a break in connectivity    from the WAN to our disk servers.     Service: 1) Summary of operational issues is at:     http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-06-15     http://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2011-06-22      2) CASTOR * CASTOR outage (two periods of about 6 hours) over the weekend owing to database problems. 
Still under investigation but probably caused by database problems on the Neptune RAC. * High load on tape recalls for LHCB coupled with a number of issues (size of service class, disk server read/write contention/performance, migration policy, poor localisation of data on tape) has led to delayed tape access for LHCB. We are working on a number of these issues. * Expect to upgrade CASTOR tape servers to 2.1.10-1 to enable T10KC - expected 5th July. Will need downtime (probably co-scheduled with the network intervention. * Preparing T10KC migration plan. Most of the pieces are already in place and we now need to agree which VOs we will migrate and when. Staff: 1) Grid team leader post internal recruitment unsuccessful (late last week). Considering alternatives. 2) Paperwork for four other vacancies has been approved! Expect to submit to SSC in next day. * Two system admins for Fabric team * One CASTOR admin * One Grid Team member SI-2 Production Manager's Report --------------------------------- JC reported as follows: 1) There are now 11 GridPP sites with glexec enabled and passing the ops VO tests on at least one cluster (RHUL; Birmingham; Brunel; Bristol?; Liverpool; RALPP; RAL Tier-1; Glasgow; Oxford and Sheffield). A couple of sites are still enabling it and may be ready this week. 6 sites are waiting for a form of relocatable installation (we have not yet got any specific dates back on this yet but if it looks too far away will look again at building from source). 2) There have been some problems with APEL publishing for most sites during the last week. This now looks to be resolving and may have been due to the Spanish Tier-1 republishing a lot of data leading to timeouts for others trying to upload data. 3) Grid Ireland has finished the process of creating NGI_IE. This should mean that we begin the move to “NGI_UK” very soon. 
4) Some sites have been setting up iperf servers to help understand issues being found with the perf-sonar tests: http://tinyurl.com/6a7dshg. Some WLCG Tier-1s have agreed to provide a service too but with mixed feelings. There was a discussion on the difficulty of eliciting details on the GridMon setup to enable operation at Glasgow.   5) An Authentication Bypass Vulnerability in torque that if exploited allows unauthorized users to submit jobs has required some sites to update their torque configuration settings and revise their firewall rules.   6) Pete Gronbech observed a problem with the REBUS updater that meant site CPU values were not updated correctly. This has now been fixed. This was noticed because the GridPP accounting table did not update the site available CPU resources after additional nodes were put online.   7) WLCG has now released an updated version of the monthly availability and reliability figures for Tier-2 sites with CREAM now correctly accounted. This update shows some improvements in the GridPP site figures but does not introduce new site issues to discuss today (see the explanations given at the last PMB). A) The summer HEPSYSMAN meeting takes place later this week at RAL http://hepwww.rl.ac.uk/sysman/June2011/agenda.html. In addition to site updates and a security workshop on the last day, those in the ops team will try to fit in discussions about the (individual) ops team tasks. B) There will be a Lustre workshop at QMUL on 14th July http://www.lustreusergroup.org/. SI-3 ATLAS weekly review & plans --------------------------------- RJ reported that they were doing network testing; there was an issue of load going through the Tier-1 which they were investigating; ATLAS production worldwide crashed on Friday morning last, queues still existed (this was not a UK issue, it was global). SI-4 CMS weekly review & plans ------------------------------- DC was not present. 
SI-5 LHCb weekly review & plans
--------------------------------
In absentia GP reported:
1) LHCb has had a few problems with “input data resolution” failures. Usually, this is due to input data not being found on the SE. Also, a rise in the number of jobs with “Watchdog identified job as stalled” - usually due to problems accessing/streaming data at the worker node. Some problems also with DIRAC staging and SRM unresponsiveness.
2) From the RAL Tier-1 side, a number of problems with staging data (stuck tapes, daemons, etc). Also, some long delays between staging and being able to access data. CASTOR then went down due to database issues over the weekend (I think this only affected LHCb and ATLAS).

SI-6 User Co-ordination issues
-------------------------------
GP was not present. Please see agenda item 3 for discussion of SNO+ resources.

SI-7 LCG Management Board Report
---------------------------------
There had been no MB.

SI-8 Dissemination
-------------------
SL reported that he had started putting the weekly minutes onto the GridPP website in docs/Minutes. This would give an idea of issues currently being covered. DB advised that this was a reminder that the document page should be re-organised - a higher-level front page was required to facilitate ease of access to the various documents. This was on Neasan's 'to do' list.

AOB
===
PG reminded the meeting about the Quarterly Reports. He would send out template reports to the different groups, but he needed target values for metrics. Users to reply please. RJ noted he could work on this on 1st July.

RJ reported on issues with the CREAM CE and Condor, which were currently being investigated.

REVIEW OF ACTIONS
=================
400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL. Ongoing.
424.3: DB to contact ALICE-UK about Tier-2 resources. Ongoing.
425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestions of themes/topics for GridPP27 at CERN. Ongoing.
425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27. Ongoing.
428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT. Ongoing.
428.3 JC to compile an info list relating to sub-nets at sites. Ongoing.
428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1. Ongoing.

ACTIONS AS AT 27.06.11
======================
400.4 SL to co-ordinate changing the current GridPP MoU towards an MoU for GridPP4. In progress - document had been circulated. Any corrections to be sent to SL.
424.3: DB to contact ALICE-UK about Tier-2 resources.
425.7 DC to have an internal discussion within CMS relating to use of future technology and evolution of the computing model, from September to the next couple of years. DC to come up with possible suggestions of themes/topics for GridPP27 at CERN.
425.8 AS to consider any longer-term issues relating to storage, DPM, databases etc, and come back to DB with any ideas for sessions at GridPP27.
428.2 DC to check at Imperial regarding the new person dealing with ganga, in relation to a talk at ACAT.
428.3 JC to compile an info list relating to sub-nets at sites.
428.6 AS to come up with a proposal for how to use the current disk buffer at the Tier-1.
430.1 Re the request for resources from SNO+, DB to draft something for GP to respond and feed in.
430.2 DB to nominate both JC and PG for membership of the WLCG technical evolution working group, to ensure UK representation.
430.3 Re GridPP28, DB to contact Mike Seymour at Manchester and find out what dates were possible from their point of view - possibly w/c 16 or 23 April 2012.
Forthcoming PMB meetings would take place on the following dates:

**** Fri July 15th ****
Mon July 25th
Mon Aug 8th
Mon Aug 22nd
Mon Sep 5th
Tue Sep 13th F2F@CERN
Mon Sep 26th



