TB-SUPPORT Archives

TB-SUPPORT@JISCMAIL.AC.UK


TB-SUPPORT September 2016

Subject: Re: Ops meeting @ 11am
From: Matt Doidge <[log in to unmask]>
Reply-To: Testbed Support for GridPP member institutes <[log in to unmask]>
Date: Tue, 20 Sep 2016 13:57:34 +0100
Content-Type: multipart/mixed
Parts/Attachments: text/plain (37 lines), 200916_ops_minutes.txt (349 lines)

Hello!
Please find attached the minutes (which I wasn't permitted to upload to
Indico). Thanks to Pete and David for collating the attendees list. I
hope I didn't miss anything important, or paraphrase anyone too
imaginatively - please let me know if I did.

The actions were for sites to consider supporting vo.moedal.org [1][2],
and for Sam to liaise with Oxford and ATLAS about revisiting the
diskless-site testing.

There is also a quasi-action on anyone wanting a CERN External Account:
if you haven't already done so, please contact Jeremy today.

Cheers all,
Matt

[1] https://operations-portal.egi.eu/vo/view/voname/vo.moedal.org
[2] moedal.org

On 20/09/16 10:44, Jeremy Coles wrote:
> Dear All,
>
> A reminder that our ops meeting is at 11am. The agenda is
> at https://indico.cern.ch/event/570325/. There are several updates in
> the bulletin that we will
> review https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest.
>
> Discussion this week will revolve around the GDB updates from last week
> (and the pre-GDB) and a continuation of our lightweight sites theme from
> GridPP37.
>
> Regards,
> Jeremy
>
>



Chair: Jeremy C
Minutes: Matt D
Attending: Alessandra Forti, Andrew Washbrook, Andrew Lahiff, Brian Davies, Chris Brew, Dan Traynor, Daniela Bauer, David Crooks, John Bland, John Hill, Marcus Ebert, Oliver Smith, Winnie Lacesso, Pete Gronbech, Raul Lopes, Robert Frank, Sam Skipsey, Steve Jones, Tom Whyntie, Vip Davda, Matt Williams, Ian Loader.
Apologies: Andrew McNab, Ian Neilson, Raja, Duncan.

*Experiment problems/issues
LHCb - No one could attend.
CMS - Daniela - CERN has intermittent connectivity problems affecting the xrootd redirectors. Not much can be done. Brunel has a CMS ticket that is being actively debugged.
ATLAS - No one had any problems to report; Alessandra confirms this.

Other VO updates
No other VO updates.

New VO news from Tom
MoEDAL - a monopole-searching VO that would like to use the grid for simulation. Technically an LHC experiment, but would like support as a small VO. Linked to the cernatschool work. Infrastructure (VOMS, CVMFS) is already set up. Would anyone like to support them? moedal.org is their homepage; they piggyback on the LHCb Gauss infrastructure.
Jeremy - are we targeting specific sites?
Tom - QM and the cernatschool supporters (Glasgow, Liverpool, Birmingham); they will be using Ganga for job submission. No UK sites currently support them, and possibly no other EGI or OSG sites either; most simulation has been run "locally" so far.
Website: https://operations-portal.egi.eu/vo/view/voname/vo.moedal.org
Any interest among sites? Create an action to return to this - consider supporting the VO. Not a heavy load is expected.
Chris B - might be happy to do it at RALPP, possibly including it in a batch of VO additions along with cernatschool to reduce the workload ("not difficult, just intricate").

No GridPP DIRAC "news", but...
http://bugzilla.nordugrid.org/show_bug.cgi?id=3600
Observations - Daniela: Hit by a bug in the ARC CE: new versions don't report max CPU/wall time, which DIRAC uses for queue matching. Tried some hacking which didn't work; will have another go this afternoon.
Currently there is no way of doing queue matching. Raul comments that ARC never did it properly; Daniela replies that the problem is the new release reporting "zero". Andrew L - this has been a long-term problem with Condor, which didn't have the concept of it. Brunel and ECDF have forced a correction in ARC. Andrew L - it appears to be batch-system dependent. Daniela - it (wall/CPU time) can be set, but not sure what that will do. SGE seems to be broken too. Raul will double-check and re-hack if needed, and asks Daniela to poke him if it still doesn't work.
Steve - https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_Extra_BDII_Fields - see chat for a bit more on this.
Jeremy - attached to the agenda a slide prepared for the MB, showing GGUS ticket statistics; for information and interest only.

To the Bulletin!

*Meetings and Updates
    International Symposium on Grids and Clouds (ISGC) 2017 call for papers closes at the end of October. http://event.twgrid.org/isgc2017
    August WLCG T2 Availability:
        ALICE. All okay http://wlcg-sam.cern.ch/reports/2016/201608/wlcg/WLCG_All_Sites_ALICE_Aug2016.pdf
        ATLAS. Glasgow: 86%:97% | Oxford: 82%:82% http://wlcg-sam.cern.ch/reports/2016/201608/wlcg/WLCG_All_Sites_ATLAS_Aug2016.pdf
            Glasgow availability was down due to a power cut in their machine room at the beginning of the month. It took a few days to recover from it.
            Oxford was down for a few days due to an A/C failure on Friday 12th August. The cluster was shut down and restored on Monday 15th.
        CMS. All okay http://wlcg-sam.cern.ch/reports/2016/201608/wlcg/WLCG_All_Sites_CMS_Aug2016.pdf
        LHCb. All okay (but note ECDF as N/A). http://wlcg-sam.cern.ch/reports/2016/201608/wlcg/WLCG_All_Sites_LHCB_Aug2016.pdf
    There was a GDB last week. Minutes will appear here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGGDBDocs#2016
    Notes from Thursday's EGI OMB.
    https://indico.egi.eu/indico/event/2810/material/minutes/minutes.html https://indico.egi.eu/indico/event/2810/
    Actions:
        NGIs using the GOCDB API should assess whether their use is compatible with the new developments available in the test instance.
        Gather information about best practices for users who are transitioning from WMS to DIRAC.
        -Jeremy: Any feedback? Do we have anything to help with this? Tom: Happy using Ganga. Daniela: I could try and dig up my talk for the DIRAC workshop. It's from May, but we did do a little survey on how VOs use DIRAC (see chat for more).
        Discuss the CSIRT proposal with sites and ROD staff.
        -The meat of it is that sites will need to add the Pakiti client to (a) worker node(s).
        The ARGO proposal for GOCDB has an impact on site managers, so NGIs should discuss this proposal with their sites and staff.
    Notes from Monday's WLCG ops meeting. https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek160919
    -Intermittent connectivity problems mentioned again, particularly a problem for CMS.
    Jeremy C will follow up on External Accounts this week.
    -Has a list of 5 names; anyone else who wants to be added, please contact Jeremy today.
    Alastair mentions "ARC Camp!" for an interested person (TB-SUPPORT 14th Sept).
    -Useful for a technical person to attend to represent the work in the UK. Andy W might be able to do it. Steve: Where will it be? Andy L: Undecided, somewhere cheap, probably not in the UK.
    Decommissioning of the old downtime notification system took place last week. From now on use the new system (https://operations-portal.egi.eu/downtimes/subscription).
    -Probably the cause of any odd messages seen last week.
        You have to select the targets of your subscription, then a channel of communication (RSS, iCal or email). Don't forget to fill in your email address if you have selected the email channel!
    VAPOR application v2.1 is now online.
    Various changes, including integration of Gstat features. https://operations-portal.egi.eu/vapor
    -Overview of data, a lot of it previously in Gstat. Worth a look if you haven't already.
    APEL Tests Paused today - There is a temporary problem with the APEL Pub and Sync tests. They are not reflecting recent data received by the APEL repository.
    -No comments.

*WLCG Operations Coordination
    There was a WLCG Throughput call on the 15th. https://indico.cern.ch/event/562629/
    -Duncan was set to attend, but wasn't in the meeting today. Jeremy couldn't make it.
    The next ops meeting is on the 29th. Theme suggestions welcome. https://indico.cern.ch/event/540422/
    -Please let Jeremy know if you have any suggestions.

*Tier 1
A reminder that there is a weekly Tier-1 experiment liaison meeting. Notes from the last meeting here: http://www.gridpp.ac.uk/wiki/RAL_Tier1_Experiments_Liaison_Meeting https://www.gridpp.ac.uk/wiki/Tier1_Operations_Report_2016-09-14
    The use of both OPN links, giving a maximum 20Gbit connection to CERN and other Tier1s, continues to run OK, with use being made of the extra bandwidth.
    In the last report (a couple of weeks ago) I mentioned some intermittent periods of high packet loss within the Tier1 network. This was resolved by replacing a network transceiver.
    The first 100Gbit link within our internal Tier1 network has been put in place.
    There was preventative maintenance on the tape libraries last week: a general check-over of the hardware plus a firmware update. This went OK. Oracle wish to make an intervention on the libraries to improve some of the mechanics. We are scheduling this for the first week of November, and it is expected to mean a day's downtime for each of the libraries.
    We are in the process of moving services from the old Windows Hyper-V 2008 virtual infrastructure to one based on the 2012 version.
-No Tier 1 related issues raised.

*Storage & Data Management
Sam - preGDB and GDB happened.
Will come back to this under discussion.

*Tier-2 Evolution
-Jeremy noted it has been quiet since June. No open issues in JIRA.

*Accounting
-Some discussion at the GDB; will come back to it later.

*Documentation
GridPP Approved VOs now has a link to RPM versions of the VOMS records. They are available for now via the VOMS RPMs Yum repository. The latest version, which is consistent with the YAIM records in the Approved VOs doc, is 1.0-1. The plan is that when VO records change, the Approved VOs doc version will be incremented, and RPMs of the changed VOs (only those) will be released carrying the same version stamp as the document. Thus a site that upgrades to "latest" will get the records compatible with the newest version of the GridPP Approved VOs document.
Note: A typical RPM contains the following:
    [sjones@hep169]$ rpm -qlp gridpp-voms-dteam-1.0-1.noarch.rpm
    /etc/grid-security/vomsdir/dteam
    /etc/grid-security/vomsdir/dteam/voms.hellasgrid.gr.lsc
    /etc/grid-security/vomsdir/dteam/voms2.hellasgrid.gr.lsc
    /etc/vomses/dteam-voms.hellasgrid.gr
    /etc/vomses/dteam-voms2.hellasgrid.gr
    /root/vo_xml/dteam.xml
The vomsdir (lsc) files (which list the DNs and CA DNs of acceptable certificates) and the vomses files (which give the coordinates of the VOMS servers of various VOs) are provided as if they had been created by YAIM in the normal locations. No other features of YAIM are facilitated by these RPMs. Thus they are useful for migrating from YAIM, but do not provide all the functions of YAIM, such as setting SW dirs or other ENV vars.
http://hep.ph.liv.ac.uk/~sjones/RPMS.voms/
https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs
-Steve will keep the document, and the RPMs, up to date. Notes that this doesn't do everything that YAIM does/did.
-Steve is waiting to hear from Marcus and Gareth to see how this works.

*Interoperation
    The next EGI ops meeting is on 12th October.

*On Duty
Jeremy is setting up the rota.

*Rollout
A lot of SL7 work in the UK; worth looking at and collating this.
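The Documentation item lists what the gridpp-voms RPMs ship: .lsc files under /etc/grid-security/vomsdir and vomses files under /etc/vomses. As a minimal sketch (not part of Steve's RPMs or of YAIM), the Python below reads both formats, assuming the common layouts: a vomses line of five quoted fields (alias, host, port, server DN, VO name), and a single-chain .lsc file whose first line is the VOMS server certificate's subject DN followed by its CA DN(s). The function names and the host, port and DNs in the example are hypothetical placeholders, not the real dteam records.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class VomsesEntry:
    """One VOMS server endpoint, as described by a single vomses line."""
    alias: str
    host: str
    port: int
    server_dn: str
    vo: str


def parse_vomses_line(line: str) -> VomsesEntry:
    """Parse a vomses line of quoted fields: alias, host, port, server DN, VO.

    Any fields after the fifth are ignored in this sketch.
    """
    fields = shlex.split(line)  # honours the double quotes around each field
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 quoted fields, got {len(fields)}")
    alias, host, port, server_dn, vo = fields[:5]
    return VomsesEntry(alias, host, int(port), server_dn, vo)


def parse_lsc(text: str) -> List[str]:
    """Return the DNs in a single-chain .lsc file: server DN first, then CA DN(s)."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def lsc_matches(entry: VomsesEntry, lsc_dns: List[str]) -> bool:
    """Check that the vomses server DN agrees with the first DN in the .lsc file."""
    return bool(lsc_dns) and lsc_dns[0] == entry.server_dn


# Hypothetical example values; these are NOT the real dteam records.
vomses_line = ('"dteam" "voms.example.org" "15000" '
               '"/C=XX/O=ExampleGrid/CN=voms.example.org" "dteam"')
lsc_text = "/C=XX/O=ExampleGrid/CN=voms.example.org\n/C=XX/O=ExampleGrid/CN=Example CA\n"

entry = parse_vomses_line(vomses_line)
print(entry.host, entry.port, lsc_matches(entry, parse_lsc(lsc_text)))
# prints: voms.example.org 15000 True
```

A check like lsc_matches could be run after installing a newer RPM version to confirm that a VO's vomses and vomsdir records still agree; it is an illustration only, not a feature of the RPMs.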
*Security
    Changes to the site (re-)certification procedure were proposed at the OMB, to enable security vulnerability checks which are currently blocked due to the move to ARGO monitoring. [1]
    IGTF & EUGridPMA (certificate issuing authorities) meeting [2]
        Summaries of issues exploiting federated identity management (e.g. eduGAIN) and social IDs (e.g. Facebook) on Monday [3] [4].
[1] https://indico.egi.eu/indico/event/2810/
[2] https://indico.nikhef.nl/conferenceDisplay.py?confId=500
[3] https://indico.nikhef.nl/materialDisplay.py?contribId=1&materialId=slides&confId=500
[4] https://indico.nikhef.nl/materialDisplay.py?contribId=4&materialId=slides&confId=500
-Ian is likely at the meeting alongside Dave Kelsey, so it would be good to hear back.

*Services
    UK eScience CA - certificate issuance problems. Jens reported that on the 15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups, but the rebuild was not correctly configured.
    -Hopefully we will hear back from Jens about this in the near future.
    A large number of site admins and other GridPP supporters appeared to be suspended from the dteam VO last week. “During a planned upgrade operation of VOMS service, a system malfunction occurred. As a result, some users received false notification about membership expiration. We are in contact with the software development team in order to identify the cause.”
    -Jeremy - everyone should be unsuspended now, but check whether you are asked to sign the AUP. Is anyone still suspended? No response.

*Tickets
Were discussed. Steve will point Biomed to the space token documentation.

*Other Bits
A site round table will be needed soon.

*GDB update
Summary of GDB talks by Jeremy: https://indico.cern.ch/event/570325/contributions/2306936/attachments/1338612/2015890/September-GDB-2016.pdf
Talk 1 - WLCG workshop.
Talk 2 - IPv6. ATLAS Canada would like (are interested in) pure IPv6, but are not going to get it yet.
1st April 2017 is the earliest date by which sites can provide IPv6-only compute and expect it to work. Some reckon that this is too soon: “Reasonable fraction on IPv6 by end LS2”.
Talk 3 - Review of the Nordic Tier 1, with a view to improving efficiency. The conclusion is that consolidation loses leverage, which increases cost in other areas, as seen in other studies.
Talk 4 - Malware information sharing platforms. "Threat Intelligence". CERN MISP - access requires an e-group. Dave C - testing between Glasgow and RAL, with Jo, a summer student. The interesting thing is the technical aspect of the sharing platform, but the meat is in the semantics of sharing this information. False positives are a big challenge. Maximising trust is the bulk of the work.

Data PreGDB
Brian - good summary on Jeremy's slides. Still trying to work out how to do storage accounting in an SRM-less setup, with the caveat that SRM-less tape is on the back burner. Site perspectives were sought to inform development in this area. Oddly enough, IPv6 wasn't mentioned at the preGDB. Different VOs push different protocols. GridFTP is big as it's usable at all sites (the other options being xroot and HTTP). One possible way of "providing" IPv6 is dual-homed xrootd proxies. More focus on xroot and GridFTP. Brian and Alastair's talk went down well. Interesting analysis from IPNL3, studying access and creation times of files on disk servers and noting differences in patterns between VOs. Updates from the various storage providers, including timelines and roadmaps; worth looking at for each site. No questions.

GDB Fast Benchmarking
Update from each VO; the slides tell all. LHCb - DIRAC benchmarking gives a much clearer result than other benchmarks. ATLAS - Alessandra - plan to add the fast benchmark to the pilots and feed the results into the Elasticsearch cluster, aiming to simplify the effort of comparing things. No update from CMS on this at the GDB; no one present knows what CMS are doing on this.
Jeremy will circulate anything that comes out of these talks.

Discussion:
Continuing the discussion from GridPP37 about lightweight sites. https://indico.cern.ch/event/556609/sessions/204093/attachments/1330334/1998927/Lightweight_sites_-_notes.pdf
Starting halfway through the storage session.
Sam - xroot support, Globus Connect. Jeremy - what can we do to aid this? Sam - ARC caching testing at Durham; Sam has some stats on cache growth with ATLAS work. Aiming to work with Rucio, but quiet on that front. Work on the network-only site has also been slowed, but has been partly reported on. The HC infrastructure work was a blocker, but that should be done with as of yesterday. Progress, just not as much as we'd like?
Is it workable to run sites without storage? UCL works, but is in London. The network topology there is very different and not applicable to the rest of the UK. Potentially not scalable outside London with JANET in its current state (and JANET is not just used by us). Alastair is having a look at this, getting a feel for network use for a given cluster size. The loss of Ewan slowed this work.
Brian - the preGDB on this was a theoretical analysis of what would happen if we lost the smaller sites, with respect to loss of job slots and increased network load. Can a diskless site cope with the outbound connections?
Jeremy - is there a timeline for some conclusions on this? Sam - wanted to be at that point now. Pete - anything at Oxford that we can do to help? Sam - possibly nothing at Oxford; the work was on the ATLAS infrastructure. Should be there. Pete - we're as staffed as well as used to. Sam - what's useful is to know what the monitoring is like, to understand what's going on as well as possible. Looking at network and job monitoring. Sam will send an email to Kashif and Alastair about it - putting it into the actions.
Once you offload site services, such as storage, how do you monitor a site with dependencies at another site? A wider discussion we need to have. Potential issue talked about in the storage evolution document.
Who do we ticket? We see this with CMS now. Would ticket reassignment be a job for the site? Low efficiency might be due to job types.
Global redirector and DIRAC incompatibility: DIRAC can't match a job to a site efficiently if the data is "everywhere". Sam - no-win situation here; a trade-off we cannot avoid. How would this work with DIRAC? Sam - LHCb manage it already, so we should check how they do it.
Funding policy for these new types of storage? Brian - assume sites move to being T2C? Degrading cache as existing storage ages and isn't replaced? Continued funding for continued storage provision? Jeremy - a list of high-level questions; could these be written down? What do the experiments themselves want? Sam - this is better understood after the preGDB.
xrootd federation pilot
Sam - we can have higher levels of xroot redirectors: we could have a UK-level one that redirects to a subset of exposed UK xrootd endpoints and acts as the top-level interface to that storage, so we're only "one" endpoint. If you go all in with xroot you can do cache layers, and reliability via redirection. Need to do a pilot of this first. May or may not interact with experiment plans for GridFTP; there is a plugin, but it might not work very well.
Come back to the rest of this another day.

Actions - minutes. Make sure you upload the minutes! Jeremy will continue setting up e-groups.

AOB? None. Reiterated that we will need to do a Tier 2 review at some point soon.

Chat Window
Alessandra Forti: (11:08 AM) For ATLAS there isn't much to report. Same tickets as last week.
Tom Whyntie: (11:08 AM) moedal.org https://operations-portal.egi.eu/vo/view/voname/vo.moedal.org
Daniela Bauer: (11:16 AM) http://bugzilla.nordugrid.org/show_bug.cgi?id=3600
raul: (11:16 AM) ArcCEs have always reported that incorrectly. I've forced a correction for Brunel
Andrew John Washbrook: (11:17 AM) us too (ECDF)
raul: (11:17 AM) I'll check and hack it.
Daniela could email me tomorrow if I don't
Steve Late: (11:19 AM) https://www.gridpp.ac.uk/wiki/Example_Build_of_an_ARC/Condor_Cluster#Patch_for_Extra_BDII_Fields
Daniela Bauer: (11:20 AM) @Raul Sure, will do. But it seems endemic, it's definitely not just you.
Steve Late: (11:20 AM) Patch for Extra BDII Fields
To set the GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime bdii publishing values, you need to change the lines involving GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime in /usr/share/arc/glue-generator.pl. For example:
    GlueCEPolicyMaxCPUTime: 4320
    GlueCEPolicyMaxWallClockTime: 4320
I was only late once; but it never forgets for some reason!
Daniela Bauer: (11:23 AM) I could try and dig up my talk for the dirac workshop. It's from May, but we did do a little survey on how VOs use dirac
raul: (11:23 AM) hacking glue-generator.pl has always been my option. However, I've upgraded all CEs recently and forgot about it.
Andrew Lahiff: (11:24 AM) Can't your configuration management system take care of that for you?
Daniela Bauer: (11:25 AM) @Jeremy: Maybe this is useful: https://indico.cern.ch/event/477578/contributions/2168288/
Jeremy Coles: (11:40 AM) Yes. Thanks Daniela.
raul: (11:41 AM) @Andrew: If the configuration system can take care of glue in Arc? yes I keep postponing as a minor problem that Arc would solve in the "next" version
Chris Brew: (11:51 AM) raul - I think I saw some official statement from Arc that Glue 1 is obsolete and they will no longer fix any issues with it.
raul: (11:52 AM) Yes, I think I saw that in their list, but really it was not clear to me what to do
Jeremy Coles: (12:06 PM) https://indico.cern.ch/event/556609/sessions/204093/attachments/1330334/1998927/Lightweight_sites_-_notes.pdf
Paige Winslowe Lacesso: (12:11 PM) Apologies, I have to leave now.
Daniela Bauer: (12:12 PM) @Chris: If this is something that needs to be set by hand again every time you upgrade, it should be in the arc.conf. And the information doesn't seem to be present in glue2 either
raul: (12:17 PM) That's what got me confused. glue1 is out, glue2 doesn't have it. Yet, I seem to have seen a discussion in the nordugrid list about support for some glue stuff. confused again
David Crooks: (12:29 PM) Cheers
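The GLUE 1 fields in Steve's patch are what DIRAC reads for queue matching, and the problem Daniela reported is that new ARC releases publish them as zero. As a rough diagnostic sketch (not an ARC, BDII or DIRAC tool), the Python below takes LDIF text, for example saved from an ldapsearch query against a site BDII (the query itself is not shown), and lists the GlueCE entries whose GlueCEPolicyMaxCPUTime or GlueCEPolicyMaxWallClockTime is missing, non-numeric or zero. The parser handles only plain LDIF, and the function names and CE hostnames are hypothetical. GLUE 1.x expresses these limits in minutes, so 4320 in the chat example corresponds to 72 hours.

```python
from typing import Dict, List, Optional

# GLUE 1.x policy attributes that DIRAC reads for queue matching.
LIMIT_ATTRS = ("GlueCEPolicyMaxCPUTime", "GlueCEPolicyMaxWallClockTime")


def parse_ldif(text: str) -> List[Dict[str, str]]:
    """Split plain LDIF text into a list of {attribute: value} entries.

    Entries are separated by blank lines, and folded lines (starting with a
    single space) are joined to the previous value. Base64 ("attr::") values
    and multi-valued attributes are not handled: a repeated attribute keeps
    only its last value.
    """
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_key: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            if current:
                entries.append(current)
            current, last_key = {}, None
        elif raw.startswith(" ") and last_key is not None:
            current[last_key] += raw[1:]  # LDIF folding: drop the one leading space
        else:
            key, _, value = raw.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        entries.append(current)
    return entries


def ces_missing_limits(text: str) -> List[str]:
    """Return the DN of every GlueCE entry whose CPU or wall-clock limit is
    absent, non-numeric or zero."""
    flagged = []
    for entry in parse_ldif(text):
        if "GlueCEUniqueID" not in entry:
            continue  # not a GlueCE entry
        for attr in LIMIT_ATTRS:
            value = entry.get(attr, "")
            if not value.isdigit() or int(value) == 0:
                flagged.append(entry.get("dn", entry["GlueCEUniqueID"]))
                break
    return flagged


# Hypothetical sample: ce1 publishes zeros, ce2 publishes 72-hour limits.
sample = """\
dn: GlueCEUniqueID=ce1.example.org:2811/nordugrid-Condor-grid,o=grid
GlueCEUniqueID: ce1.example.org:2811/nordugrid-Condor-grid
GlueCEPolicyMaxCPUTime: 0
GlueCEPolicyMaxWallClockTime: 0

dn: GlueCEUniqueID=ce2.example.org:2811/nordugrid-Condor-grid,o=grid
GlueCEUniqueID: ce2.example.org:2811/nordugrid-Condor-grid
GlueCEPolicyMaxCPUTime: 4320
GlueCEPolicyMaxWallClockTime: 4320
"""

print(ces_missing_limits(sample))
# prints: ['GlueCEUniqueID=ce1.example.org:2811/nordugrid-Condor-grid,o=grid']
```

Running a check like this after each CE upgrade would catch the case raul describes, where a hand edit to glue-generator.pl is silently lost; keeping the values in configuration management, as Andrew L suggests, addresses the cause rather than the symptom.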
