Evening all!
It's the time of year when the ghosts and ghouls of Hallow'een are being driven away by the multi-coloured explosions of bonfire night, and when I think I've managed to dodge the dreaded Fresher's Flu this time round, only to be struck down into a whiny, sniffly shadow of a man.
So whilst I sit at home in my dressing gown, trying to balance two cats and a laptop (not easy), give thanks for your working nostrils as I give you the second to last full ticket review of the year.
22 Open UK tickets this month. Site by Site.
Sussex
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116915 (14/10)
Low availability Ops ticket. Put on hold whilst the numbers soothe themselves. On hold (23/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116865 (12/10)
Sno+ job submission failures. Not much on this ticket since it was set In Progress. Looks like an argus problem. How goes things at Sussex before Matt RB moves on? (We'll miss you Matt!). In progress (20/10)
RALPP
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117261 (28/10)
Atlas jobs failing with stage out failures. Federico notices that the failures are due to odd errors - "file already existing" - and that things seem to be calming themselves. He's at a loss as to what RALPP can do. Checking the panda link suggests the errors are still there today. Waiting for reply (29/10)
BRISTOL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116775 (6/10)
Bristol's CMS glexec ticket. It looks like the solution is to have more cms pool accounts (which of course requires time to deploy). In progress (28/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117303 (30/10)
CMS, not Highlander fans, don't seem to believe that There can be only One (glexec ticket). Poor old Bristol seem to be playing whack-a-mole with duplicate tickets. Is there a note that can be left somewhere to stop this happening? Assigned (30/10)
ECDF
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 (Long long ago)
Edinburgh's (and indeed Scotgrid's) only ticket is this tarball glexec ticket. A bit more on this later. On hold (18/5)
SHEFFIELD
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 (18/6)
Gridpp (and other) VO pilot roles at Sheffield. No news for a while, but snoplus are now trying to use pilot roles for dirac, so this is becoming very relevant. In progress (9/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 (30/9)
Sno+ jobs failing, likely due to too many being submitted to the 10 slots that Sno+ has. Maybe a WMS scheduling problem - Stephen B has given advice. A few weeks ago Elena asked if the problem persisted. Waiting for reply (12/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116560 (17/10)
A ROD availability ticket, on hold as per SOP. On hold (20/10)
LANCASTER
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116478 (28/9)
Another availability ticket. Autumn was not kind to many of us! On hold (8/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116882 (13/10)
Enabling pilot snoplus users at Lancaster. Shouldn't have been a problem, but turned into a bit of a comedy/tragedy of errors by yours truly mucking up. Hopefully fixed now - thanks to Daniela for her patience. In progress (2/11)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 (Far far away)
glexec tarball ticket. There's been a lot of communication with the glexec devs about this - the hopefully last hurdle is sorting out the RPATHs for the libraries. It's not a small hurdle though... On hold (2/11)
QMUL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117151 (23/10)
A ticket about jumbo frame problems, submitted to QM. After Dan provided some education, the user replied that he only sees this problem at two atlas sites, but he is contacting the network admins at his institution to see if it is their end. On hold (29/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 (19/10)
ROD ticket for glue-validate errors. Went away for a while after Dan re-yaimed his site bdii, but possibly back again. Daniela suggests re-running the glue-validate test. Reopened (2/11)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117011 (6/10)
Another ROD ticket, where Ops glexec test jobs are seemingly timing out for QM (this is the ticket Daniela mentioned on the ops mailing list). Dan noted that with the cluster half full[1] tests were passing, suggesting some kind of load correlation (but as he also notes - what's getting loaded and causing the problem - Batch, CE or WNs?). Kashif reckons the argus server, and suggests a handy glexec time test which he posted. In progress (2/11)
[1] In this instance I'm not certain the use of "half full" rather than "half empty" is an indicator of an optimistic outlook!
BRUNEL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117324 (2/11)
A fresh looking ROD ticket - Raul had to restart the BDII and hopefully that got it. In progress (2/11)
100IT
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358 (22/9)
Missing Image at 100IT. 100IT have asked for more details, no news since. Waiting for reply (19/10)
Last but by no means least:
THE TIER 1!
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116866 (12/10)
Snoplus pilot enablement (not actually a word) at the Tier 1. New accounts were being requested after some internal discussion. On hold (19/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=116864 (12/10)
CMS AAA tests failing (the submitter notes "again..."). There are some oddities with other sites, which might be remote problems, but Andrew notes that previous manual fixes have been overwritten which likely explains why problems came back. In progress (does it need to be waiting for a reply?) (26/10)
Ronnie, my boy cat, has started snoring. It would be cute if it wasn't so loud.
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117171 (24/10)
LHCB had problems with an arc CE that was misbehaving for everyone. Things were fixed, and this ticket can now be closed. Waiting for reply (can be closed) (27/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117277 (30/10)
Atlas have spotted "bring online timeout has been exceeded" errors. This appears to be a mixture of problems adding up, such as a number of borken disk nodes and heavy write access by atlas. In progress (2/11)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=117248 (28/10)
I believe this is related to the discussion on tb-support: the ticket requests new host certs for the RAL SRMs that meet the specified requirements. Jens was on it, and the new certs are ready to be deployed. In progress (30/10)
And that's all folks!
Cheers,
Matt