Hello everybody,
Sadly I don't have a Bonfire night themed rhyme for you all this year,
partly as it felt a bit lame doing it a few days early. Also I couldn't
think of any more good words that rhyme with tickets. I'm no poet, and
don't you all know it.
Looking at the VO Nagios it looks like almost exactly the same picture
as last week. I'll skip going over it this week to save some time.
26 Open UK tickets today.
Sussex
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109539 (22/10)
Sussex publishing "all the 4s" (bdii bingo!) for their waiting jobs.
Matt RB has a ticket in with the developers over these problems
(109263), although he has bravely said that he might try to tackle the
problem himself...and it looks like lcg-infosites returns a sensible
number now. On Hold (can be closed?) (23/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108765 (24/9)
Cross-referenced with the above ticket, looking at the last few updates
it looks like Matt RB released a spooky Hallow'een patch, and now they
look to be green. Another ticket that can be closed? On hold (31/10)
Bristol
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 (18/6)
CMS pilots losing connection at Bristol. No news for a while, it looks
to me like Bristol are still in downtime though? This has been a tough
issue to debug. On hold (14/10)
Glasgow
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109807 (1/11)
Someone at Atlas was trying to raise the dead at Glasgow over
Hallow'een, although rather than zombies it was long lost files. It
appears that despite these files being declared lost last summer the
deletion/recovery ritual hadn't been completed. UK cloud support are on
the case. In progress (3/11)
Edinburgh
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 (1/7/13)
glexec ticket. On Hold (29/8)
Durham
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108273 (5/9)
Durham's perfsonar results going "proper weird" suddenly. The local
networking team were on the case, but the perfsonar got offlined from
fear of shellshock and there has been no news since (is it alright to
reinstall perfsonar yet?). On hold (6/10)
Sheffield
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109207 (8/10)
Sno+ asking for their VO_SW_DIR to point to cvmfs. Elena rolled this
out, but sadly the ticket was reopened due to some job failures
accessing cvmfs, and a few holdouts still with the wrong environment
variable (Matt M threw in some CE errors he was seeing too, but he was
very apologetic about it). Elena's investigating. In progress (30/10)
Manchester
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109272 (11/10)
Atlas have been seeing transfer problems, although it looks like these
failures have mutated since the ticket was opened (checksum errors to
srm type errors by the looks of it). Alessandra is on the case. In
progress (3/11)
Lancaster
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108715 (23/9)
Getting Sno+ jobs running at Lancaster. It looks like everything is in
place, just waiting for Sno+ to confirm (or give us a list of errors!).
Waiting for reply (30/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 (1/7/13)
tarball glexec ticket... no news other than my last attempt a few weeks
ago failed (not as simple as I hoped) On hold (8/9)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=100566 (27/1)
Poor Perfsonar Performance. Has hit a bit of a roadblock with both
perfsonar boxes being switched off for the last month... have I missed
an announcement saying that the latest perfsonar release is ready? On
hold (31/10)
UCL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95298 (1/7/13)
UCL's glexec ticket. Ben hit a snag installing this mid-October; no news
since some feedback from Maarten. In progress (14/10)
Imperial
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109526 (22/10)
LHCB having cvmfs trouble at IC, which was likely caused by a batch of
naughty CMS jobs ruining it for everyone else. LHCB re-enabled IC to see
if things were back on track, no news since. Waiting for reply (24/10)
EFDA-JET
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109571 (23/10)
Ops "availability" test failures at Jet. The cause of the alarms is
known (Jet had a certificate problem on a few hosts). Just waiting for
alarm to clear now. On Hold (28/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=97485 (21/9/13)
The case of the mysterious lhcb failures at Jet. No progress, none
expected really though. On hold (1/10)
100IT
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108356 (10/9)
AFAICS this ticket now distills down to "Getting vmcatcher working at
100IT". Things seem to be progressing well, although the 100IT chaps
aren't very good at setting their ticket statuses correctly! In progress
(28/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109573 (23/10)
Ticket listing the requirements for a cloud site. All three actions
have been completed, but there is a question over the state
of the 100IT site BDII. In progress (30/10)
La Grada Uno
(That's the Tier 1, at least according to my rusty GCSE in Spanish. By
this point of the review my brain is mush.)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109712 (29/10)
CMS are seeing glexec errors ("status 203") at the Tier 1. Looks to be
caused by a lack of wildcard mapping, only just coming to light with the
recent cms analysis jobs coming into the site. Andrew L is on it like a
scotch bonnet. Or just on it. (29/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109694 (28/10)
Matt M from Sno+ has noticed gfal-copy errors when trying to access the
Tier 1 using those tools. He's not sure if this is a problem with the
Tier 1 or the tools themselves (or even his setup); Duncan is already
helping him out. In progress (3/11)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=107880 (26/8)
(possibly related) Sno+ "srmcp failures" for a bunch of SUSY users. Some
great input on how to get the tools working from Duncan and Chris, but
no word since. My suspicion is Matt is waiting to hear back from this
user group. Maybe their mail clients don't work under SUSE either? In
progress (21/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324 (18/6)
The Tier 1 version of the Bristol CMS pilots losing connection ticket.
On hold after exhausting all ideas. On hold (13/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109276 (11/10)
Submissions to the RAL FTS3 "REST" interface failing for some reason -
AIUI thought to be a problem with the CRLs and apache. After some advice
the system has been tweaked, and is in the waiting-to-see-if-
that-fixed-it stage. On hold (3/11)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108944 (1/10)
CMS AAA access tests failing at RAL. Reading down the ticket it looks to
be a cms redirector problem at RAL... or something... Andrew has been
working to fix things, adding another redirector and other tweaks.
Andrew has asked the xrootd experts (cc'd?) why the behaviour they are
seeing is occurring (and also notes some references to RALPP slipping
into the Tier 1 discussion). Waiting for reply (27/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109608 (24/10)
T2K noticed the LFC denying the existence of a new user. The problem
seems to have gone away from the T2K side, but Catalin has spotted a potential
problem and asked for some voms-proxy-info output. Waiting for reply (28/10)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109814 (3/11)
Atlas have noticed a lot of lost job heartbeats over the last day, the
Tier One guys are on it. In progress (3/11)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=107935 (27/8)
Inconsistent BDII/SRM numbers. Looks to be a problem with how castor
reports read-only disk servers, Brian has put in a request to the Castor
team for information on this. On hold (3/11)
And we're done! Thanks for staying with me this far. Now to copy, paste
and reformat all this for the wiki. *sigh*
Cheers all!
Matt