Hello everybody,
Sadly I don't have a Bonfire night themed rhyme for you all this year, 
partly as it felt a bit lame doing it a few days early. Also I couldn't 
think of any more good words that rhyme with tickets. I'm no poet, and 
don't you all know it.

Looking at the VO Nagios it looks like almost exactly the same picture
as last week. I'll skip going over it this week to save some time.

26 Open UK tickets today.

Sussex
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109539 (22/10)
Sussex publishing "all the 4s" (bdii bingo!) for their waiting jobs. 
Matt RB has a ticket in with the developers over these problems 
(109263), although he has bravely said that he might try to tackle the 
problem himself...and it looks like lcg-infosites returns a sensible 
number now. On Hold (can be closed?) (23/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=108765 (24/9)
Cross-referenced with the above ticket. Looking at the last few updates
it looks like Matt RB released a spooky Hallow'een patch, and now they
look to be green. Another ticket that can be closed? On hold (31/10)

Bristol
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 (18/6)
CMS pilots losing connection at Bristol. No news for a while, it looks 
to me like Bristol are still in downtime though? This has been a tough 
issue to debug. On hold (14/10)

Glasgow
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109807 (1/11)
Someone at Atlas was trying to raise the dead at Glasgow over
Hallow'een, although rather than zombies it was long lost files. It
appears that despite these files being declared lost last summer the 
deletion/recovery ritual hadn't been completed. UK cloud support are on 
the case. In progress (3/11)

Edinburgh
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 (1/7/13)
glexec ticket. On Hold (29/8)

Durham
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108273 (5/9)
Durham's perfsonar results going "proper weird" suddenly. The local 
networking team were on the case, but the perfsonar got offlined from
fear of shellshock and there has been no news since (is it alright to 
reinstall perfsonar yet?). On hold (6/10)

Sheffield
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109207 (8/10)
Sno+ asking for their VO_SW_DIR to point to cvmfs. Elena rolled this 
out, but sadly the ticket was reopened due to some job failures 
accessing cvmfs, and a few holdouts still with the wrong environment 
variable (Matt M threw in some CE errors he was seeing too, but he was 
very apologetic about it). Elena's investigating. In progress (30/10)

Manchester
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109272 (11/10)
Atlas have been seeing transfer problems, although it looks like these 
failures have mutated since the ticket was opened (checksum errors to 
srm type errors by the looks of it). Alessandra is on the case. In 
progress (3/11)

Lancaster
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108715 (23/9)
Getting Sno+ jobs running at Lancaster. It looks like everything is in 
place, just waiting for Sno+ to confirm (or give us a list of errors!). 
Waiting for reply (30/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 (1/7/13)
tarball glexec ticket... no news other than that my last attempt a few
weeks ago failed (not as simple as I hoped). On hold (8/9)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=100566 (27/1)
Poor Perfsonar Performance. Has hit a bit of a roadblock with both 
perfsonar boxes being switched off for the last month... have I missed 
an announcement saying that the latest perfsonar release is ready? On 
hold (31/10)

UCL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95298 (1/7/13)
UCL's glexec ticket. Ben hit a snag installing this mid-October, no news 
since then after some feedback from Maarten. In progress (14/10)

Imperial
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109526 (22/10)
LHCB having cvmfs trouble at IC, which was likely caused by a batch of 
naughty CMS jobs ruining it for everyone else. LHCB re-enabled IC to see 
if things were back on track, no news since. Waiting for reply (24/10)

EFDA-JET
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109571 (23/10)
Ops "availability" test failures at Jet. The cause of the alarms is 
known (Jet had a certificate problem on a few hosts). Just waiting for
the alarm to clear now. On Hold (28/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=97485 (21/9/13)
The case of the mysterious lhcb failures at Jet. No progress, none 
expected really though. On hold (1/10)

100IT
https://ggus.eu/index.php?mode=ticket_info&ticket_id=108356 (10/9)
AFAICS this ticket now distills down to "Getting vmcatcher working at 
100IT". Things seem to be progressing well, although the 100IT chaps 
aren't very good at setting their ticket statuses correctly! In progress 
(28/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=109573 (23/10)
Ticket listing the requirements for a cloud site. All three actions
have been completed, but there is a question over the state of the
100IT site BDII. In progress (30/10)

La Grada Uno
(That's the Tier 1, at least according to my rusty GCSE in Spanish. By 
this point of the review my brain is mush.)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=109712 (29/10)
CMS are seeing glexec errors ("status 203") at the Tier 1. Looks to be
caused by a lack of wildcard mapping, only just coming to light with the
recent CMS analysis jobs coming into the site. Andrew L is on it like a
scotch bonnet. Or just on it. (29/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=109694 (28/10)
Matt M from Sno+ has noticed gfal-copy errors when trying to access the 
Tier 1 using those tools. He's not sure if this is a problem with the 
Tier 1 or the tools themselves (or even his setup); Duncan is already
helping him out. In progress (3/11)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=107880 (26/8)
(possibly related) Sno+ "srmcp failures" for a bunch of SUSE users. Some
great input on how to get the tools working from Duncan and Chris, but 
no word since. My suspicion is Matt is waiting to hear back from this 
user group. Maybe their mail clients don't work under SUSE either? In 
progress (21/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324 (18/6)
The Tier 1 version of the Bristol CMS pilots losing connection ticket. 
On hold after exhausting all ideas. On hold (13/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=109276 (11/10)
Submissions to the RAL FTS3 "REST" interface failing for some reason - 
AIUI thought to be a problem with the CRLs and apache. After some advice 
the system has been tweaked, and is in the
waiting-to-see-if-that-fixed-it stage. On hold (3/11)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=108944 (1/10)
CMS AAA access tests failing at RAL. Reading down the ticket it looks to 
be a cms redirector problem at RAL... or something... Andrew has been 
working to fix things, adding another redirector and other tweaks. 
Andrew has asked the xrootd experts (cc'd?) why the behaviour they are 
seeing is occurring (and also notes some references to RALPP slipping 
into the Tier 1 discussion). Waiting for reply (27/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=109608 (24/10)
T2K noticed the LFC denying the existence of a new user. The problem
seems to have gone away from the T2K side, but Catalin has spotted a potential
problem and asked for some voms-proxy-info output. Waiting for reply (28/10)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=109814 (3/11)
Atlas have noticed a lot of lost job heartbeats over the last day; the
Tier One guys are on it. In progress (3/11)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=107935 (27/8)
Inconsistent BDII/SRM numbers. Looks to be a problem with how castor 
reports read-only disk servers, Brian has put in a request to the Castor 
team for information on this. On hold (3/11)

And we're done! Thanks for staying with me this far. Now to copy, paste 
and reformat all this for the wiki. *sigh*

Cheers all!
Matt