Hello all!
As the next few weeks are looking weird (Bank Holiday, HEPSYSMAN and I'm 
on leave again!) I thought I'd do a full review this week.

Other VO Nagios
https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15
At time of writing I see problems with test jobs at Brunel for pheno and 
Liverpool for a number of VOs (see Sno+ ticket for probable cause and 
fix at Liverpool).

22 Open UK Tickets this week. Going site-by-site:

APEL/NGI
https://ggus.eu/?mode=ticket_info&ticket_id=113473 (4/5)
Missing accounting data for April for some sites. Raul is discussing 
things for Brunel in the ticket, although they have republished. I think 
it's only ECDF left to republish their April data. In progress (16/5)

OXFORD
https://ggus.eu/?mode=ticket_info&ticket_id=113482 (26/4)
Loss of accounting data for Oxford needing an APEL republish. The Oxford 
guys republished, but there is some confusion with the resulting 
numbers. Discussion is ongoing, John G is currently looking at the 
records. In progress (14/5)

https://ggus.eu/?mode=ticket_info&ticket_id=113650 (11/5)
CMS glideins failing at Oxford. The original problem was with a config 
tweak being left out of the cvmfs setup, but the ticket has been 
reopened citing problems persisting on the ARC CE (the CREAM appears to 
be fixed). Reopened (16/5)

GLASGOW
https://ggus.eu/?mode=ticket_info&ticket_id=113095 (17/4)
ROD ticket about batch system BDII failures, left open to avoid 
unnecessary ticket filing. Gareth noted that the full migration to ARC 
and HTCondor, which should see the end of these issues, will hopefully 
be completed by the end of June. On Hold (12/5)

SHEFFIELD
https://ggus.eu/?mode=ticket_info&ticket_id=113769 (18/5)
LHCb see a cvmfs problem at Sheffield. Elena has probably fixed the 
problem (restarted the sssd), just waiting to see if it all pans out. In 
progress (18/5)

MANCHESTER
https://ggus.eu/?mode=ticket_info&ticket_id=113744 (15/5)
For the VOMS rather than the site, Jens' request for the creation of the 
Dirac VO, vo.dirac.ac.uk. In progress (18/5)

https://ggus.eu/?mode=ticket_info&ticket_id=113692 (13/5)
A request from pheno to add support for their new cvmfs area at 
Manchester, and as I understand it, to support them in a new "form" 
(pheno.egi.eu). In progress (13/5)

LIVERPOOL
https://ggus.eu/?mode=ticket_info&ticket_id=113742 (15/5)
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a 
problem with the DPM BDII service certificate not being updated (that's 
bitten me too), and fixed things this morning. Let's see how that goes. 
In progress (18/5)

LANCASTER
https://ggus.eu/?mode=ticket_info&ticket_id=95299 (1/7/13!)
Lancaster's vintage glexec ticket. An update on this - after having a 
roundtuit session last week I was building glexec for different paths. 
It still needs some testing to make sure it works properly. However, 
there definitely won't be a one-size-fits-all tarball solution. On 
hold (15/5)

https://ggus.eu/?mode=ticket_info&ticket_id=100566 (27/1/14)
Only the crustiest old tickets for us at Lancaster! Poor perfsonar 
performance. Sadly didn't get roundtuit on this one - we're pushing 
getting these nodes dual stacked as Ewan had pointed out that it would 
be interesting to see if IPv6 tests also saw this issue. On hold (18/5)

UCL
https://ggus.eu/?mode=ticket_info&ticket_id=113721 (14/5)
The only UCL ticket, this is an EGI "low availability" ticket. However 
Daniela notes that the plots are on the rise, so things are looking 
alright. Probably want to "On Hold" it but otherwise not much to be 
done. In progress (14/5)

IMPERIAL
https://ggus.eu/?mode=ticket_info&ticket_id=113743 (15/5)
A ticket from Durham concerning the Dirac instance at Imperial's 
settings for their site. Daniela hopes to get it fixed soon. In progress 
(15/5)

100IT
https://ggus.eu/?mode=ticket_info&ticket_id=112948 (10/4)
CA certificate update at 100IT leading to a discussion of other 
authentication based failures. David has asked for voms information 
after posting his configs. In progress (13/5)

TIER 1
https://ggus.eu/?mode=ticket_info&ticket_id=113035 (14/4)
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think 
things are just about done now, this ticket can soon be closed. In 
progress (11/5)

https://ggus.eu/?mode=ticket_info&ticket_id=109694 (28/10/14)
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 
on their WNs, and notes that there's a lot of active debugging work 
going on in the area. As he eloquently puts it "situation is quite 
fluid". In progress (13/5)

https://ggus.eu/?mode=ticket_info&ticket_id=108944 (1/10/14)
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, 
deploying then trying to get the new xrootd director configured. New 
problems have cropped up, and are under investigation. In progress (11/5)

https://ggus.eu/?mode=ticket_info&ticket_id=112721 (28/3)
Atlas transfer failures ("failed to get source file size"). Tracked to an 
odd double transfer error, possibly introduced in one of the recent 
"upgrades". Brian has been declaring these files as bad, and a 
workaround or solution is being thought about. In progress (14/5)

https://ggus.eu/?mode=ticket_info&ticket_id=113705 (13/5)
Atlas transfer failures from RAL tape. Checksum failures, which Brian 
tracked to the checksum not being of a type Castor supports. Brian has 
asked if this can be changed at the CERN FTS or in rucio. Waiting for 
reply (14/5)

https://ggus.eu/?mode=ticket_info&ticket_id=113748 (16/5)
Another atlas transfer ticket, but as the error indicates no space left 
at the Brunel space token being transferred to, Elena has noted that this 
isn't a site problem, telling the submitter to put in a JIRA ticket 
instead. Waiting for reply, but probably can be just closed (16/5)

https://ggus.eu/?mode=ticket_info&ticket_id=112866 (7/4)
Lots of cms job failures at RAL. This has been traced to some super-hot 
files, mitigation is being looked into. Perhaps a candidate for On 
Hold, depending on the time frame of a workaround. In progress (13/5)

https://ggus.eu/?mode=ticket_info&ticket_id=113320 (27/4)
CMS data transfer issues. I'm not actually too sure what's going on. 
There are files that need invalidating, which seems to be the root of 
the evil befalling transfers. The issue is being actively worked on 
though. In progress (18/5)

That's all the tickets! Catch y'all tomorrow.

Matt