Hello everybody,
I return (kinda) refreshed from my Easter break, to find that we have a
few more tickets than when I left. As I missed the first Monday of the
month I thought I'd go over (almost) all the tickets today, for the lulz
as the kids say.
24 open UK tickets today, going site by site:
RALPP
https://ggus.eu/?mode=ticket_info&ticket_id=111703 (11/2)
Atlas glexec hammercloud tests failing. There's been a lot of waiting on
atlas to build new HC jobs. The most recent exchange (delayed due to
Easter) was asking about SELinux, but there has been no news since the 1st. In
progress (1/4)
BIRMINGHAM
https://ggus.eu/?mode=ticket_info&ticket_id=112875 (7/4)
Low availability ROD ticket. Availability is crawling back up, just need
it to go green. On hold (13/4)
GLASGOW
https://ggus.eu/?mode=ticket_info&ticket_id=112967 (10/4)
Another ROD ticket for bdii errors at Glasgow. Gareth has been doing
everything right investigating this. Kashif recommended ticketing the
midmon unit, but Gareth has spotted that the errors correspond to high
load on their ARC CE - so it might be a site problem after all - Gareth
asks for clarification. Waiting for reply (13/4)
EDINBURGH
https://ggus.eu/?mode=ticket_info&ticket_id=95303 (1/7/13)
Tarball glexec ticket. No news (sorry). End of April was, I believe, the
"deadline" I set for having this made. On Hold (9/3)
LANCASTER
https://ggus.eu/?mode=ticket_info&ticket_id=100566 (27/1/14)
Lancaster's poor perfsonar performance. I don't quite believe what I
was seeing in the tests I performed, so I'm aiming to rerun them. On
hold (13/4)
https://ggus.eu/?mode=ticket_info&ticket_id=95299 (1/7/13)
Lancaster's tarball glexec ticket. Same as ECDF. On hold (9/3)
BRUNEL
https://ggus.eu/?mode=ticket_info&ticket_id=112966 (13/3)
A ROD cream job submit ticket, freshly assigned this afternoon. It's a
bit mean of me to draw attention to it. Assigned (13/4)
100IT
https://ggus.eu/?mode=ticket_info&ticket_id=112948 (10/4)
100IT needed to upgrade to the latest CA release. They've done this, but
there are still authentication problems. In progress (13/4)
https://ggus.eu/?mode=ticket_info&ticket_id=108356 (10/9/14)
Deploying vmcatcher at 100IT. After David's questions fell on deaf
ears for a while, it has been advised that the ticket be closed as this
issue will be dealt with elsewhere. Whether or not it is to be "solved"
or "unsolved" is open to debate! In progress (can possibly be closed) (13/4)
TIER 1
https://ggus.eu/?mode=ticket_info&ticket_id=108944 (1/10/14)
CMS AAA tests failing at RAL. After a lot of work and new xrootd
redirectors, problems persist. It's looking to be a problem that needs
the CASTOR and/or xrootd devs to look at. In progress (30/3)
https://ggus.eu/?mode=ticket_info&ticket_id=112713 (27/3)
CMS asking to clean up the "unmerged area". Andrew conjured up a list of
files and asked if they could be deleted - CMS responded with a "yes
please then close the ticket". Has the deed been done? In progress (31/3)
https://ggus.eu/?mode=ticket_info&ticket_id=109694 (28/10/14)
The SNO+ gfal copy ticket. Matt M still sees gfal-copy hang for files at
RAL when he uses the GUID (the SURL works). A Castor oddity perhaps? Matt
asks what problems like this (coupled with the move
away from lcg tools) will mean for VOs that rely on the LFC. In progress
(31/3)
https://ggus.eu/?mode=ticket_info&ticket_id=112977 (10/3)
CMS high job failure rate at RAL. Related to 112896 (below) - the jobs
all want that file! In progress (13/3)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112896 (9/4)
CMS Dataset access problems - caused by over a million access attempts
on a single file over an 18-hour period. Andrew L comments that CMS needs
to have a think about how they access pileup datasets. In progress (9/4)
https://ggus.eu/?mode=ticket_info&ticket_id=111699 (10/2)
Tier 1 counterpart to 111703. A new HC stress test was submitted near
the end of March, but no news on how it did. In progress (23/3)
https://ggus.eu/?mode=ticket_info&ticket_id=112866 (7/4)
A different "lots of CMS job failures" ticket. Again a "hot file" seems
to be the root cause. In progress (7/4)
https://ggus.eu/?mode=ticket_info&ticket_id=112721 (28/3)
An atlas file access ticket, seemingly caused by some odd FTS behaviour.
No answer to Shaun's question about this odd occurrence, and little noise
at all until today. Waiting for reply (13/4)
UCL
UCL has 6 tickets, 4 of them just "assigned". I'll just list them in the
interests of brevity (and I'm running out of time - sorry!).
https://ggus.eu/?mode=ticket_info&ticket_id=112371 (ROD low
availability, On Hold)
https://ggus.eu/?mode=ticket_info&ticket_id=112841 (atlas 0% transfer
efficiency, assigned)
https://ggus.eu/?mode=ticket_info&ticket_id=112873 (ROD srm put
failures, assigned)
https://ggus.eu/?mode=ticket_info&ticket_id=95298 (glexec ticket)
https://ggus.eu/?mode=ticket_info&ticket_id=112722 (atlas checksum
timeouts, in progress)
https://ggus.eu/?mode=ticket_info&ticket_id=112966 (ROD job submit
failures, assigned)
That's all folks!
Matt