Hello all,
With everyone away on holiday (or at least wishing they were) I almost
didn't do this month's review of all the tickets. But everything else on
my to-do list is hard, except maybe watering the office plants. And even
then with my hand-eye coordination there's a good chance I'll moisten my
laptop doing that...
On to the tickets, 24 of them open at the time of writing. In no real
order (at least not one that makes sense):
Sno+ "glite-wms-job-status warning" (3/8)
Glasgow: https://ggus.eu/?mode=ticket_info&ticket_id=115435
Tier 1: https://ggus.eu/?mode=ticket_info&ticket_id=115434
Matt M submitted these tickets to Glasgow and the Tier 1 after having
trouble with a proportion of Sno+ jobs. Both are being looked at -
definitely worth collaborating on this one.
Wiki-leak...
https://ggus.eu/?mode=ticket_info&ticket_id=115399 (31/7)
Jeremy noticed that the wiki didn't work for him on Friday - but it
seems to work for Jeremy, Alessandra and myself now. As Jeremy notes the
ticket can be closed, but out of interest did anyone else spot any
problems? In progress (3/8)
Spare the ROD...
LIVERPOOL
https://ggus.eu/?mode=ticket_info&ticket_id=115433 (3/8)
Some CE problems noticed on the dashboard for the Liver-lads - who might
be in mourning. Assigned (3/8)
RALPP & UCL
https://ggus.eu/?mode=ticket_info&ticket_id=114764
https://ggus.eu/?mode=ticket_info&ticket_id=114851
Both of these are "availability" alarm tickets, on-holding until they
clear. I hope RALPP managed to get a re-computation for their unfair
failures (IGTF-1.65 problems on ARC).
Sno+'d Under
https://ggus.eu/?mode=ticket_info&ticket_id=115387 (30/7)
I'm uncertain if this Sno+ ticket, probably somewhat related to Matt M's
recent thread on TB-SUPPORT and concerning xrootd access, is meant for
the Tier 1 or RALPP. Assigned (3/8)
First Tier Problems.
https://ggus.eu/?mode=ticket_info&ticket_id=115417 (2/8)
LHCB spotted a number of nodes with cvmfs problems at the Tier 1, which
the RAL team had already jumped on and repaired this morning. They
wonder if the problem persists. Waiting for reply (3/8)
https://ggus.eu/?mode=ticket_info&ticket_id=115290 (28/7)
An FTS problem requiring some special CA magic to solve, but the current
CA-wizard isn't about. On hold (29/7)
https://ggus.eu/?mode=ticket_info&ticket_id=113836 (20/5)
Glue 1 vs Glue 2 queue mismatches. Work on perfecting cluster
publishing for ARC CEs is ongoing, but the ticket could do with either
an update or on-holding. In progress (24/6)
https://ggus.eu/?mode=ticket_info&ticket_id=114992 (10/7)
CMS transfers failing between RAL and, err, TAMU in the US. Assigned to
RAL, where Brian has been investigating and Andrew has posed a good
question, asking if the user has considered managing the transfers with
FTS. Quiet on the user side. In progress (21/7)
https://ggus.eu/?mode=ticket_info&ticket_id=108944 (1/10/2014)
One of the tickets from the before times, about CMS AA access tests. It
has become a long and confusing saga, but Gareth rescued it with a handy
summary of the issue in his last update. How goes the battle? In
progress (17/7)
There's a really big wasp stuck in my office window now. I need to chase
it out. Be right back...
...Well that was a depressing reminder of how short I am. Luckily the
wasp seems to be finding her own way out.
Oxford Squid is red.
https://ggus.eu/?mode=ticket_info&ticket_id=115230 (24/7)
Which might be the colour Ewan's seeing right now! The ticket has been
reopened: Alessandra comments that the current recommendation is to
allow all CERN addresses, and asks if this is something Oxford
could do. Reopened (3/8)
Mavericks and Gooses - GridPP Pilot roles.
https://ggus.eu/?mode=ticket_info&ticket_id=114485 - Bristol
https://ggus.eu/?mode=ticket_info&ticket_id=114460 - Sheffield
https://ggus.eu/?mode=ticket_info&ticket_id=114442 - RALPP
https://ggus.eu/?mode=ticket_info&ticket_id=114441 - RHUL
Daniela hopped right back into the pilot seat after getting back from
her holidays. Bristol and RALPP are looking good, Sheffield and RHUL are
still in the Danger Zone - RHUL in particular were having troubles with
argus and could do with some working configs from elsewhere to compare
and contrast with their own.
My shame
https://ggus.eu/?mode=ticket_info&ticket_id=95303 - ECDF
https://ggus.eu/?mode=ticket_info&ticket_id=95299 - Lancaster
The tarball glexec tickets. Actually this is likely to become a defunct
(or at least different) problem at Edinburgh with their SL7 move.
Lancaster has a plan - we plan to deploy *something* (amazing plan there
Matt) during our next big reinstall in September. Between now and then I
have a test CE, cluster and most importantly some time.
Pot-luck tickets (or those I couldn't group).
Durham
https://ggus.eu/?mode=ticket_info&ticket_id=114381 (16/6)
A tiny fraction of jobs publishing 0 cores used. Looks to be a slurm
oddity. Oliver upgraded their CEs to ARC5 last week and hopes this has
fixed things. Fingers crossed! In progress (29/7)
Lancaster
https://ggus.eu/?mode=ticket_info&ticket_id=100566 (27/1/2014)
My other shame - Lancaster's poor perfsonar performance. It's being
worked on. On Hold (should be back in progress soon) (30/7)
Liverpool
https://ggus.eu/?mode=ticket_info&ticket_id=114248 (21/7)
Sno+ production problems at Liverpool, probably due to a lack of space
in the shared area. Things are back in Sno+'s court, with the submitter
consulting the Sno+ gurus (I think). In progress (21/7)
QMUL
https://ggus.eu/?mode=ticket_info&ticket_id=114573 (23/6)
LHCB job submission problems due to the known dual-stacking
problems. Waiting for input from LHCB for a while now, as things look
okay at QM now but at last check LHCB jobs still weren't running for
some reason. Waiting for reply (21/7)
And Daniela solved the 24th ticket, so that should be all of them done!
In hindsight I don't think this group-by-subject format for the monthly
full review worked, so it's back to the usual by-site next month. I'm not
re-formatting this update though!
Thanks all (for your patience!),
Matt