Hello Everybody,
Jeremy asked that I bring the usual monthly review forward a notch, so
here are all the UK tickets in their full glory.
28 Open UK Tickets
SUSSEX
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105937 (2/6)
Low availability ticket, due to EMI3 upgrade woes. Most issues have been
solved, but APEL publishing problems have been rolled into the ticket.
Matt RB seems to be digging his way out in the right direction though. In
progress (30/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105618 (21/5)
SNO+ CVMFS unavailable at Sussex. On Hold whilst the other issues are
dealt with. On Hold (23/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106492 (25/6)
A request from ATLAS to resize Space Tokens. Matt also asked if
atlashotdisk and atlasgroupdisk could be deleted - Brian gave the nod.
Probably all done here? In Progress (27/6)
BRISTOL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106438 (23/6)
CMS having some trouble running jobs at Bristol (especially having lots
of "held" jobs - but reading the ticket this means held on the CMS
queue, not in the local batch system). Winnie notes that for at least
one of their queues they have over a hundred waiting CMS jobs on a
72-slot shared queue. But it looks like the problem may have evaporated.
At last word the CMS submitter said he'd close the ticket if things
stayed clear - but that was last Thursday. In Progress (26/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106325 (1/6)
A different CMS ticket, about pilot jobs losing connection to their
submission hosts. After another round of nomenclature confusion, it was
found that the problem seems to be between Bristol and the hosts
cmssrv119.fnal.gov and vocms97.cern.ch. Lukasz suggests using perfSONAR
to investigate. Also the dates on this ticket are well off (creation
date 1/6, but first update 18/6). In progress (27/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106554 (1/6)
Again the dates on this ticket are very off (creation date was the 1/6,
but the first update is the 29/6) - so the issue may have disappeared.
This is another CMS ticket, about a heavy transfer backlog between
Bristol and FNAL - if it's still a problem it's possibly linked to the
above issue. Waiting on Lukasz to get back. In progress (30/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106058 (9/6)
CMS xrootd problems at Bristol. Also waiting on Lukasz's return (which I
think has happened). On Hold (16/6)
EDINBURGH
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95303 (1/7/2013)
glexec ticket. No news; the early review meant I couldn't soothe my
shame on this matter. On Hold (27/1)
MANCHESTER
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105922 (2/6)
Manchester publishing to EMI2 APEL. It's being worked on, but one piece
is missing - on hold until this detail is sorted. On Hold (25/6)
LANCASTER
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106406 (23/6)
LHCb having trouble on Lancaster's older cluster. The first issue was
CVMFS timeouts, linked to older WNs being overloaded. The second issue
is the CREAM CE losing track of jobs in the batch system. Being worked
on, but like a case of old age, tuning can only fix so much. In
progress (26/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95299 (1/7/2013)
glexec ticket. As with ECDF. On Hold (4/4)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=100566 (27/1)
Persistent Poor Perfsonar Performance Problems Plaguing Plymouth-born
Postdoc... nope, that's as many Ps as I can get (and I'm not sure I
still count as a Postdoc). A reinstall of the box hasn't helped. If
anyone has a normal 10G iperf endpoint I could test against, that would
be great (see the sketch below). Other than that, we're waiting on some
networking rejigging at Lancaster to shake things up and give the
network engineers another chance to go over things. On Hold (23/6)
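For anyone kind enough to offer an endpoint, the sort of test I have in
mind is roughly the sketch below - a minimal Python wrapper around
iperf3. The hostname is made up, and it assumes the far end is running
"iperf3 -s" on the default port.

  #!/usr/bin/env python
  # Minimal sketch: run an iperf3 client test and report the aggregate
  # throughput. Assumes iperf3 is installed locally and the (made-up)
  # host below is running "iperf3 -s".
  import json
  import subprocess

  ENDPOINT = "ps-test.example.ac.uk"  # hypothetical 10G test host

  def measure_gbps(host, seconds=30, streams=4):
      # -J asks iperf3 for JSON output; -P runs parallel streams, which
      # usually helps fill a 10G pipe from a single box.
      raw = subprocess.check_output(
          ["iperf3", "-c", host, "-t", str(seconds),
           "-P", str(streams), "-J"])
      report = json.loads(raw)
      # The receiver-side aggregate is the honest number to quote.
      return report["end"]["sum_received"]["bits_per_second"] / 1e9

  if __name__ == "__main__":
      print("%s: %.2f Gbit/s" % (ENDPOINT, measure_gbps(ENDPOINT)))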
UCL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106425 (23/6)
UCL failing the ops tests that use their SE. Ben noticed a problem
with one of their pools, but fixing it didn't seem to solve the problem.
Gareth has asked for an update before he's forced to escalate. In
progress (30/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=95298 (1/7/2013)
UCL's glexec ticket. Last word was that this would be the first job of
a new staff member, who was due to start within a few months (so about
nowish?). On Hold (16/4)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=101285 (16/2)
UCL's perfsonar not working after suffering a hardware failure. Bits
have been replaced and the machine was due a reinstall a while ago. On
Hold (28/4)
RHUL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106437 (23/6)
ATLAS have inaccessible file(s) at RHUL due to a pool node in distress.
Govind hopes to install a new motherboard tomorrow and will update
after that. Good luck with the repair! In progress (30/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105943 (2/6)
Biomed asking for gsiftp access on the RHUL headnode so that they can
read the namespace. Govind tried to enable this but Biomed report that
it didn't work. Not much word since - but I expect Govind's been busy.
In progress (23/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105923 (2/6)
RHUL still publishing to EMI2 APEL too. On Govind's to do list, but low
priority. No word for a while. On Hold (17/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106495 (25/6)
Inconsistent storage capacity publishing at RHUL. Govind reckons (quite
rightly) that this is due to having a pool node out of commission, and
will look at it once that's fixed. In Progress (26/6)
QMUL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105771 (27/5)
Biomed having problems accessing files via https at QM. Chris explains
that they've had to switch off https access and are waiting for 105361
to be fixed and StoRM to be updated. On Hold (12/6)
IMPERIAL
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106369 (20/6)
Biomed ticket, similar to 105943 for RHUL, but with some added history
(106369). Biomed are being a little insistent, and asked a question that
I don't fully understand about path publishing. In Progress (30/6)
IMPERIAL CLOUD
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106347 (19/6)
The new cloud site needed to tune things, as VMs weren't using proxies
but were hitting the CERN stratum 0 directly. Adam is working on how to
get around this - Ewan has mentioned that Oxford have shoal running and
have seen accesses from the Imperial Cloud machines - so the problem may
have a no-work-required workaround (the best kind!). See the sketch
below. In Progress (29/6)
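For the curious, here's a rough illustration of what the shoal route
buys you - purely a sketch, with a made-up server URL, and the JSON
field names are my assumptions rather than the actual Oxford or
Imperial setup. A VM asks a shoal server for nearby squids and builds
its CVMFS proxy list from the answer, instead of carrying a hard-wired
proxy.

  # Sketch only: ask a shoal server for nearby squids and build a
  # CVMFS_HTTP_PROXY value from the reply. The server URL and the JSON
  # fields ("hostname", "squid_port") are assumptions for illustration.
  import json
  from urllib.request import urlopen

  SHOAL_URL = "http://shoal.example.ac.uk/nearest"  # hypothetical server

  def cvmfs_proxy_string():
      squids = json.loads(urlopen(SHOAL_URL).read())
      proxies = ["http://%s:%d" % (s["hostname"],
                                   int(s.get("squid_port", 3128)))
                 for s in squids.values()]
      # Keep DIRECT as a last resort so a VM isn't stranded if no
      # squid answers.
      return ";".join(proxies + ["DIRECT"])

  if __name__ == "__main__":
      print('CVMFS_HTTP_PROXY="%s"' % cvmfs_proxy_string())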
EFDA-JET
https://ggus.eu/index.php?mode=ticket_info&ticket_id=97485 (21/9/2013)
LHCb jobs having openssl-like problems at Jet. No progress on this for a
while, but none was expected - the problem survived the move to EMI3,
and the Jet admins are stuck. On Hold (12/5)
TIER 1
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105405 (14/5)
Vidyo router firewall ticket. I suspect this ticket can be closed, as
other issues are being followed up elsewhere - or at the least it needs
an update or setting On Hold. In Progress (10/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105571 (20/5)
Inconsistent BDII and SRM storage numbers for LHCb. This has been worked
on, and seems almost fixed. There's some debate over the tape figures;
Brian points out that the 'online' values are correct. In progress (30/6)
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106324 (18/6)
CMS pilots losing connection to their submission hosts at RAL. It looks
like this has been going on silently for a while; the RAL team are
taking it up with their networking chaps to see if it's a firewall issue.
https://ggus.eu/index.php?mode=ticket_info&ticket_id=106480 (25/6)
The information publishing police have pointed out that the RAL CASTOR
isn't publishing a sane version number. Brian suspects a rogue ":" is
causing the problems.
That's it from me!
Cheers,
Matt