Remember remember, the 5th of November.
GGUS tickets and Plot;
I know of no pretext
Why your GGUS tickets,
Should ever be forgot!
It's the first Monday of the Month, so I get to read all of your tickets
- all 46 of them. Which I have less time to do after spending far too
long trying to find something that (vaguely) rhymes with tickets that
wasn't crickets.
(I realised too late that tomorrow's full ticket review is cancelled,
but as I had written the thing anyway I might as well mail it out!).
Almost a ticket:
In an attempt to stop tickets before they happen (like a very unexciting
pre-crime unit), here's a further reminder to sites that haven't updated
their machines to include the GridPP BackUp Voms servers for the
relevant VOs.
The current list of sites that haven't updated according to:
https://www.gridpp.ac.uk/wiki/Adoption_of_Backup_GridPP_Voms_Servers#Intermediate_Voms_Server_Records
are:
EFDA-JET, BRUNEL, UCL, SHEFFIELD, DURHAM, ECDF, RALPP, SUSSEX and an
unknown fraction of GLASGOW.
Oh, too late, Chris has started filing the tickets. A failure for the
Pre-Ticket Unit:
https://ggus.eu/ws/ticket_info.php?ticket=98614
On to the regular tickets:
TIER 1
https://ggus.eu/ws/ticket_info.php?ticket=98469 (29/10)
Gareth submitted a ticket to note the decommissioning of a bunch of RAL
CEs tomorrow (listed in the ticket). On hold (29/10)
https://ggus.eu/ws/ticket_info.php?ticket=98249 (21/10)
SNO+ asking for cvmfs access at the RAL stratum-0. Waiting on the
Stratum-1 to be upgraded to cvmfs v2.1 (which is a boat all the new
cvmfs repos will be in). In Progress (30/10)
https://ggus.eu/ws/ticket_info.php?ticket=97385 (17/9)
The ticket tracking the HyperK cvmfs repo deployment. Presumably
affected by the above issue, as well as a logistical one on figuring out
how to get software in there. JK asks if this should be put "On Hold"
whilst these things are figured out. Jeremy has asked if the issues need
to be split as well as other questions. In Progress (28/10)
https://ggus.eu/ws/ticket_info.php?ticket=97868 (8/10)
T2K's cvmfs I-want-a-repo ticket. Hit the same cvmfs version problem as
the previous two, but is also waiting on feedback from the VO itself
since the 21/10. Waiting for reply (30/10)
https://ggus.eu/ws/ticket_info.php?ticket=98122 (17/10)
cern@school getting in on the cvmfs action. Being worked on, has the
same issue as the other stratum-0 tickets. In Progress (30/10)
https://ggus.eu/ws/ticket_info.php?ticket=98607 (4/11)
Atlas noticed some Castor access problems ("Too many threads busy"),
which Alastair notes are probably due to some over-eager deletion tasks
he was running. Alastair has paused his deletions, and will resume them
at a slower rate once things are working again. In progress (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=97759 (4/10)
The Tier-1's "SHA2" ticket. I believe that these CEs are being
decommissioned tomorrow (98469) so hopefully this issue will resolve
itself. Worth keeping an eye on. On hold (4/10)
https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)
Request from Chris for webdav support on the RAL LFC. No news on this
since August, it needs an update. On Hold (9/8)
https://ggus.eu/ws/ticket_info.php?ticket=98337 (23/10)
MICE were experiencing slow uploads to Castor. This one fell through the
cracks for a few days, some questions back to MICE have been asked
(partly to see if the problems are still there). Waiting for reply (30/10)
https://ggus.eu/ws/ticket_info.php?ticket=97025 (3/9)
The outstanding issue with the old RAL myproxy server's hostname not
being in its certificate. A newer machine doesn't have this problem, but
hasn't been declared production ready yet (looking in the gocdb at
myproxy.gridpp.rl.ac.uk). Any news? On hold (12/9)
https://ggus.eu/ws/ticket_info.php?ticket=98214 (19/10)
CMS noticed Hammercloud failures at RAL. The problem disappeared, so
this ticket can be closed - it looks like CMS have left that up to the
RAL chaps. In progress (can be solved) (21/10)
https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9/2012)
"correlated packet-loss on perfsonar host". An update from Brian says
that there's a planned reinstall of the latency host on new hardware to
rule out endpoint troubles. On hold (18/10)
SUSSEX
https://ggus.eu/ws/ticket_info.php?ticket=95165 (28/6)
Duncan asked Sussex to check their perfsonar box. Emyr replied that
there was a plan to reinstall it. No news for a while. On hold (14/8)
https://ggus.eu/ws/ticket_info.php?ticket=98172 (18/10)
A SHA2 test ticket for Sussex's Storm SE. Planning the upgrade to a
version that passes muster. In progress (28/10)
RALPP
https://ggus.eu/ws/ticket_info.php?ticket=98544 (1/11)
RALPP were asked for their SL6 upgrade plans by Alessandra since the SL6
deadline passed. Chris posted a comprehensive reply. On hold (1/11)
https://ggus.eu/ws/ticket_info.php?ticket=97834 (7/10)
A SHA2 ticket for RALPP's dcache SE. An upgrade is planned, but keeps
getting pushed back (due to stuff happening). Current planned date is
12/11. On hold (1/11)
CAMBRIDGE
https://ggus.eu/ws/ticket_info.php?ticket=98597 (4/11)
Cambridge got an APEL-Pub (a good name for a GridPP bar?) nagios test
failure. John is waiting on an open ticket he has with the APEL team
during his transition to EMI3 apel (97957). In progress (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=95306 (1/7)
glExec ticket. John has almost vanquished this, test failures look to
not be related to gLexec at all. Just waiting to pass enough tests to
declare the issue closed.
BRISTOL
https://ggus.eu/ws/ticket_info.php?ticket=98543 (1/11)
Bristol's SL6 WN migration plan ticket. A swift reply from Lukasz (with
a savannah link) says they'll be half way there soon. In progress (1/11)
https://ggus.eu/ws/ticket_info.php?ticket=96261 (30/7)
A cms user's problems with their jobs failing during stage out. It
should be fixed now (or at least a new issue should show up!), but
getting word back from the user is difficult. Personally I'd just solve
it and they can reopen if it's still broken. Waiting for reply (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=95305 (1/7)
Bristol's Glexec ticket. All the SL6 WNs (behind lcgce01) are gleXeced,
so tied into moving the remaining SL5 nodes to SL6. On hold (23/10)
GLASGOW
https://ggus.eu/ws/ticket_info.php?ticket=96234 (29/7)
Request to enable HyperK on the Glasgow WMS. Gareth has enabled them on
the WMS and in argus, and asked for some testing. Waiting for reply (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=98253 (21/10)
CMS have spotted jobs failing due to full WNs, related to another user
filling up the disk space (Biomed, 98239). The ticket then snowballed
to include problems with the CMS environment after the move to SL6, and
a move over to xroot for cms jobs. The documentation linked is handily
hidden from all who are not cms. Gareth asks if anyone with CMS
credentials could please forward him the info linked in
https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWIntTrivial Waiting for
reply (1/11)
https://ggus.eu/ws/ticket_info.php?ticket=97068 (5/9)
Glasgow's perfsonar wasn't being right. The plan is to reinstall the
box, but after the SL6 upgrade is done and the dust has settled from that.
On hold (15/10)
ECDF
https://ggus.eu/ws/ticket_info.php?ticket=96002 (22/7)
SHA2 ticket for Edinburgh. The ticket could do with an update (so could
your CE :-P). Ribbing aside, I believe the offending CE will be switched
off soon now that the SL6 upgrade is passed. On hold (20/8)
https://ggus.eu/ws/ticket_info.php?ticket=95303 (1/7)
ECDF's glexeC ticket. As a tarball site, it's all on me. On hold (21/8)
DURHAM
https://ggus.eu/ws/ticket_info.php?ticket=98610 (4/11)
Nagios tests failing at Durham. Ewan reports a poorly site BDII, keeping
a stern eye on it. In progress (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=98585 (4/11)
Atlas having troubles accessing files at Durham. Acknowledged, but no
other news yet. In progress (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=95302 (1/7)
Durham's gLExec ticket. There were some teething problems, but they look
to be fixed. Any more news? On hold (21/10).
SHEFFIELD
https://ggus.eu/ws/ticket_info.php?ticket=98594 (4/11)
LHCB having problems uploading their job outputs from Sheffield. Looks to
be a local network problem? In progress (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=95301 (1/7)
Sheffield's glexEc ticket. Disk server problems have pushed glexec
configuration work back. On hold (29/10)
https://ggus.eu/ws/ticket_info.php?ticket=97039 (4/9)
Biomed complaining about lack of dynamic publishing at Sheffield. Due to
having bigger fish to fry Elena has had to put this on the back burner.
On hold (21/10)
MANCHESTER
https://ggus.eu/ws/ticket_info.php?ticket=97066 (5/9)
Dodgy perfsonar at Manchester. Set on hold until after SL6 and the
Manchester network has finished playing up. Are you going to start soon?
On hold (9/9)
LANCASTER
https://ggus.eu/ws/ticket_info.php?ticket=95299 (1/7)
GLeXEC ticket. Now that I have almost nothing left to upgrade I'm
working on this, as well as a bunch of other tarball related requests.
On hold (17/7)
https://ggus.eu/ws/ticket_info.php?ticket=98403 (25/10)
LHCB having trouble on the "upgraded" Lancaster clusters. Working on it
with LHCB. In progress (4/11)
UCL
https://ggus.eu/ws/ticket_info.php?ticket=95298 (1/7)
glexec ticket. Planned to be done after SL6, comparatively low
priority. On hold (14/10)
https://ggus.eu/ws/ticket_info.php?ticket=98125 (17/10)
Atlas transfer problems to/from UCL. Site blacklisted a lot,
"globus_xio: System error in connect: Connection refused globus_xio: A
system call failed: Connection refused" errors. In progress (1/11)
https://ggus.eu/ws/ticket_info.php?ticket=98542 (1/11)
SL6 WN migration plan ticket. No reply from the site yet. Assigned (1/11)
RHUL
https://ggus.eu/ws/ticket_info.php?ticket=95297 (1/7)
RHUL's gLeXeC ticket. Govind got it working for Ops, but not the "big
three". Almost! Reopened (30/10)
QMUL
https://ggus.eu/ws/ticket_info.php?ticket=98592 (4/11)
cmtsite timeout failures for some atlas jobs. Dan thinks he tracked it
down to some bad user jobs hammering the storage, and initiated some
containment procedures. Hopefully this will have got it. In progress (4/11)
https://ggus.eu/ws/ticket_info.php?ticket=98376 (24/10)
Sno+ question about queue attributes at QM which Sno balled into a
problem with software install jobs failing. Looks like a
post-reconfiguration problem but atlas jobs filling up the site are
making it difficult to test the fixes. In progress (2/11)
https://ggus.eu/ws/ticket_info.php?ticket=95296 (1/7)
Queen Mary's glEXec ticket. Last word was that things are almost there.
On hold (23/10)
https://ggus.eu/ws/ticket_info.php?ticket=98427 (26/10)
LHCB pilots aborted at QM. Looks fixed (was an old issue rearing its
head: 88669), so the ticket can probably be solved. In progress (1/11)
EFDA-JET
https://ggus.eu/ws/ticket_info.php?ticket=97485 (21/9)
LHCB jobs failing at JET with handshake errors ("certificate verify
failed"). Even after upgrading CA certs and the WNs to SL6 the problem
persists (same sort of error by the looks of it). Team JET are still
battling at this. In progress (30/10)
https://ggus.eu/ws/ticket_info.php?ticket=95295 (1/7)
Jet's glExEc ticket. Team Jet have deployed this, but are having issues
and submitted a ticket describing their problems
(https://ggus.eu/ws/ticket_info.php?ticket=98609). In progress (4/11)