Howdy all,
27 open UK tickets this week, and as it's the first working day of the
month, we have the joy of looking at all of them.
NGI
https://ggus.eu/ws/ticket_info.php?ticket=93142 (5/4)
The UK ROAD is being hauled over the coals for not handling recent
tickets "according to escalation procedure". I suspect the tickets
referred to are all EMI1 upgrade ones, so justifying ourselves should be
straightforward. Assigned to ngi-ops. (8/4)
VOMS
https://ggus.eu/ws/ticket_info.php?ticket=92306 (7/3)
Rolling out VOMS support for the new, Glasgow-based earthsci VO. After
some discussion of domain naming it was decided to go with the VO name
earthsci.vo.gridpp.ac.uk. It has been deployed at Manchester, Oxford
and IC, so I assume the next step is testing it. In progress (4/4)
EMI 1 UPGRADE TICKETS:
RALPP https://ggus.eu/ws/ticket_info.php?ticket=91997 (On hold, extended
5/4)
Chris has pushed back the dCache upgrade a bit, but it seems in hand. The
last remaining EMI1 holdout was being drained for upgrade last week.
GLASGOW https://ggus.eu/ws/ticket_info.php?ticket=91992 (In progress,
extended 5/4)
Not much word from the Glasgow lads in a while (since 11/3), but they
only had a few holdouts left.
https://ggus.eu/ws/ticket_info.php?ticket=92805 (On hold)
Glasgow's DPM ticket (despite their DPM technically being up to date).
Sam hopes to "update" when DPM 1.8.7 comes out, but if that looks
unlikely in the time frame he will reinstall the DPM rpms to simulate
an upgrade.
SHEFFIELD https://ggus.eu/ws/ticket_info.php?ticket=91990 (On hold,
extended 5/4)
Just some worker nodes left at Sheffield. Looking good.
BRUNEL https://ggus.eu/ws/ticket_info.php?ticket=91975 (On hold)
Raul upgraded his CE, only to find that the Nagios tests hadn't picked
up the upgrade! Daniela suggests a site BDII restart.
DURHAM https://ggus.eu/ws/ticket_info.php?ticket=92804 (In progress,
extended 5/4)
Not much news from Mike about this in the last few weeks. I think
he's in the same boat as Sam: technically up to date (just from the
"wrong" repo).
COMMON OR GARDEN TICKETS:
OXFORD
https://ggus.eu/ws/ticket_info.php?ticket=92688 (20/3)
Brian asked for a data dump; Ewan provided two! Ewan has left the ticket
open whilst ATLAS decide what to do with the information. Waiting for
reply (2/4)
GLASGOW
https://ggus.eu/ws/ticket_info.php?ticket=89804 (18/12/2012)
Moving ATLAS data off the groupdisk token. The last word was from
Stephene on 3/3, asking for a dump of what remains. I think the
conversation has moved offline to expedite things. How goes it? On hold
(3/3)
https://ggus.eu/ws/ticket_info.php?ticket=92691 (20/3)
Glasgow supplied Brian with a list of all the files on the SE, and Brian
has given back a list of the "dark data" files that couldn't be deleted
remotely. In progress (8/4)
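For anyone unfamiliar with the exercise, a "dark data" report boils down to a set difference: anything in the SE dump that the experiment catalogue doesn't know about. A rough sketch (the file names and contents here are toy examples, not the real Glasgow dumps):

```shell
# Toy stand-ins for the real dumps (hypothetical paths).
printf '/dpm/a\n/dpm/b\n/dpm/c\n' > se_dump.txt          # what's on the SE
printf '/dpm/a\n/dpm/c\n'         > catalogue_dump.txt   # what the catalogue knows

# comm needs sorted input; -23 keeps only lines unique to the
# first file, i.e. files on disk with no catalogue entry.
sort -o se_dump.txt se_dump.txt
sort -o catalogue_dump.txt catalogue_dump.txt
comm -23 se_dump.txt catalogue_dump.txt > dark_data.txt
```

In this toy case dark_data.txt ends up containing just /dpm/b, the file on disk that the catalogue has no record of.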
https://ggus.eu/ws/ticket_info.php?ticket=93036 (2/4)
Glasgow were being bitten by stage-in failures after disk server stress
killed the xrootd service on a node. Measures have been put in place to
stop this happening again, and Sam has said some wise words on the
issue (it was data-hungry production jobs that caused the deadly
stress). Sam suggests it would be beneficial to have these
data-hungry production jobs flagged in some way, so that they can be
treated similarly to analysis jobs (staggered starts, limiting
the maximum number running, etc.). In progress (5/4)
- This raises the question: is it likely that suggestions put in a
ticket like this would work their way up the chain to someone who could
act on them?
DURHAM
https://ggus.eu/ws/ticket_info.php?ticket=92590 (18/3)
LHCb were having what look like authorisation problems at Durham. Not
much news on the ticket since then; does the problem persist? On hold (2/4)
MANCHESTER
https://ggus.eu/ws/ticket_info.php?ticket=93179 (8/4)
ATLAS would like 5 TB shuffled from localgroupdisk to datadisk. Assigned
(8/4)
LIVERPOOL
https://ggus.eu/ws/ticket_info.php?ticket=93160 (7/4)
ATLAS were suffering transfer failures, which puzzled the Liver lads as
their logs showed the transfers succeeding. It could have been a problem
with the University firewalls: the timing of the failures coincided
with a change in the Uni firewall. The change has been reverted, so
let's see if things go back to normal. In progress (8/4)
LANCASTER
https://ggus.eu/ws/ticket_info.php?ticket=91304 (8/2)
LHCb jobs were running in the tidgey home partition on the Lancaster
shared cluster. I've tried to put in place a job wrapper that cds to
$TMPDIR, but no joy; not sure what I'm doing wrong. On hold (27/3)
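For comparison, the shape of wrapper I had in mind is roughly this (a sketch, assuming the batch system exports a per-job $TMPDIR; in production it would be a standalone script that execs the payload, but it's written as a function here so it can be exercised inline):

```shell
#!/bin/bash
# Hypothetical wrapper sketch: run the payload from the per-job
# scratch area rather than the small shared home partition.
run_in_scratch() {
    # Fall back to /tmp if the batch system didn't set TMPDIR.
    cd "${TMPDIR:-/tmp}" || return 1
    "$@"
}
```

Whether this helps depends on whether the LHCb pilot itself then stays in the working directory it starts in, which may be where my attempt is going wrong.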
RHUL
https://ggus.eu/ws/ticket_info.php?ticket=89751 (17/12/12)
Path MTU discovery problems for RHUL. Passed to the networking chaps and
Janet, this may be a long time in the solving. On hold (28/1)
https://ggus.eu/ws/ticket_info.php?ticket=92969 (29/3)
Biomed are reporting negative space on the RHUL SE: an old
bugbear resurrected. In progress (1/4)
QMUL
https://ggus.eu/ws/ticket_info.php?ticket=93180 (8/4)
QM got a Nagios ticket for the recent APEL troubles; Dan rightfully
cited the APEL ticket. In progress (8/4)
https://ggus.eu/ws/ticket_info.php?ticket=92951 (29/3)
ATLAS transfer failures, caused by a crash in a disk storage node.
Reopened after the initial fix; it looks like a Lustre bug is plaguing
the QM chaps. Currently they're hoping for a bug fix, or else they'll
need to roll back. In progress (8/4)
TIER 1
https://ggus.eu/ws/ticket_info.php?ticket=91658 (20/2)
Chris requested WebDAV support on the RAL LFC. The RAL team are waiting
for the next LFC version, with better WebDAV support, to reach
production. On hold (3/4)
https://ggus.eu/ws/ticket_info.php?ticket=91029 (30/1)
Long-standing ticket concerning the SRM troubles with certain robot DNs.
No fix is likely in the near future. On hold (27/2)
https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/12/12)
Correlated packet loss on the RAL perfsonar. The picture looks improved
after last month's intervention, but still needs understanding. The
proposal is to wait until after the May intervention before looking
hard at this again. On hold (27/3)
https://ggus.eu/ws/ticket_info.php?ticket=93136 (5/4)
The epic VO is having trouble downloading output from the RAL WMS. Most
likely related to the known problem in
https://ggus.eu/ws/ticket_info.php?ticket=92288 (submitted by Jon from
t2k). In progress (5/4)
https://ggus.eu/ws/ticket_info.php?ticket=93149 (5/4)
Obviously Friday was the day of tickets. ATLAS were seeing a large
number of cvmfs-related cmtside failures. The affected nodes were
testing the latest cvmfs 2.1.8, and have been rolled back. Waiting for
reply (8/4)
https://ggus.eu/ws/ticket_info.php?ticket=92266 (6/3)
RAL were having problems with their myproxy aliases not matching
their myproxy certs. After trying a few fixes, the RAL guys are setting
up a new machine whose hostname and certificate match. They aim to have
this done within a fortnight. In progress (28/3)
APEL
Just in case you guys haven't been reading TB-SUPPORT, the ticket
tracking the current APEL problems:
https://ggus.eu/ws/ticket_info.php?ticket=93183