Hello all,
Here's this week's ticket update. It's an in-depth one covering all
tickets as, despite only having a week off work, I had completely
forgotten what was going on with anything ever (plus I aim to do a "deep
review" roughly once a month). I liked how Jeremy formatted last week's
update, and tried to incorporate some of that below (particularly the
dates), although I failed to be as succinct with my ticket descriptions.
Cheers,
Matt
22 Open UK Tickets this week.
I'll start with a quick reminder to people that the job of "In
Progress"-ing tickets has fallen back to the site admins.
UK
https://ggus.eu/ws/ticket_info.php?ticket=84381 (19/7)
Ticket to track the creation of a new VO for the COMET experiment
(possibly to be called comet.j-parc.jp). A request to the voms admins
was submitted at the time of the ticket. Increased the list of cc'd
parties (24/7).
https://ggus.eu/ws/ticket_info.php?ticket=80259 (14/4)
Creation of the neurogrid.incf.org VO. Requests to GGUS, and for the
enabling of the VO on the WMS & LFC, went out (20/7).
TIER 1
https://ggus.eu/ws/ticket_info.php?ticket=84408 (20/7)
Request to enable neurogrid on WMS & LFC, in progress on the 20th but no
word since.
https://ggus.eu/ws/ticket_info.php?ticket=83927 (6/7)
SNO+ are attempting to get FTS to work for them. This ticket has been
around the houses, before being set on the RAL FTS (24/7). After a few
tweaks much progress seems to have been made; hopefully things will work
now. Waiting for reply (26/7).
https://ggus.eu/ws/ticket_info.php?ticket=84503 (24/7)
SNO+ asked for python-dev packages to be installed at RAL, who would
rather not put them on their workers, so SNO+ have been asked if they
can install them in their software area (25/7).
https://ggus.eu/ws/ticket_info.php?ticket=84492 (24/7)
SNO+ jobs were not being matched to their queue at RAL. This seems to be
a problem with jobs (submitted via Ganga to the WMS) matching against
GlueHostMainMemoryVirtualSize (which was not set) rather than
GlueHostMainMemoryRAMSize. GlueHostMainMemoryVirtualSize has now been
set for the queue in question. Waiting for reply from SNO+ (27/7).
OXFORD
https://ggus.eu/ws/ticket_info.php?ticket=84487 (24/7)
SNO+ are having curl problems at Oxford (although the same command works
at QMUL). This ticket seems to have got stuck; I kicked it into
notifying the site and Ewan promptly put the ticket in progress (30/7).
--As a side note, all the above SNO+ tickets seem to have been victims of
a game of ticket tennis, or in the latter case not assigned at all. Some
problem with SNO+ tickets? Or was it simply that Matt wasn't notifying
sites manually, as many more veteran submitters do?
LANCASTER
https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)
t2k.org transfer errors. The disk server is okay, the problem is most
likely "in the pipes" at the Lancaster end of the lightpath. Involving
the local networking team, on hold till they get back to us with a
solution (26/7).
https://ggus.eu/ws/ticket_info.php?ticket=84583 (26/7)
LHCb jobs aborting on one of the Lancaster CEs; ticket reopened. The error
message is "Transfer to CREAM failed due to exception: Failed to create
a delegation id for job
https://wms302.cern.ch:9000/9meup5GIEhvKFl6t1ogUhw: reason is Delegation
ID '13432989762E590625wms3022Ecern2Ech' already exists for client".
Google has failed me, and cleaning up the jobs didn't fix the problem.
Has anyone else seen this error message? In progress (30/7).
BRUNEL
https://ggus.eu/ws/ticket_info.php?ticket=84639 (30/7)
Brunel's DPM version is below the WLCG recommended level. Request from
Brian for upgrade plans (related to 68853).
MANCHESTER
https://ggus.eu/ws/ticket_info.php?ticket=84579 (26/7)
Hone had jobs in a scheduled status for a long time on one of
Manchester's queues. There was also a transient SE problem mentioned in
the ticket. Looks like the ticket can be closed as of Friday (27/7).
DURHAM
https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)
High atlas production failure rate. Mike offlined suspect nodes (19/7)
but the UK cloud has set Durham to test mode (20/7).
https://ggus.eu/ws/ticket_info.php?ticket=83950 (7/7)
lhcb cvmfs problems. First attempts at triage failed, and recent
attempts by lhcb to confirm the problem is fixed have been blocked by
job submission problems (26/7).
https://ggus.eu/ws/ticket_info.php?ticket=68859 (22/3/11)
Brian's request for DPM upgrade plans. As of 19/1 Durham still had disk
servers to update. On 30/7 Brian requested some more information.
https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)
Compchem were seeing authentication problems at a number of sites,
including Durham. On Hold (19/1). As of 18/7, Mark M will poke Mike to
see if the ticket can be closed.
RHUL
https://ggus.eu/ws/ticket_info.php?ticket=83627 (27/6)
Biomed seeing negative space published. Possibly related to ticket
#81439. Despite database cleanup and extensive investigation the problem
persists. Still in progress (20/7).
GLASGOW
https://ggus.eu/ws/ticket_info.php?ticket=83283 (14/6)
lhcb cvmfs problems. Dave cited
https://savannah.cern.ch/bugs/index.php?95420 &
https://savannah.cern.ch/support/?129468 (18/6). Have there been any
plans to try out the newer versions of cvmfs (or does the problem even
still exist)?
SUSSEX
https://ggus.eu/ws/ticket_info.php?ticket=81784 (1/5)
The Sussex saga continues. Emyr continues to battle bravely, with
support from Ewan (ably filling in for Kashif) and Daniela. Tests have
been moved to the Imperial WMS to work around some oddities that were
seen, but weird, possibly BDII-related, troubles still haunt the endeavour.
BRISTOL
https://ggus.eu/ws/ticket_info.php?ticket=80155 (12/3)
Upgrade plans for the Bristol SE. Winnie has outlined a plan (9/7),
ticket has been put on hold (18/7) until the end of August.
"OTHER"
https://ggus.eu/ws/ticket_info.php?ticket=68853 (22/3/11)
The "master ticket" to Brian's Crusty SE Upgrade/Decommissioning
queries. On hold (17/7), only Durham, Bristol & Brunel left.
https://ggus.eu/ws/ticket_info.php?ticket=83213 (12/6)
Chris W ticketed ngs.ac.uk concerning the decommissioning of
ce03.esc.qmul.ac.uk. No reply from them. Did they even get the message?
https://ggus.eu/ws/ticket_info.php?ticket=82492 (24/5)
Chris' ticket concerning VOMS re-signing requests. On Hold until the
voms handover back to GridPP is complete (24/7).
SOLVED CASES
No groundbreaking cases have been solved over the last week.
The UK's tickets
I still don't have a good way of tracking tickets submitted by us. If
you have a ticket that you think we'd all be interested in, please send
me a link. I'll go over these tickets in detail next week.
https://ggus.eu/ws/ticket_info.php?ticket=84015
A ticket for Lancaster's LSF apel problems.
https://ggus.eu/ws/ticket_info.php?ticket=84641
Daniela spotted a CMS user running multicore jobs naughtily.