Remember, remember, the 5th of November
Tickets and treason and plot...
32 Open UK Tickets this week. It's the first Monday of the month, so we
get to look at all of them. Have all the GGUS access problems
experienced by atlas team members last week soothed themselves?
It's worth noting that a quarter of the open tickets concern
networking/transfer-type problems.
UNSUPPORTED GLITE SOFTWARE TICKETS
Congratulations to those sites who closed their tickets. I suspect these
will be gone over in greater detail, so again I'll just summarise them;
we can look at each in the meeting if needed. All seem to be in hand,
but my rule of thumb is the more recent the update, the less the worry.
BRISTOL: https://ggus.eu/ws/ticket_info.php?ticket=87472 (17/10) In
Progress (25/10)
CAMBRIDGE: https://ggus.eu/ws/ticket_info.php?ticket=87470 (17/10) In
Progress (30/10)
BRUNEL: https://ggus.eu/ws/ticket_info.php?ticket=87469 (17/10) In
Progress (30/10)
UCL: https://ggus.eu/ws/ticket_info.php?ticket=87468 (17/10) In Progress
(1/11)
MANCHESTER: https://ggus.eu/ws/ticket_info.php?ticket=87467 (17/10) On
Hold (24/10)
SHEFFIELD: https://ggus.eu/ws/ticket_info.php?ticket=87466 (17/10) On
Hold (31/10)
ECDF: https://ggus.eu/ws/ticket_info.php?ticket=87171 (10/10) In
Progress (30/10)
EFDA-JET: https://ggus.eu/ws/ticket_info.php?ticket=87169 (10/10) In
Progress (31/10)
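
For anyone who wants to apply the "more recent the update" rule of thumb
mechanically, here is a minimal sketch that orders the tickets above by
days since their last update. The ticket numbers and dates are copied
from the list; the helper name stalest_first, the year 2012 and the
"today" of 5/11 are my assumptions, not anything GGUS provides.

```python
from datetime import date

# (site, ticket number, (day, month) of last update), copied from the list above.
TICKETS = [
    ("BRISTOL", 87472, (25, 10)),
    ("CAMBRIDGE", 87470, (30, 10)),
    ("BRUNEL", 87469, (30, 10)),
    ("UCL", 87468, (1, 11)),
    ("MANCHESTER", 87467, (24, 10)),
    ("SHEFFIELD", 87466, (31, 10)),
    ("ECDF", 87171, (30, 10)),
    ("EFDA-JET", 87169, (31, 10)),
]


def stalest_first(tickets, today, year=2012):
    """Return (site, ticket, days since last update), stalest first.

    The year is an assumption: the summary only gives day/month.
    """
    rows = [
        (site, number, (today - date(year, month, day)).days)
        for site, number, (day, month) in tickets
    ]
    # Largest age first; sorted() is stable, so ties keep list order.
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for site, number, age in stalest_first(TICKETS, date(2012, 11, 5)):
        print(f"{site:<12} {number}  {age} days since last update")
```

This is only a rough ordering by staleness; an On Hold ticket may be
quiet for perfectly good reasons.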
NGI/VOMS
https://ggus.eu/ws/ticket_info.php?ticket=87813 (25/10)
Migration of vo.helios-vo.eu to Manchester. The transfer was completed
manually, and users were asked if things are okay. In Progress; I set it
to "Waiting for reply" today. (30/10)
TIER 1
https://ggus.eu/ws/ticket_info.php?ticket=88112 (3/11)
Slow atlas transfers, found to be caused by database problems. The
problems have been fixed, the atlas instance restarted and data is
flowing once more. Waiting for the thumbs up from atlas. Waiting for
reply (5/11)
https://ggus.eu/ws/ticket_info.php?ticket=86690 (3/10)
t2k are missing JPKEKCRC02 FTS ganglia metrics. There were some problems
with the rrd files that meant they had to be deleted, which hopefully
will fix the plots. Things look better to my eyes. In Progress, could be
set to Waiting for reply or Solved (31/10)
https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9)
Packet loss on the RAL perfsonar. This is being taken under the wing of
wider network investigations at RAL. On hold (31/10)
https://ggus.eu/ws/ticket_info.php?ticket=68853 (22/3/11)
DPM SL4 retirement ticket. The only reason this is still open is the
possible SL4 disk servers at Durham, right? Are they still there? In
progress (30/10)
RALPP
https://ggus.eu/ws/ticket_info.php?ticket=88099 (3/11)
atlas are seeing transfers into RALPP fail with "No transfer markers
received" errors, although the problem seems to be slowly abating.
Still just "Assigned" (4/11)
BRUNEL
https://ggus.eu/ws/ticket_info.php?ticket=88019 (1/11)
lhcb are seeing failures on some nodes, blaming cvmfs. Raul has put the
CE in downtime. In Progress (1/11)
BIRMINGHAM
https://ggus.eu/ws/ticket_info.php?ticket=88009 (1/11)
Hone with one of their usual politely worded requests to get their jobs
moving. Mark tweaked the batch system, and hone are happy again. In
progress, can be closed (2/11)
https://ggus.eu/ws/ticket_info.php?ticket=86105 (14/9)
Poor sonar rates between Birmingham & BNL. Investigation has been made
difficult by EMI2 problems with the DPM. Brian has tried to see if
doubling the number of streams would help. Did it? On hold (16/10)
DURHAM
https://ggus.eu/ws/ticket_info.php?ticket=88151 (5/11)
apel nagios test problems. Assigned (5/11)
https://ggus.eu/ws/ticket_info.php?ticket=86242 (20/9)
Biomed not cleaning out their cream sandbox. Mike pulled them up about
this a while ago but no reply. We should close this ticket and/or
re-ticket the VO if they're causing a mess. Waiting for reply (4/10)
https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)
atlas production job failures at Durham, which has become a bit of a
catch-all ticket for atlas problems at Durham. On hold (3/9)
https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)
Compchem authentication ticket. On hold, but is it still relevant? (8/10)
ECDF
https://ggus.eu/ws/ticket_info.php?ticket=88119 (4/11)
atlas transfers are failing due to a sickly pool node. In Progress (5/11)
https://ggus.eu/ws/ticket_info.php?ticket=87958 (31/10)
atlas transfers between Edinburgh & FZK having problems, likely due to
their firewall. FZK had been ticketed (no ticket number given though).
In Progress (1/11)
https://ggus.eu/ws/ticket_info.php?ticket=86334 (24/9)
Poor atlas sonar rates between ECDF & BNL. Wahid has "harmonised" his
tcp tunings, and is waiting on some further WAN upgrades. On hold (25/10)
GLASGOW
https://ggus.eu/ws/ticket_info.php?ticket=87879 (29/10)
na62 mapping problems, traced to a pool node not making its grid map.
Seems things are fixed now, despite the user's initial protests to the
contrary. Turns out they were just being impatient! In progress, can be
closed (30/10)
SUSSEX
https://ggus.eu/ws/ticket_info.php?ticket=86996 (8/10)
Sussex's APEL problems. Things look better now after a lot of work. In
progress, can be closed (5/11)
https://ggus.eu/ws/ticket_info.php?ticket=81784 (1/5)
The Sussex Certification Chronicle. Surely the Grid Overlords are
satisfied that Sussex is worthy of certification, after paying so much
tribute in tears and sanity? :-) In progress (bit quiet though) (23/10)
QMUL
https://ggus.eu/ws/ticket_info.php?ticket=86306 (22/9)
Hard-to-kill lhcb jobs at QMUL. Chris is still getting regular
hit-lists. Chris's corresponding ticket to the cream developers
(https://ggus.eu/tech/ticket_show.php?ticket=87891) has problems as lhcb
can't reply to it! He has however written information in this ticket. In
progress (1/11)
CAMBRIDGE
https://ggus.eu/ws/ticket_info.php?ticket=86108 (14/9)
Perfsonar WAN bandwidth asymmetry. Been on hold for a while, the classic
question must be asked - has the problem gone away all by itself? On
hold (2/10)
OXFORD
https://ggus.eu/ws/ticket_info.php?ticket=86106 (14/9)
Low atlas sonar rates between BNL and Oxford. Tweaking the FTS settings
hasn't made any difference. The next step was to tweak tcp tuning
parameters. Duncan observed similar transfer rates between Oxford &
TRIUMF. In progress (19/10)
LANCASTER
https://ggus.eu/ws/ticket_info.php?ticket=85367 (20/8)
ilc jobs were aborting on one of Lancaster's CEs. This CE has poor
performance, which for some reason was affecting ilc jobs more than
most. The only fix is a reinstall (and reconfigure), but other
priorities keep getting in the way (the latest being the use of this CE
to test EMI2 tarballs). On hold (5/11)
https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)
t2k.org transfer timeout failures between RAL and Lancaster. Traffic is
in the process of being routed over SJ5 from the lightpath to see if
that helps. Other than that, there is the possibility that this is a
"taking too long to stage from tape" thing, but there's no reason why
that would only be a problem for us. In progress (1/11)