Remember, remember, the 5th of November
Tickets and treason and plot...

32 Open UK Tickets this week. It's the first Monday of the month, so we 
get to look at all of them. Have all the GGUS access problems 
experienced by atlas team members last week soothed themselves?

It's worth noting that a quarter of the open tickets concern 
networking/transfer-type problems.

UNSUPPORTED GLITE SOFTWARE TICKETS
Congratulations to those sites who closed their tickets. I suspect these 
will be gone over in greater detail, so again I'll just summarise them; 
we can look at each in the meeting if needed. All seem to be in hand, 
but my rule of thumb is: the more recent the update, the lesser the worry.

BRISTOL: https://ggus.eu/ws/ticket_info.php?ticket=87472 (17/10) In 
Progress (25/10)
CAMBRIDGE: https://ggus.eu/ws/ticket_info.php?ticket=87470 (17/10) In 
Progress (30/10)
BRUNEL: https://ggus.eu/ws/ticket_info.php?ticket=87469 (17/10) In 
Progress (30/10)
UCL: https://ggus.eu/ws/ticket_info.php?ticket=87468 (17/10) In Progress 
(1/11)
MANCHESTER: https://ggus.eu/ws/ticket_info.php?ticket=87467 (17/10) On 
Hold (24/10)
SHEFFIELD: https://ggus.eu/ws/ticket_info.php?ticket=87466 (17/10) On 
Hold (31/10)
ECDF: https://ggus.eu/ws/ticket_info.php?ticket=87171 (10/10) In 
Progress (30/10)
EFDA-JET: https://ggus.eu/ws/ticket_info.php?ticket=87169 (10/10) In 
Progress (31/10)

NGI/VOMS
https://ggus.eu/ws/ticket_info.php?ticket=87813 (25/10)
Migration of vo.helios-vo.eu to Manchester. The transfer was completed 
manually, and users were asked if things are okay. In Progress; I set it 
to "waiting for reply" today. (30/10)

TIER 1
https://ggus.eu/ws/ticket_info.php?ticket=88112 (3/11)
Slow atlas transfers, found to be caused by database problems. The 
problems have been fixed, the atlas instance restarted and data is 
flowing once more. Waiting for the thumbs up from atlas. Waiting for 
reply (5/11)

https://ggus.eu/ws/ticket_info.php?ticket=86690 (3/10)
t2k are missing JPKEKCRC02 FTS ganglia metrics. There were some problems 
with the rrd files that meant they had to be deleted, which hopefully 
will fix the plots. Things look better to my eyes. In Progress, could be 
set to waiting for reply/solved (31/10)

https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9)
Packet loss on the RAL perfsonar. This is being taken under the wing of 
wider network investigations at RAL. On hold (31/10)

https://ggus.eu/ws/ticket_info.php?ticket=68853 (22/3/11)
DPM SL4 retirement ticket. The only reason this is open is the possible 
SL4 disk servers at Durham, right? Are they still there? In progress (30/10)

RALPP
https://ggus.eu/ws/ticket_info.php?ticket=88099 (3/11)
atlas are seeing transfers into RALPP fail with "No transfer markers 
received" errors, although the problem seems to be slowly abating. 
Still just "Assigned" (4/11)

BRUNEL
https://ggus.eu/ws/ticket_info.php?ticket=88019 (1/11)
lhcb seeing failures on some nodes, blaming cvmfs. Raul has put the CE 
in downtime. In Progress (1/11)

BIRMINGHAM
https://ggus.eu/ws/ticket_info.php?ticket=88009 (1/11)
Hone with one of their usual politely worded requests to get their jobs 
moving. Mark tweaked the batch system, and hone are happy again. In 
progress, can be closed (2/11)

https://ggus.eu/ws/ticket_info.php?ticket=86105 (14/9)
Poor sonar rates between Birmingham & BNL. Investigation made difficult 
due to EMI2 problems with the DPM. Brian has tried to see if doubling 
the number of streams would help. Did it? On hold (16/10)

DURHAM
https://ggus.eu/ws/ticket_info.php?ticket=88151 (5/11)
apel nagios test problems. Assigned (5/11)

https://ggus.eu/ws/ticket_info.php?ticket=86242 (20/9)
Biomed not cleaning out their cream sandbox. Mike pulled them up about 
this a while ago but no reply. We should close this ticket and/or 
re-ticket the VO if they're causing a mess. Waiting for reply (4/10)

https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)
atlas production job failures at Durham, which has become a bit of a 
catch-all ticket for atlas problems at Durham. On hold (3/9)

https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)
Compchem authentication ticket. On hold, but is it still relevant? (8/10)

ECDF
https://ggus.eu/ws/ticket_info.php?ticket=88119 (4/11)
atlas transfers are failing due to a sickly pool node. In Progress (5/11)

https://ggus.eu/ws/ticket_info.php?ticket=87958 (31/10)
atlas transfers between Edinburgh & FZK having problems, likely due to 
their firewall. FZK had been ticketed (no ticket number given though). 
In Progress (1/11)

https://ggus.eu/ws/ticket_info.php?ticket=86334 (24/9)
Poor atlas sonar rates between ECDF & BNL. Wahid has "harmonised" his 
tcp tunings, and is waiting on some further WAN upgrades. On hold (25/10)

GLASGOW
https://ggus.eu/ws/ticket_info.php?ticket=87879 (29/10)
na62 mapping problems, traced to a pool node not making its grid map. 
Seems things are fixed now, despite the user's initial protests to the 
contrary. Turns out they were just being impatient! In progress, can be 
closed (30/10)

SUSSEX
https://ggus.eu/ws/ticket_info.php?ticket=86996 (8/10)
Sussex's APEL problems. Things look better now after a lot of work. In 
progress, can be closed (5/11)

https://ggus.eu/ws/ticket_info.php?ticket=81784 (1/5)
The Sussex Certification Chronicle. Surely the Grid Overlords are 
satisfied that Sussex is worthy of certification, after paying so much 
tribute in tears and sanity? :-) In progress (bit quiet though) (23/10)

QMUL
https://ggus.eu/ws/ticket_info.php?ticket=86306 (22/9)
Hard-to-kill lhcb jobs at QMUL. Chris is still getting regular 
hit-lists. Chris's corresponding ticket to the cream developers 
(https://ggus.eu/tech/ticket_show.php?ticket=87891) has problems as lhcb 
can't reply to it! He has however written information in this ticket. In 
progress (1/11)

CAMBRIDGE
https://ggus.eu/ws/ticket_info.php?ticket=86108 (14/9)
Perfsonar WAN bandwidth asymmetry. Been on hold for a while, so the 
classic question must be asked: has the problem gone away all by itself? 
On hold (2/10)

OXFORD
https://ggus.eu/ws/ticket_info.php?ticket=86106 (14/9)
Low atlas sonar rates between BNL and Oxford. Tweaking the FTS settings 
hasn't made any difference. The next step was to tweak tcp tuning 
parameters. Duncan observed similar transfer rates between Oxford & 
TRIUMF. In progress (19/10)

LANCASTER
https://ggus.eu/ws/ticket_info.php?ticket=85367 (20/8)
ilc jobs were aborting on one of Lancaster's CEs. This CE has poor 
performance, which for some reason was affecting ilc jobs more than 
most. The only fix is a reinstall (and reconfigure), but other 
priorities keep getting in the way (the latest being the use of this CE 
to test EMI2 tarballs). On hold (5/11)

https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)
t2k.org transfer timeout failures between RAL and Lancaster. Traffic is 
in the process of being routed over SJ5 from the lightpath to see if 
that helps. Other than that, there is the possibility that this is a 
"taking too long to stage from tape" thing, but there's no obvious reason 
why that would only be a problem for us. In progress (1/11)