Hello all,
Here's this week's ticket review.
Cheers,
Matt
32 Open UK tickets this week. It's the start of the month, so all
tickets, great or small, will get reviewed.
NGI/VOMS
https://ggus.eu/ws/ticket_info.php?ticket=88546 (16/11)
Creation of epic.vo.gridpp.ac.uk. Name has been settled on, deployed on
the master VOMS instance and rolled out to the backups, ready for
whatever the next step will be. In progress (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=87813 (25/10)
Migration of vo.helio-vo.eu to the UK. At last word everything was
done on the VOMS side, and testing on grid resources still needed to be
done. In progress (15/11)
TIER 1
https://ggus.eu/ws/ticket_info.php?ticket=89141 (3/12)
RAL are seeing a high atlas production job failure rate, and a possibly
related high FTS failure rate. In Progress (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=89081 (30/11)
Failed biomed SAM tests, tracked to a missing / in a .lsc file. Should
be fixed, waiting for confirmation (but don't wait too long). Waiting
for reply (3/12)
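For context, an .lsc file simply lists the DNs of the VOMS server's certificate chain, one per line, so a single dropped "/" is enough to break the match. A hypothetical example of such a file (say /etc/grid-security/vomsdir/biomed/voms.example.org.lsc; these DNs are illustrative, not biomed's actual ones):

```
/C=UK/O=ExampleGrid/CN=voms.example.org
/C=UK/O=ExampleGrid/CN=Example Certification Authority
```

If the first DN here lost its leading "/" (or one between components), authorisation would fail in exactly this way.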
https://ggus.eu/ws/ticket_info.php?ticket=89063 (30/11)
The atlas frontier squids at RAL weren't working, fixed (networking
problem) but ticket reopened and placed on hold as the monitoring for
these boxes needs updating. On hold (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88596 (19/11)
t2k.org jobs weren't being delegated to RAL. After some effort this has
been fixed; the ticket can be closed. In progress (1/12)
https://ggus.eu/ws/ticket_info.php?ticket=86690 (3/10)
"JPKEKCRC02 missing from FTS ganglia metrics" for t2k. This has been a
pain to fix, at last word RAL were waiting on their ganglia expert to
come back, but that was a while ago (however I suspect they had bigger
fish to fry in November). In progress (6/11)
https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9)
Correlated packet loss on the RAL perfsonar. On hold pending a wider
scale investigation. On hold (31/10)
UCL
https://ggus.eu/ws/ticket_info.php?ticket=87468 (17/10)
The last Unsupported gLite software ticket (until the next batch). Ben
has put the remaining out-of-date CE into downtime after updating
another. In progress (29/11)
BIRMINGHAM
https://ggus.eu/ws/ticket_info.php?ticket=89129 (3/12)
High atlas production failure rate, likely to be due to the migration to
EMI. It could be a problem with the software area; Mark has involved
Alessandro De Salvo. Waiting for reply (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=86105 (14/9)
Low atlas sonar rates to BNL from Birmingham. atlas tag removed from
ticket to lower noise. On hold (30/11)
IMPERIAL
https://ggus.eu/ws/ticket_info.php?ticket=89105 (1/12)
t2k.org jobs failing on I.C. WMSs due to proxy expiry. Daniela thinks
that it may be a problem with myproxy (the cern myproxy servers are
having dns alias trouble by the looks of it). In progress (3/12)
SHEFFIELD
https://ggus.eu/ws/ticket_info.php?ticket=89096 (30/11)
lhcb jobs submitted to Sheffield through the WMS are seeing "BrokerHelper:
no compatible resources" errors, possibly because the published values
for GlueCEStateFreeCPUs & GlueCEStateFreeJobSlots are 0. In progress
(3/12)
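A quick way to confirm what the broker sees is to pull those attributes out of the site-BDII output. A minimal sketch, with the LDIF inlined rather than fetched over LDAP (the CE hostname is made up; the attribute names are the standard Glue 1.3 ones):

```python
# Minimal sketch: flag CEs publishing zero free CPUs/job slots, which
# makes the WMS BrokerHelper report "no compatible resources".
# The LDIF below is illustrative; in practice it would come from e.g.
# an ldapsearch against the site BDII on port 2170.

SAMPLE_LDIF = """\
dn: GlueCEUniqueID=ce.example.ac.uk:8443/cream-pbs-grid,mds-vo-name=resource,o=grid
GlueCEStateFreeCPUs: 0
GlueCEStateFreeJobSlots: 0
"""

def glue_values(ldif, attr):
    """Return the integer values published for a Glue attribute."""
    return [int(line.split(":", 1)[1]) for line in ldif.splitlines()
            if line.startswith(attr + ":")]

for attr in ("GlueCEStateFreeCPUs", "GlueCEStateFreeJobSlots"):
    for value in glue_values(SAMPLE_LDIF, attr):
        if value == 0:
            print(f"{attr} is 0 - matchmaking will find no free slots")
```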
LANCASTER
https://ggus.eu/ws/ticket_info.php?ticket=89066 (30/11)
biomed nagios tests failing on the Lancaster SE. "problem listing
Storage Path(s)", which suggests to me that we have a publishing
problem. I couldn't find any obvious bugbears though, so I'll keep
digging. In progress (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=89084 (30/11)
The problem in 89066 is also affecting the biomed CE tests. On hold (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88628 (20/11)
Getting t2k working on our clusters. Had some problem with building root
on one cluster, and even just submitting jobs to the other. In progress
(30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88772 (22/11)
One of Lancaster's clusters is reporting default values for
"GlueCEPolicyMaxCPUTime", mucking up lhcb's job scheduling. Tracked to a
problem in the scripts
(https://ggus.eu/ws/ticket_info.php?ticket=88904), the fix will be out
in January, so I've put this on hold until then. On hold (3/12)
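For what it's worth, this class of problem is easy to spot mechanically: a published GlueCEPolicyMaxCPUTime (in minutes) that matches an info-provider placeholder rather than a real queue limit. A minimal sketch; the sentinel values are my assumption about what "default values" means here, not something taken from the ticket:

```python
# Minimal sketch: flag GlueCEPolicyMaxCPUTime values that look like
# info-provider defaults rather than real batch-queue limits.
# The placeholder values below are assumptions, not from the ticket.
SUSPECT_DEFAULTS = {0, 999999999}  # common "unset/unknown" placeholders

def looks_default(max_cpu_time_minutes):
    """True if a published MaxCPUTime is probably a placeholder."""
    return max_cpu_time_minutes in SUSPECT_DEFAULTS

# 2880 minutes (48 hours) is a plausible real queue limit.
for published in (2880, 999999999):
    verdict = "placeholder" if looks_default(published) else "plausible"
    print(f"GlueCEPolicyMaxCPUTime: {published} -> {verdict}")
```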
https://ggus.eu/ws/ticket_info.php?ticket=85367 (20/8)
ilc jobs always fail on a Lancaster CE, possibly due to the CE's poor
performance. For the third time in a row I've had to put this work off
for a month. On hold (3/12)
https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)
t2k transfer failures to Lancaster. Having trouble getting a routing
change put through with the RAL networking team, probably due to them
having a lot on their plate over the past month. In Progress (3/12)
LIVERPOOL
https://ggus.eu/ws/ticket_info.php?ticket=88761 (22/11)
Technically a ticket from Liverpool to lhcb. A complaint over the
bandwidth used by lhcb jobs, probably due to a spike in lhcb jobs
running during an atlas quiet period. Are all sides satisfied about the
cause of this problem and the steps taken to prevent this happening
again? In progress (23/11)
SUSSEX
https://ggus.eu/ws/ticket_info.php?ticket=88631 (20/11)
Looks like Emyr has fixed Sussex's not-publishing-UserDNs APEL problem,
so this ticket can be closed. In Progress (26/11)
QMUL
https://ggus.eu/ws/ticket_info.php?ticket=88822 (23/11)
A similar ticket to 88772 at Lancaster. It could be that the SGE scripts
need updating too. In progress (26/11)
https://ggus.eu/ws/ticket_info.php?ticket=88987 (28/11)
t2k jobs are failing on ce05. In progress (30/11)
https://ggus.eu/ws/ticket_info.php?ticket=88887 (26/11)
lhcb pilots are also failing on ce05. In progress (28/11)
https://ggus.eu/ws/ticket_info.php?ticket=88878 (26/11)
hone are also having troubles on ce05... In progress (26/11)
https://ggus.eu/ws/ticket_info.php?ticket=86306 (22/9)
Redundant, hard-to-kill lhcb pilots at QMUL. Chris opened a ticket to
the cream developers
(https://ggus.eu/tech/ticket_show.php?ticket=87891), but the requests
from lhcb to purge pilot lists still come in. In progress (21/11).
GLASGOW
https://ggus.eu/ws/ticket_info.php?ticket=88376 (8/11)
Biomed authorisation errors on CE svr026. On the 9th Sam asked whether
this was the only CE that has seen the problem; no reply since, so I've
added the biomed e-mail address explicitly to the cc list to try and
coax a response. Waiting for reply (9/11)
ECDF
https://ggus.eu/ws/ticket_info.php?ticket=86334 (24/9)
Low atlas sonar rates to BNL. Apparently things went from bad to worse
on the 23rd/24th of October. Duncan has removed the atlas VO tag on the
ticket to lower the noise on the atlas daily summary. On hold (30/11)
EFDA-JET
https://ggus.eu/ws/ticket_info.php?ticket=88227 (6/11)
biomed complaining about 444444 waiting jobs & no running jobs being
published by jet. The guys there have had a go at fixing the problem
(probably caused by their update to EMI2), but are likely out of ideas.
I had a brainwave regarding user access in maui.cfg, but if that's not
the solution I'm sure they'll appreciate ideas. In progress (3/12).
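For reference, the symptom in the published ldif looks something like the fragment below (the dn is hypothetical; by convention 444444 is the info provider's "value unknown" placeholder, which fits the batch-system query failing rather than there really being jobs queued):

```
dn: GlueCEUniqueID=ce.jet.example.org:8443/cream-pbs-grid,mds-vo-name=resource,o=grid
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 444444
```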
OXFORD
https://ggus.eu/ws/ticket_info.php?ticket=86106 (14/9)
Poor atlas sonar rates from Oxford to BNL. On hold due to running out of
fixes to try, and the fact that they get good rates elsewhere. VO tag
removed to reduce noise. On hold (30/11)
DURHAM
https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)
atlas production failures at Durham. Site still in "quarantine". On hold
(20/11).
https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)
compchem authentication failures. As this ticket has been on hold at a
low priority since January, it would seem worthwhile to contact the
ticket originators to see what they want to do. On hold (8/10)