Apologies, but I may be late to today's meeting - hardware issue to 
resolve (non-GRID though :-) ).

John

On 03/12/2012 15:56, Matt Doidge wrote:
> Hello all,
> Here's this week's ticket review.
> Cheers,
> Matt
>
> 32 Open UK tickets this week. It's the start of the month, so all
> tickets, great or small, will get reviewed.
>
> NGI/VOMS
> https://ggus.eu/ws/ticket_info.php?ticket=88546 (16/11)
> Creation of epic.vo.gridpp.ac.uk. Name has been settled on, deployed on
> the master VOMS instance and rolled out to the backups, ready for
> whatever the next step will be. In progress (30/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=87813 (25/10)
> Migration of the vo.helio-vo.eu to the UK. At last word everything was
> done on the VOMS side, and testing on grid resources still needed to be
> done. In progress (15/11)
>
> TIER 1
> https://ggus.eu/ws/ticket_info.php?ticket=89141 (3/12)
> RAL are seeing a high atlas production job failure rate, and a possibly
> related high FTS failure rate. In Progress (3/12)
>
> https://ggus.eu/ws/ticket_info.php?ticket=89081 (30/11)
> Failed biomed SAM tests, tracked to a missing / in a .lsc file. Should
> be fixed, waiting for confirmation (but don't wait too long). Waiting
> for reply (3/12)
>
> https://ggus.eu/ws/ticket_info.php?ticket=89063 (30/11)
> The atlas frontier squids at RAL weren't working, fixed (networking
> problem) but ticket reopened and placed on hold as the monitoring for
> these boxes needs updating. On hold (30/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=88596 (19/11)
> t2k.org jobs weren't being delegated to RAL. After some effort this has
> been fixed, the ticket can be closed. In progress (1/12)
>
> https://ggus.eu/ws/ticket_info.php?ticket=86690 (3/10)
> "JPKEKCRC02 missing from FTS ganglia metrics" for t2k. This has been a
> pain to fix, at last word RAL were waiting on their ganglia expert to
> come back, but that was a while ago (however I suspect they had bigger
> fish to fry in November). In progress (6/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=86152 (17/9)
> Correlated packet loss on the RAL perfsonar. On hold pending a wider
> scale investigation. On hold (31/10)
>
> UCL
> https://ggus.eu/ws/ticket_info.php?ticket=87468 (17/10)
> The last Unsupported gLite software ticket (until the next batch). Ben
> has put the remaining out of date CE into downtime after updating
> another. In progress (29/11)
>
> BIRMINGHAM
> https://ggus.eu/ws/ticket_info.php?ticket=89129 (3/12)
> High atlas production failure rate, likely to be due to the migration to
> EMI. It could be a problem with the software area, Mark has involved
> Alessandro De Salvo. Waiting for reply (3/12)
>
> https://ggus.eu/ws/ticket_info.php?ticket=86105 (14/9)
> Low atlas sonar rates to BNL from Birmingham. atlas tag removed from
> ticket to lower noise. On hold (30/11)
>
> IMPERIAL
> https://ggus.eu/ws/ticket_info.php?ticket=89105 (1/12)
> t2k.org jobs failing on I.C. WMSs due to proxy expiry. Daniela thinks
> that it may be a problem with myproxy (the cern myproxy servers are
> having dns alias trouble by the looks of it). In progress (3/12)
>
> SHEFFIELD
> https://ggus.eu/ws/ticket_info.php?ticket=89096 (30/11)
> lhcb jobs to Sheffield that go through the WMS are seeing "BrokerHelper:
> no compatible resources" errors, possibly due to the published values
> for GlueCEStateFreeCPUs & GlueCEStateFreeJobSlots being 0. In progress
> (3/12)
>
> LANCASTER
> https://ggus.eu/ws/ticket_info.php?ticket=89066 (30/11)
> biomed nagios tests failing on the Lancaster SE. "problem listing
> Storage Path(s)", which suggests to me that we have a publishing
> problem. Couldn't find any obvious bugbears though, keeping on digging.
> In progress (30/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=89084 (30/11)
> The problem in 89066 is also affecting the biomed CE tests. On hold (30/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=88628 (20/11)
> Getting t2k working on our clusters. Had some problem with building root
> on one cluster, and even just submitting jobs to the other. In progress
> (30/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=88772 (22/11)
> One of Lancaster's clusters is reporting default values for
> "GlueCEPolicyMaxCPUTime", mucking up lhcb's job scheduling. Tracked to a
> problem in the scripts
> (https://ggus.eu/ws/ticket_info.php?ticket=88904), the fix will be out
> in January so I've on-holded this until then. On hold (3/12)
>
> https://ggus.eu/ws/ticket_info.php?ticket=85367 (20/8)
> ilc jobs always fail on a Lancaster CE, possibly due to the CE's poor
> performance. For the third time in a row I've had to put this work off
> for a month. On hold (3/12)
>
> https://ggus.eu/ws/ticket_info.php?ticket=84461 (23/7)
> t2k transfer failures to Lancaster. Having trouble getting a routing
> change put through with the RAL networking team, probably due to them
> having a lot on their plate over the past month. In Progress (3/12)
>
> LIVERPOOL
> https://ggus.eu/ws/ticket_info.php?ticket=88761 (22/11)
> Technically a ticket from Liverpool to lhcb. A complaint over the
> bandwidth used by lhcb jobs, probably due to a spike in lhcb jobs
> running during an atlas quiet period. Are all sides satisfied about the
> cause of this problem and the steps taken to prevent this happening
> again? In progress (23/11)
>
> SUSSEX
> https://ggus.eu/ws/ticket_info.php?ticket=88631 (20/11)
> Looks like Emyr has fixed Sussex's not-publishing-UserDNs APEL problem,
> so this ticket can be closed. In Progress (26/11)
>
> QMUL
> https://ggus.eu/ws/ticket_info.php?ticket=88822 (23/11)
> A similar ticket to 88772 at Lancaster. It could be that the SGE scripts
> need updating too. In progress (26/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=88987 (28/11)
> t2k jobs are failing on ce05. In progress (30/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=88887 (26/11)
> lhcb pilots are also failing on ce05. In progress (28/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=88878 (26/11)
> hone are also having troubles on ce05... In progress (26/11)
>
> https://ggus.eu/ws/ticket_info.php?ticket=86306 (22/9)
> LHCB redundant, hard-to-kill pilots at QMUL. Chris opened a ticket to
> the cream developers
> (https://ggus.eu/tech/ticket_show.php?ticket=87891), but requests to
> purge lists still come in from lhcb. In progress (21/11).
>
> GLASGOW
> https://ggus.eu/ws/ticket_info.php?ticket=88376 (8/11)
> Biomed authorisation errors on CE svr026. On the 9th Sam asked if this was
> the only CE that has seen this problem. No reply since, so I added in the
> biomed e-mail address explicitly to the cc list to try and coax a
> response. Waiting for reply (9/11)
>
> ECDF
> https://ggus.eu/ws/ticket_info.php?ticket=86334 (24/9)
> Low atlas sonar rates to BNL. Apparently things went from bad to worse
> on the 23rd/24th of October. Duncan has removed the atlas VO tag on the
> ticket to lower the noise on the atlas daily summary. On hold (30/11)
>
> EFDA-JET
> https://ggus.eu/ws/ticket_info.php?ticket=88227 (6/11)
> biomed complaining about 444444 waiting jobs & no running jobs being
> published by jet. The guys there have had a go at fixing the problem
> (probably caused by their update to EMI2), but are likely out of ideas.
> I had a brain wave regarding user access in maui.cfg but if that's not
> the solution I'm sure they'll appreciate ideas. In progress (3/12).
>
> OXFORD
> https://ggus.eu/ws/ticket_info.php?ticket=86106 (14/9)
> Poor atlas sonar rates from Oxford to BNL. On hold due to running out of
> fixes to try, and the fact that they get good rates elsewhere. VO tag
> removed to reduce noise. On hold (30/11)
>
> DURHAM
> https://ggus.eu/ws/ticket_info.php?ticket=84123 (11/7)
> atlas production failures at Durham. Site still in "quarantine". On hold
> (20/11).
>
> https://ggus.eu/ws/ticket_info.php?ticket=75488 (19/10/11)
> compchem authentication failures. As this ticket has been on hold at a
> low priority since January, it would seem worthwhile to contact the
> ticket originators to see what they want to do. On hold (8/10)