On 20/10/2014 16:31, Matt Doidge wrote:
> Hello,
> I'm the only one in my office today, and as we approach the afternoon
> and the longest conversation I've had today has been with my cats I
> should jump straight to the tickets before I say something weird...
>
> Non-LHC VO Nagios Failures:
> https://vo-nagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15
>
>
> Liverpool, Lancaster (we're getting better), Sheffield, EFDA-JET, the
> Tier 1, Bristol (in downtime) and Cambridge are on "the list". Most are
> transient, load-based errors. gridpp, pheno and southgrid seem to be the
> VOs having most problems.
The Cambridge ones at least appear to be because job submission is being
disabled from time to time at the CE, presumably by ATLAS (I'm certainly
not doing it!). If it's not that, suggestions welcome.
John
>
> We're up to 30 Open UK Tickets this week. Here are the highlights:
>
> TIER 1
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=109276 (11/10)
> Submissions to the FTS3 REST interface were failing for some, probably
> after the certs or CRLs got stale. Andrew L suggested implementing an
> httpd restart, which Maarten suggested was overkill - but anyhoo the
> submitter has come back to say that he hasn't seen a problem all week,
> so this ticket can likely be closed. In progress (20/10)
>
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=108845 (27/9)
> Just a heads up that this ATLAS transfer failure ticket has been
> reopened. Reopened (18/10)
>
> RALPP
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=109360 (15/10)
> This SNO+ ticket, about failing nagios tests at RALPP, hasn't been
> noticed yet. Assigned (15/10)
>
> SHEFFIELD
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=109207 (8/10)
> SNO+ would like the VO_SW_DIR environment variable to point to cvmfs -
> I know Elena has looked at this, any progress? In progress (9/10)
>
> Similar with another SNO+ ticket at Sheffield:
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=109223 (9/10)
>
> BRUNEL
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=109379 (16/10)
> SRM Nagios test failures. It looks like Brunel's SE is in a dodgy state
> - too many ftp connection failures have been seen in the gridftp logs,
> httpd causing heavy load, possible SELinux problems after DB move. I'm
> sure if anyone has any input on this it would be appreciated. In
> progress (17/10)
>
> IMPERIAL/DIRAC
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=108723 (23/9)
> I think this ticket from Chris W, containing questions for the DIRAC
> team, can be closed in favour of the new line of communication Daniela
> set up (https://mailman.ic.ac.uk/mailman/listinfo/gridpp-dirac-users).
> Waiting for reply (7/10)
>
> ECDF AND GLASGOW
> Two very similar LHCb cvmfs tickets at these sites, any chance of a
> link? Or perhaps just a coincidence?
> ECDF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=109440
> GLASGOW: https://ggus.eu/index.php?mode=ticket_info&ticket_id=109439
>
> I think that's all, at least as far as I can see.
>
> Cheers!
> Matt