We don't seem to have any APEL tests listed here: https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=site-UKI-SOUTHGRID-SUSX&style=detail

We have definitely had a listing for APEL on our page in the past.


On 22/11/12 10:30, John Gordon wrote:

Good. Well done.

 

From: Testbed Support for GridPP member institutes [mailto:[log in to unmask]] On Behalf Of Kashif Mohammad
Sent: 21 November 2012 16:56
To: [log in to unmask]
Subject: Re: Possible problem with Nagios

 

Hi John

 

I was keeping an eye on that, and it didn’t update during the GOCDB problem. Fortunately we were using the Imperial top BDII in the Nagios configuration, so the Tier 1 outage didn’t affect much as far as monitoring is concerned.

The only problem I have seen is that, apart from the RAL WMSs, the Glasgow WMSs were also in an error state, probably because they may be using the Tier 1 top BDII. So a lot of Nagios jobs stayed in a waiting state until I changed the configuration to use just the Imperial WMS.

 

Cheers

Kashif

 

From: Testbed Support for GridPP member institutes [mailto:[log in to unmask]] On Behalf Of John Gordon
Sent: 21 November 2012 14:59
To: [log in to unmask]
Subject: Possible problem with Nagios

 

When GOCDB was operating from its failover site this morning, it looks like clients downloading information only got partial data.  This resulted in many services not being monitored.  GOCDB at RAL is now back online and the full set of services is available for download.  SAM has updated and everything is fine there.

 

I have looked at the UK MyEGI and, while I see N/A bands recently, I don't see any unexpected red that would affect availability. We may have been lucky and the UK Nagios did not update in the bad window. If it did, maybe Kashif could force an update instead of waiting six hours.

 

John

 

 

--
Scanned by iCritical.

 

