On 29/04/14 11:29, Kashif Mohammad wrote:
> Hi
>
> I have now reconfigured SAM Nagios to use the Imperial top BDII, so most of the tests should be OK in 1-2 hours. I should have done it beforehand, but somehow I missed the notification when it was planned.
>
> There are three parts to the SAM tests, and the top BDII going down affects them to varying degrees. Jobs submitted through the WMS generally cope well because the WMS uses BDII_LIST and falls back to another BDII if the first one fails. But, as Ewan pointed out, that only works if the first BDII is completely unreachable.
Ewan, can you or Kashif file a ticket about this against RAL/BDII?
It is unsatisfactory that the RAL BDIIs return the wrong answer when
they are starting up.
It isn't clear to me whether this is due to the BDII software or to the way
RAL operates it.
Chris
> The SE tests do not use BDII_LIST; the BDII is a hard-coded option in SAM-NAGIOS, so I have to change it manually.
>
> The lcg-cr test, which is part of CE testing, gets its BDII information from the "LCG_GFAL_INFOSYS" variable set on the WNs, which is a site setting.
>
> So for some sites, lcg-cr tests may keep failing if they haven't set BDII_LIST on the WNs.
>
> Thanks
> Kashif
>
>
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Ian Collier
>> Sent: 29 April 2014 09:22
>> To: [log in to unmask]
>> Subject: Re: nagios all red/yellow (all UK), Upstream/Tier-1 BDII problem?
>>
>> To flesh out a little more - there is a major network intervention.
>>
>> Assuming all goes according to plan, the restart of services will begin at midday.
>>
>> (I am at CERN so have no direct knowledge of how it is progressing.)
>>
>> Cheers,
>>
>> -Ian
>>
>> On 29 Apr 2014, at 10:11, Christopher Walker <[log in to unmask]>
>> wrote:
>>
>>> On 29/04/14 10:10, Winnie Lacesso wrote:
>>>> Good morning,
>>>>
>>>> Nagios monitoring shows all UK sites appear yellow/orange this morning
>>>>
>>>> Main error seems to be
>>>> UNKNOWN: Failed to get working BDII from [lcgbdii.gridpp.rl.ac.uk:2170].
>>>>
>>>> Am unable to get to Tier1 Status page
>>>> http://www.gridpp.rl.ac.uk/status/
>>>>
>>>> Is that the right URL?
>>>> Is there an Upstream problem affecting all UK sites?
>>>>
>>>>
>>> RAL had network downtime today, didn't they?
>>>
>>> Chris
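
The BDII_LIST fallback Kashif describes, and the limitation Ewan pointed out, can be illustrated with a rough Python sketch. This is not the WMS or SAM code; it only assumes that the list is a comma-separated string of host:port entries and that "fails" means the TCP connection cannot be opened. The function names are made up for illustration, and topbdii.example.org is a placeholder rather than a real endpoint.

```python
import socket


def parse_bdii_list(value):
    """Split a 'host:port,host:port' string into (host, port) tuples.

    Assumption: every entry carries an explicit port.
    """
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_bdii(value, probe=tcp_reachable):
    """Return the first endpoint in the list that passes the probe, else None."""
    for host, port in parse_bdii_list(value):
        if probe(host, port):
            return host, port
    return None


if __name__ == "__main__":
    # The second entry is a hypothetical placeholder host.
    bdii_list = "lcgbdii.gridpp.rl.ac.uk:2170,topbdii.example.org:2170"
    print(pick_bdii(bdii_list))
```

With a probe like this, a BDII that accepts connections while it is still starting up passes the check, so the fallback never moves on to the second entry even if the answers it returns are wrong. A stricter probe would need to run a real query and check the result before accepting an endpoint.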