On 05/09/2012 10:34, Chris Brew wrote:
>
> And if we're going to switch between Nagios servers reasonably frequently,
> could we have a single DNS name (nagios.gridpp.ac.uk, say) that points to the
> live server? I've been happily checking the Oxford server and been seeing no
> issues all month.
Is there a way to programmatically determine which is the live Nagios?
How does the COP Dashboard know which one to link to etc?
We could probably automate it using that information, either with
changes to DNS or with a single-page website that just forwards you to
the current site, which people could bookmark. Otherwise it's easy enough
to add another alias in the gridpp.ac.uk DNS, but we'd need to be sure
we're notified whenever the live server changes.
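
For the forwarding-page option, something like the following might do as a
starting point. It is only a rough sketch in Python: the two hostnames are the
ones from the URLs quoted below, but pick_live(), make_handler() and the
is_live check are made-up names, and how "live" is actually decided is
exactly the open question above, so it is left as a function you pass in.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Iterable, Optional

# Hostnames taken from the URLs quoted in this thread.
CANDIDATES = [
    "gridppnagios.physics.ox.ac.uk",
    "gridppnagios.lancs.ac.uk",
]


def pick_live(candidates: Iterable[str],
              is_live: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that is_live() accepts, or None.

    is_live is a placeholder (assumption): it could wrap whatever
    information the dashboard uses, once we know what that is.
    """
    for host in candidates:
        if is_live(host):
            return host
    return None


def make_handler(is_live: Callable[[str], bool]):
    """Build a request handler that forwards every GET to the live server."""

    class Redirect(BaseHTTPRequestHandler):
        def do_GET(self):
            host = pick_live(CANDIDATES, is_live)
            if host is None:
                self.send_error(503, "No live Nagios server known")
                return
            # 302 (temporary) rather than 301, since the target can change.
            self.send_response(302)
            self.send_header("Location", f"https://{host}{self.path}")
            self.end_headers()

    return Redirect
```

To try it, pass make_handler(some_check) to http.server.HTTPServer and call
serve_forever(). Keeping the path in the Location header means bookmarked
deep links would still land on the matching page of whichever server is live.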
Cheers
Andrew
> Yours,
> Chris.
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Chris Brew
>> Sent: 05 September 2012 10:05
>> To: [log in to unmask]
>> Subject: Re: T2 Availability & Reliability - August 2012
>>
>> Hi Jeremy,
>>
>> That looks wrong for RALPP to me. The Oxford nagios has us pretty much green
>> for the whole month (apart from a batch server wobble on the first and a
>> couple of single test failures later).
>>
>> Ah, but didn't we switch over to the Lancaster nagios for a while? Yes, that
>> looks different [2]; seems like the SRM test is the problem.
>>
>> And digging down, it looks like it's the GetTURLs test that's failing, and
>> the Lancaster nagios has the same bug that caused problems on the Oxford
>> nagios (in May?), with it sending illegal data in the request [3].
>>
>> I'll submit another ticket.
>>
>> Kashif, can you remember what the fix on the nagios server was and get it
>> applied to the Lancaster one?
>>
>> Yours,
>> Chris.
>>
>> [1]
>> https://gridppnagios.physics.ox.ac.uk/myegi/history/?facelist_values_regions=&facelist_values_Sites=386%2C&facelist_values_services=&vo=37&profile=5&monitored=2&status=1&status=2&status=3&status=4&status=5&iDisplayLength=10&startdate=01-08-2012&enddate=31-08-2012
>> ( http://goo.gl/IHrPt )
>>
>> [2]
>> https://gridppnagios.lancs.ac.uk/myegi/history/?facelist_values_regions=&facelist_values_Sites=94%2C&facelist_values_services=&vo=222&profile=16&monitored=2&status=1&status=2&status=3&status=4&status=5&iDisplayLength=10&startdate=01-08-2012&enddate=31-08-2012
>> ( http://goo.gl/4nCs1 )
>>
>> [3]
>> https://gridppnagios.lancs.ac.uk/myegi/status/222/2109/44/1346324377/
>> (http://goo.gl/WOZOw )
>>
>>> -----Original Message-----
>>> From: Testbed Support for GridPP member institutes [mailto:TB-
>>> [log in to unmask]] On Behalf Of Jeremy Coles
>>> Sent: 04 September 2012 16:35
>>> To: [log in to unmask]
>>> Subject: Fwd: T2 Availability & Reliability - August 2012
>>>
>>> Dear All
>>>
>>> A few more sites requiring follow-up for the August
>>> availability/reliability but overall the Tier-2s are fine. Please take
>>> a look and let me know of any concerns, we'll review these next
>>> Tuesday.
>>>
>>> Many thanks,
>>> Jeremy
>>>
>>>
>>>
>>> Begin forwarded message:
>>>
>>>
>>> From: WLCG Office <[log in to unmask]>
>>>
>>> Subject: T2 Availability & Reliability - August 2012
>>>
>>> Date: 4 September 2012 16:20:46 GMT+01:00
>>>
>>> To: "project-wlcg-cb (Members of the WLCG CB)" <project-wlcg-
>>> [log in to unmask]>
>>>
>>> Cc: "project-lcg-gdb (LCG - Grid Deployment Board)" <project-lcg-
>>> [log in to unmask]>, "[log in to unmask]" <[log in to unmask]>,
>>> "[log in to unmask]" <[log in to unmask]>, "sam-support (SAM
>>> support)" <[log in to unmask]>
>>>
>>>
>>>
>>> Dear all,
>>>
>>> Please find below the draft T2 Reliability & Availability report
>>> for August 2012:
>>>
>>> http://sam-reports.web.cern.ch/sam-reports/2012/201208/wlcg/WLCG_Tier2_Aug2012.pdf
>>>
>>> Please verify your data and send comments to [log in to unmask]
>>> by Friday 14 September.
>>>
>>> Requests for re-computations are to be entered via GGUS within 10
>>> calendar days of this e-mail being sent. Full details are here:
>>> https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy
>>>
>>> The final T2 reports are stored in the WLCG document repository
>>> under http://cern.ch/wlcg-docs/ReliabilityAvailability/Tier-2 and
>>> reported to the Overview Board.
>>>
>>> Kind regards,
>>> Cath
>>>
>>>
>>>
>>> -----------------------------------------------
>>> WLCG Office
>>> IT Dept - CERN
>>> CH-1211 Genève, Switzerland
>>> www.cern.ch/wlcg
>>>
>>>
>
--
Cheers,
Andrew
--------------------------------------------------------------
Dr Andrew McNab, High Energy Physics, University of Manchester
www.hep.manchester.ac.uk/u/mcnab Skype/GTalk: andrew.mcnab.uk