In particular, one of the reasons nagios was chosen is that it is 
possible to import test results from the central nagios boxes into the 
local one and tailor the alarms according to local taste.

I'd like to know how many sites are doing this. I remember RalPP being 
one of the biggest supporters of this.


On 19/09/2013 12:36, Alessandra Forti wrote:
> Hi Daniela,
>
> thanks for the feedback. Does anybody else have an opinion on this?
>
> cheers
> alessandra
>
> On 17/09/2013 14:17, Daniela Bauer wrote:
>> I can't speak to the UK, but ...
>>
>> When it comes to monitoring, all I want is:
>> a) something that emails me automatically when something goes wrong
>> and
>> b) that has a link for further information in it.
>>
>> Basically nagios.
>>
>> Don't make me check a webpage, it never ever works and I am speaking 
>> from dire experience here.
>> And don't include a generic link either where I then have to guess 
>> which of the n settings I have to check/change to figure out where 
>> the error comes from.
>>
>> CMS is as guilty of that as Atlas.
>>
>> Try running tests on a site that is not a member of the experiment 
>> (i.e. a T3) and see if this site can understand the error and you'll 
>> do just fine.
>>
>> Bonus points for a site being able to initiate a test (to check 
>> something has been fixed), but that's really a bonus.
>>
>> Cheers,
>> Daniela
>>
>>
>>
>>
>>
>> On 17 September 2013 14:01, Alessandra Forti 
>> <[log in to unmask] <mailto:[log in to unmask]>> wrote:
>>
>>     I sent this to Jeremy thinking he would put it on the agenda but he
>>     told me he wasn't there either.
>>
>>
>>     -------- Original Message --------
>>     Subject: 	Re: Ops meeting @ 11am
>>     Date: 	Tue, 17 Sep 2013 10:01:05 +0100
>>     From: 	Alessandra Forti <[log in to unmask]>
>>     <mailto:[log in to unmask]>
>>     CC: 	Jeremy Coles <[log in to unmask]>
>>     <mailto:[log in to unmask]>
>>
>>
>>
>>     Hi Jeremy,
>>
>>     as the engineer is here to repair the central switch this morning, I
>>     don't know if I can make it to the meeting or be there reliably.
>>
>>     SL6:
>>
>>     * Bristol postponed
>>     * Glasgow and Lancaster are now in test with atlas queues
>>     * Manchester has brought forward the upgrade 2 weeks and we have
>>     declared a week downtime from the 30th of September until the 7th of
>>     October.
>>     * Birmingham is done.
>>
>>     * There are problems with the java voms-proxy-info again affecting atlas
>>     jobs on sites that limit the memory to 3GB (few UK sites are doing
>>     that). Atlas is thinking of replacing voms-proxy-info with arcproxy. I'm
>>     giving a talk at the ADC meeting later today to decide what to do.
>>
>>     https://ggus.eu/ws/ticket_info.php?ticket=97230
>>
>>     Monitoring:
>>
>>     I started a discussion about nagios on the sites monitoring
>>     consolidation list. Only Jeff Templon replied. We need a UK point of
>>     view. If sites show no interest I don't blame the monitoring people for
>>     going their way. If we don't speak they are right to take these decisions
>>     almost without consultation.
>>
>>     cheers
>>     alessandra
>>
>>
>>
>>
>>
>>     On 17/09/2013 09:38, Jeremy Coles wrote:
>>     > Dear All
>>     >
>>     > The agenda for today's ops meeting is available at http://indico.cern.ch/conferenceDisplay.py?confId=273350. The plan is to review the GDB updates from last week and check again on the SL6 status (especially to bring out any issues or concerns).
>>     >
>>     > Pete has kindly agreed to chair this week - though if Pete is unable to connect from RAL, please could someone else from the core ops team take control. As Matt mentioned in the tickets email, there will not be an ops meeting next week due to GridPP31 (https://www.gridpp.ac.uk/gridpp31/).
>>     >
>>     > For minutes the list is Mark=6 Wahid=8 Daniela=7 Kashif=7 Matt=7 Chris=7 Alessandra=7 Pete=7 Rob=7 Ewan=7 Brian=7.
>>     >
>>     > regards,
>>     > Jeremy
>>
>>
>>     -- 
>>     Facts aren't facts if they come from the wrong people. (Paul Krugman)
>>
>>
>>
>>
>>
>>
>> -- 
>> Sent from the pit of despair
>>
>> -----------------------------------------------------------
>> [log in to unmask] <mailto:[log in to unmask]>
>> HEP Group/Physics Dep
>> Imperial College
>> London, SW7 2BW
>> Tel: +44-(0)20-75947810
>> http://www.hep.ph.ic.ac.uk/~dbauer/ 
>> <http://www.hep.ph.ic.ac.uk/%7Edbauer/>
>
>
> -- 
> Facts aren't facts if they come from the wrong people. (Paul Krugman)


-- 
Facts aren't facts if they come from the wrong people. (Paul Krugman)