
I can't speak to the UK, but ...

When it comes to monitoring, all I want is something that:
a) emails me automatically when something goes wrong,
and
b) includes a link to further information.

Basically nagios.

Don't make me check a webpage; it never, ever works, and I am speaking from
dire experience here.
And don't include a generic link either, where I then have to guess which of
the n settings I have to check/change to figure out where the error comes
from.

CMS is as guilty of that as Atlas.

Try running the tests on a site that is not a member of the experiment (i.e. a
T3): if that site can understand the error, you'll do just fine.

Bonus points for a site being able to initiate a test (to check that something
has been fixed), but that's really a bonus.

Cheers,
Daniela


On 17 September 2013 14:01, Alessandra Forti <[log in to unmask]> wrote:

> I sent this to Jeremy thinking he would put it on the agenda, but he told me
> he wasn't there either.
>
>
> -------- Original Message --------
> Subject: Re: Ops meeting @ 11am
> Date: Tue, 17 Sep 2013 10:01:05 +0100
> From: Alessandra Forti <[log in to unmask]>
> CC: Jeremy Coles <[log in to unmask]>
>
> Hi Jeremy,
>
> as the engineer is here to repair the central switch this morning, I
> don't know if I can make it to the meeting or be there reliably.
>
> SL6:
>
> * Bristol postponed
> * Glasgow and Lancaster are now in test with atlas queues
> * Manchester has brought the upgrade forward by 2 weeks and we have
> declared a week of downtime from the 30th of September until the 7th of
> October.
> * Birmingham is done.
>
> * There are again problems with the java voms-proxy-info, affecting atlas
> jobs on sites that limit memory to 3GB (few UK sites are doing
> that). Atlas is thinking of replacing voms-proxy-info with arcproxy. I'm
> giving a talk at the ADC meeting later today to decide what to do.
> https://ggus.eu/ws/ticket_info.php?ticket=97230
>
> Monitoring:
>
> I started a discussion about nagios on the sites monitoring
> consolidation list. Only Jeff Templon replied. We need a UK point of
> view. If sites show no interest, I don't blame the monitoring people for
> going their own way. If we don't speak up, they are right to take these
> decisions almost without consultation.
>
> cheers
> alessandra
>
>
> On 17/09/2013 09:38, Jeremy Coles wrote:
> > Dear All
> >
> > The agenda for today's ops meeting is available at http://indico.cern.ch/conferenceDisplay.py?confId=273350. The plan is to review the GDB updates from last week and check again on the SL6 status (especially to bring out any issues or concerns).
> >
> > Pete has kindly agreed to chair this week - though if Pete is unable to connect from RAL, please could someone else from the core ops team take control. As Matt mentioned in the tickets email, there will not be an ops meeting next week due to GridPP31 (https://www.gridpp.ac.uk/gridpp31/).
> >
> > For minutes the list is Mark=6 Wahid=8 Daniela=7 Kashif=7 Matt=7 Chris=7 Alessandra=7 Pete=7 Rob=7 Ewan=7 Brian=7.
> >
> > regards,
> > Jeremy
>
>
> --
> Facts aren't facts if they come from the wrong people. (Paul Krugman)
>


-- 
Sent from the pit of despair

-----------------------------------------------------------
[log in to unmask]
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/