It seems that one of the message brokers was temporarily unavailable. I haven't seen any unscheduled downtime for the broker, but a network issue was reported for the site hosting one of the brokers. I can see that most of the CEs have recovered.
Cheers
Kashif
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Winnie Lacesso
> Sent: 16 February 2015 13:44
> To: [log in to unmask]
> Subject: Again: "Upstream" problem for UK CREAM-CE??
>
> For many UK CREAM-CEs, the opssgm Nagios tests have been CRITICAL for the
> last hour or so, with the error "CRITICAL: Job was aborted".
>
> On Bristol CREAM-CEs the errors are identical
>
> 1358726.lcgce04. opssgm short cream_425280597 7717 1 1 -- 00:20 R 00:00 sm23
> qstat -f 1358726 | egrep 'cput|wallt'
> resources_used.cput = 00:00:00
> resources_used.walltime = 00:16:48
>
> The jobs are doing 0 cpu & hitting the 30min walltime.
>
> Exactly as before,
> > root@bse09> tail gridjob.out
> > === WN: bse09.phy.bris.ac.uk
> > === WN arch: x86_64
> > Check Python version:
> > /usr/bin/python
> > Python 2.6.6
> > Can we import Python LDAP ...
> > YES.
> > Launching MTA.
> > /home/opssgm/home_cream_364446563/CREAM364446563/nagios/bin/mta-simple
> > --dirq /tmp/sam.15538.7233/msg-outgoing --destination
> > /queue/grid.probe.metricOutput.EGEE.gridppnagios_physics_ox_ac_uk
> > --broker-network PROD --pidfiledir
> > /home/opssgm/home_cream_364446563/CREAM364446563/nagios/var/ -v info
> > --bdii-uri
> > lcgbdii.gridpp.rl.ac.uk:2170,topbdii.grid.hep.ph.ic.ac.uk:2170,top-bdii.tier2.hep.manchester.ac.uk:2170
> > No handlers could be found for logger "stomp.py"
>
> Looks the same for QMUL, Ox, Sc Ox, Lancs, Imperial...?
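
For anyone wanting to spot the same symptom on their own CE (a job that accumulates walltime while using no CPU), here is a rough Python sketch that reads the two resources_used lines from "qstat -f" output like the one quoted above. The function names and the 600-second threshold are assumptions made for illustration only; they do not come from Torque or from the Nagios probes.

```python
import re

def hms_to_seconds(value):
    """Convert an HH:MM:SS string, as printed by qstat -f, to seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds

def parse_resources_used(qstat_text):
    """Return the cput and walltime fields (in seconds) found in the text."""
    pattern = r"resources_used\.(cput|walltime)\s*=\s*(\d+:\d{2}:\d{2})"
    used = {}
    for match in re.finditer(pattern, qstat_text):
        used[match.group(1)] = hms_to_seconds(match.group(2))
    return used

def looks_stalled(qstat_text, min_walltime=600):
    """True if the job has used no CPU after min_walltime seconds of walltime.

    min_walltime is an arbitrary illustrative threshold, not a site setting.
    """
    used = parse_resources_used(qstat_text)
    return used.get("cput") == 0 and used.get("walltime", 0) >= min_walltime

# The two lines quoted in the message above.
sample = """\
    resources_used.cput = 00:00:00
    resources_used.walltime = 00:16:48
"""

print(parse_resources_used(sample))
print(looks_stalled(sample))
```

In practice the text would come from running "qstat -f <jobid>" on the CE rather than from a hard-coded sample, and the threshold would need tuning to the queue's walltime limit.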