For many UK CREAM-CEs, the opssgm Nagios tests have been CRITICAL for the last hour or so,
with the error "CRITICAL: Job was aborted".
On the Bristol CREAM-CEs the errors are identical:
1358726.lcgce04. opssgm short cream_425280597 7717 1 1 -- 00:20 R 00:00 sm23
qstat -f 1358726 | egrep 'cput|wallt'
resources_used.cput = 00:00:00
resources_used.walltime = 00:16:48
The jobs are using 0 CPU time and hitting the 30 min walltime.
Exactly as before:
> root@bse09> tail gridjob.out
> === WN: bse09.phy.bris.ac.uk
> === WN arch: x86_64
> Check Python version:
> /usr/bin/python
> Python 2.6.6
> Can we import Python LDAP ...
> YES.
> Launching MTA.
> /home/opssgm/home_cream_364446563/CREAM364446563/nagios/bin/mta-simple --dirq /tmp/sam.15538.7233/msg-outgoing --destination /queue/grid.probe.metricOutput.EGEE.gridppnagios_physics_ox_ac_uk --broker-network PROD --pidfiledir /home/opssgm/home_cream_364446563/CREAM364446563/nagios/var/ -v info --bdii-uri lcgbdii.gridpp.rl.ac.uk:2170,topbdii.grid.hep.ph.ic.ac.uk:2170,top-bdii.tier2.hep.manchester.ac.uk:2170
> No handlers could be found for logger "stomp.py"
Does it look the same for QMUL, Ox, Sc Ox, Lancs, Imperial...?
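
In case it helps anyone checking their own CEs, the cput/walltime check above can be scripted. This is only a rough sketch: flag_idle_job is a made-up helper name, and it assumes the `qstat -f` output has the same "resources_used.cput = HH:MM:SS" and "resources_used.walltime = HH:MM:SS" lines shown above.

```shell
# Hypothetical helper (not part of Torque or the SAM/Nagios probes):
# reads `qstat -f` output on stdin and reports whether the job has
# accumulated wallclock time without using any CPU time.
flag_idle_job() {
    awk '
        # Convert an HH:MM:SS string to seconds.
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }

        # Lines look like: "resources_used.cput = 00:00:00", so the value is field 3.
        /resources_used\.cput/     { cput = secs($3) }
        /resources_used\.walltime/ { wall = secs($3) }

        END {
            if (wall > 0 && cput == 0)
                printf "IDLE: cput=0s walltime=%ds\n", wall
            else
                printf "OK: cput=%ds walltime=%ds\n", cput, wall
        }
    '
}
```

Usage would be something like `qstat -f 1358726 | flag_idle_job`; for the job above (cput 00:00:00, walltime 00:16:48) it should report the job as IDLE with a walltime of 1008s.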