Happy Friday!
As before, for many UK CREAM-CEs the opssgm Nagios tests have been CRITICAL for the
last hour or so, with the error "CRITICAL: Job was aborted".
On the Bristol CREAM-CEs (actually on their WNs) the errors are identical:
qstat -f 1914908 | egrep 'cput|wallt'
resources_used.cput = 00:00:00
resources_used.walltime = 00:14:22
The jobs are using zero CPU and just sit there until they hit the 30-minute walltime limit.
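To spot other jobs stuck the same way without checking them one at a time, a small filter over qstat -f output can flag every job whose recorded CPU time is still zero. This is a rough sketch, assuming Torque/PBS-style qstat -f output (a "Job Id:" line followed by indented resources_used.* attributes). The function name zero_cpu_jobs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -f` output on stdin and print the ID and
# walltime of every job whose resources_used.cput is still 00:00:00.
zero_cpu_jobs() {
  awk '
    # Emit the previous job record if it has used no CPU at all.
    function report() {
      if (id != "" && cput == "00:00:00") print id, "walltime=" wall
    }
    # A new job record starts: flush the old one and reset the fields.
    /^Job Id:/ { report(); id = $3; cput = ""; wall = "" }
    # Attribute lines look like "    resources_used.cput = 00:00:00".
    $1 == "resources_used.cput"     { cput = $3 }
    $1 == "resources_used.walltime" { wall = $3 }
    # Flush the last job in the listing.
    END { report() }
  '
}
```

Usage: qstat -f | zero_cpu_jobs. A freshly started job can legitimately show 00:00:00 for a few seconds, so the walltime is printed alongside to tell those apart from jobs that have sat idle for many minutes.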
root@sm02> cat gridjob.out
Launched with parameters: -v ops -f /ops/Role=lcgadmin -t 600 -w 1 -d /queue/grid.probe.metricOutput.EGEE.gridppnagios_physics_ox_ac_uk -n PROD
=== [Fri Jan 22 15:30:14 GMT 2016] ===
=== Running on ===
=== Site: UKI-SOUTHGRID-BRIS-HEP
=== CE: lcgce04.phy.bris.ac.uk:8443/cream-pbs-express
=== WN: sm02.hadoop.cluster
=== WN arch: x86_64.
=== [Fri Jan 22 15:30:14 GMT 2016] ===
=== Running on ===
=== Site: UKI-SOUTHGRID-BRIS-HEP
=== CE: lcgce04.phy.bris.ac.uk:8443/cream-pbs-express
=== WN: sm02.hadoop.cluster
=== WN arch: x86_64
Check Python version:
/usr/bin/python
Python 2.6.6
Can we import Python LDAP ...
YES.
Launching MTA.
/home/opssgm/home_cream_831544264/CREAM831544264/nagios/bin/mta-simple --dirq /tmp/sam.9882.12776/msg-outgoing --destination /queue/grid.probe.metricOutput.EGEE.gridppnagios_physics_ox_ac_uk --broker-network PROD --pidfiledir /home/opssgm/home_cream_831544264/CREAM831544264/nagios/var/ -v info --bdii-uri lcgbdii.gridpp.rl.ac.uk:2170
No handlers could be found for logger "stomp.py"
Could someone go stomp on the snoozing handlers, please, & wake them up..?