Dear *,
Is there some "upstream" problem for ops/UKI?
Bristol's 2 CREAM-CEs have been red for 17 hrs now, failing 2 tests.
Looking at https://mon.egi.eu/myegi/ for our & other UKI sites, the
CREAM-CEs are all similar: red for the last 7 to 17 hrs.
The 2 errors seem to be the same for each site checked in nagios, except
most sites are only 7-13 hrs red, not 17 like Bristol's (did I make any
change on a Friday afternoon?!?! Noooo!)
emi.cream.CREAMCE-JobSubmit-/ops/Role=lcgadmin
CRITICAL 07-12-2014 12:39:10 0d 7h 1m 5s 2/2 CRITICAL: [3W/2] [Running->Cancelled [timeout/dropped]]
emi.cream.glexec.CREAMCE-JobSubmit-/ops/Role=pilot
CRITICAL 07-12-2014 10:32:03 0d 9h 8m 46s 2/2 CRITICAL: [3W/2] [Running->Cancelled [timeout/dropped]]
It looks like opssgm jobs hit the 30-min short/express walltime queue
timeout & fail, causing a backlog of short/express jobs (big queue of
them).
Tracing a job on a WN,
/home/opssgm/home_cream_134122892/CREAM134122892/gridjob.out ends at:
Python 2.6.6
Can we import Python LDAP ...
YES.
Launching MTA.
/home/opssgm/home_cream_134122892/CREAM134122892/nagios/bin/mta-simple --dirq /tmp/sam.16938.24688/msg-outgoing --destination /queue/grid.probe.metricOutput.EGEE.gridppnagios_lancs_ac_uk --broker-network PROD --pidfiledir /home/opssgm/home_cream_134122892/CREAM134122892/nagios/var/ -v info --bdii-uri lcgbdii.gridpp.rl.ac.uk:2170,topbdii.grid.hep.ph.ic.ac.uk:2170,top-bdii.tier2.hep.manchester.ac.uk:2170
No handlers could be found for logger "stomp.py"
Does anyone know what the line
No handlers could be found for logger "stomp.py"
means?
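My guess is that it is only Python 2's logging module noting that nothing configured a handler for the logger named "stomp.py", so whatever stomp.py wanted to report is being dropped. A minimal sketch, assuming that guess is right, of how one might attach a handler to see the hidden messages (attach_handler is a made-up helper name, not part of the nagios probes or of stomp.py):

```python
import logging
import sys


def attach_handler(logger_name, stream=None, level=logging.DEBUG):
    """Attach a stream handler so records sent to logger_name get printed."""
    logger = logging.getLogger(logger_name)
    # Default to stderr when no explicit stream is given.
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


# "stomp.py" is the logger name taken from the message in gridjob.out.
# Hypothetical usage, before the code that talks to the broker runs:
#   attach_handler("stomp.py")
```

If that is what is going on, the message itself is harmless and the real question is what stomp.py was trying to log.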
Process tree:
root@sm23> pstree -lp 16803
bash(16803)---1125180.lcgce04(16818)---CREAM134122892_(16823)---perl(16934)-+-perl(16936)
                                                                            `-sh(16935)---nagrun.sh(16938)---python(16961)
root@sm23> strace -p 16961
Process 16961 attached - interrupt to quit
connect(4, {sa_family=AF_INET, sin_port=htons(6163), sin_addr=inet_addr("195.251.55.91")}, 16^C <unfinished ...>
root@bse11> nslookup 195.251.55.91
91.0/25.55.251.195.in-addr.arpa name = mq.afroditi.hellasgrid.gr.
It's the same on both clusters (Xeon & AMD).
Is there an upstream problem with hellasgrid.gr (for all of UK)?
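For anyone who wants to check from their own WN whether the broker is reachable, here is a rough sketch of a TCP connect test with a timeout. The host and port are the ones from the strace and nslookup output above; can_connect is just an illustrative helper name and the 10-second default is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to (host, port) completes within timeout seconds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Covers refused connections, DNS failures and timeouts.
        return False
    sock.close()
    return True


# Hypothetical usage on a worker node:
#   can_connect("mq.afroditi.hellasgrid.gr", 6163)
```

A False result only says the TCP connect did not complete in time; it does not tell whether the broker or something in between is at fault.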
Winnie Lacesso / Bristol University Particle Physics Computing Systems
HH Wills Physics Laboratory, Tyndall Avenue, Bristol, BS8 1TL, UK