Hi *,
I am still trying to track down the problems with our new EMI-2 CREAM-CE.
By now I am fairly sure that:
- passwordless SSH between the WNs and the CE/SGE works
- the payload is executed correctly
However, I suspect that the jobs are actively being killed by some BLAH component.
One hint I have:
2012-10-25 08:47:07 +-+line 520,command_string:/sge-root/bin/lx24-amd64//qacct -j 1331321
2012-10-25 08:47:09 +-+line 520,command_string:/sge-root/bin/lx24-amd64//qacct -j 1331322
2012-10-25 08:47:11 +-+query_err:1331321 1331322
(for the second query it "gave up" after 2 seconds)
[root@cream-ce cream]# time /sge-root/bin/lx24-amd64//qacct -j 133132
==============================================================
qname atlasanl.q
hostname wn140.pleiades.uni-wuppertal.de
group atlasplt
owner pltatlas011
project NONE
department defaultdepartment
jobname cream_247776407
jobnumber 133132
taskid undefined
account sge
priority 2
qsub_time Tue May 29 17:50:03 2012
start_time Tue May 29 22:04:36 2012
end_time Tue May 29 22:37:03 2012
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 1947
ru_utime 1254.514
ru_stime 37.544
ru_maxrss 1073708
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 3796496
ru_majflt 2
ru_nswap 0
ru_inblock 0
ru_oublock 0
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 470547
ru_nivcsw 246664
cpu 1292.059
mem 484.485
io 54.143
iow 0.000
maxvmem 1.884G
arid undefined
real 0m2.925s
user 0m0.727s
sys 0m1.099s
Could it be that BUpdaterSGE expects the output too quickly? Or is it a parsing error? I have set
#Updater debug level
bupdater_debug_level=99
so I assume I cannot get any more detailed output.
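
To separate the two hypotheses (timeout vs. parsing), one option would be to time the qacct call by hand and pull out single fields from its output. The following is only a minimal sketch: qacct_field is a hypothetical helper, not part of BLAH or SGE, and it assumes every record line has the form "<key> <value>" separated by whitespace, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAH or SGE): print the value of one
# field from "qacct -j" output read on stdin.
# Assumption: each record line is "<key> <value...>" separated by whitespace.
qacct_field() {
    # $1 = field name, e.g. exit_status or end_time
    awk -v key="$1" '
        $1 == key {
            $1 = ""             # drop the key, keep the rest of the line
            sub(/^ +/, "")      # strip the separator left behind
            print
            exit                # only the first matching record is reported
        }'
}

# Manual check on the CE (path copied from the log above; not run here):
#   time /sge-root/bin/lx24-amd64//qacct -j 1331321 | qacct_field exit_status
```

If the fields come back correctly for the job IDs from the log (1331321 and 1331322) but the call takes longer than about 2 seconds, a timeout would seem the more likely explanation than a parsing error.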
Cheers
Torsten
--
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> <>
<> Dr. Torsten Harenberg [log in to unmask] <>
<> Bergische Universitaet <>
<> FB C - Physik Tel.: +49 (0)202 439-3521 <>
<> Gaussstr. 20 Fax : +49 (0)202 439-2811 <>
<> 42097 Wuppertal <>
<> <>
<><><><><><><>< Of course it runs NetBSD http://www.netbsd.org ><>