hi all,
i have a problem with jobs that stay in the queue. our site runs LCG260 with
torque/maui.
on the ce, the job shows as running, but when i log in on the node, i get the
following:
ps ax --forest
1783 ? S 33:24 /usr/sbin/pbs_mom -p
12458 ? S 0:00 \_ -sh
12591 ? S 0:00 \_ /bin/sh /var/spool/pbs/mom_priv/jobs/32448.gridc.SC
12592 ? S 0:00 \_ bash /home/becms29/.globus/.gass_cache/local/md5
20675 ? S 0:00 \_ bash /home/becms29/.globus/.gass_cache/local
20676 ? S 0:00 \_ perl -e ?use strict;?use Fcntl ":mode";?
the data script (called by process 20675) is attached. the perl script
that hangs is started at line 495 of that script.
the content of the ${maradona} file is
job exit status = 137
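
a note on that value: assuming it is an ordinary shell exit status, anything above 128 conventionally means the process was terminated by a signal, with signal number = status - 128. for 137 that gives 9, i.e. SIGKILL, which would suggest the job was killed from outside rather than exiting on its own. a minimal sketch to decode such a status (decode_status is a made-up helper name for illustration, not part of LCG, torque or globus):

```shell
#!/bin/sh
# decode_status: hypothetical helper, for illustration only.
# assumption: the status follows the usual shell convention,
# where a value above 128 means "terminated by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the name of that signal without the SIG prefix
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited normally with code $status"
    fi
}

decode_status 137
```

on a typical linux shell this prints "killed by signal 9 (SIGKILL)". it does not say who sent the signal, only that the status matches a kill.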
i would like to know what is going wrong here and what can be done about
it. in particular, where does the perl script hang? my bet is on the
sleep($wait_time). the job has been like this for 12h+, with 6 retries
already. also, where does the output of log_something end up?
if you need more info, let me know.
stijn