
hi all,

i have a problem with jobs staying in the queue. our site runs LCG260 with
torque/maui.

on the ce, the job shows as running, but when i log in on the node, i see
the following:
ps ax --forest
 1783 ?        S     33:24 /usr/sbin/pbs_mom -p
12458 ?        S      0:00  \_ -sh
12591 ?        S      0:00      \_ /bin/sh /var/spool/pbs/mom_priv/jobs/32448.gridc.SC
12592 ?        S      0:00          \_ bash /home/becms29/.globus/.gass_cache/local/md5
20675 ?        S      0:00              \_ bash /home/becms29/.globus/.gass_cache/local
20676 ?        S      0:00                  \_ perl -e ?use strict;?use Fcntl ":mode";?
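
a quick way to narrow down where a sleeping process like 20676 (state S) is blocked, without attaching a debugger, is to read its state and wait channel from /proc. the sketch below assumes a linux node with /proc mounted, and describe_process is a hypothetical helper name. the wchan value is only a hint: a nanosleep-related symbol would support the sleep($wait_time) theory, while a socket or pipe wait would point at blocking i/o. exact symbol names vary between kernels, and some kernels only report 0.

```python
import os
import sys


def describe_process(pid):
    """Return (state, wchan) for a pid, read from Linux /proc.

    state is the value of the "State:" line in /proc/<pid>/status,
    e.g. "S (sleeping)"; wchan is the kernel function the process is
    blocked in, or "0" if it is running or the kernel hides it.
    """
    state = "unknown"
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("State:"):
                state = line.split(":", 1)[1].strip()
                break
    try:
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip() or "0"
    except OSError:
        # wchan may be missing or unreadable (permissions, kernel config)
        wchan = "unavailable"
    return state, wchan


if __name__ == "__main__":
    # usage on the worker node (run as root or as the job owner):
    #   python describe_process.py 20676
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    state, wchan = describe_process(pid)
    print(f"pid {pid}: state={state} wchan={wchan}")
```

running it a few times in a row shows whether the process stays parked in the same kernel function or is making progress between checks.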

the data script (run by process 20675) is attached. the perl script it
hangs in is started at line 495 of that script.
the ${maradona} file contains:
job exit status = 137
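
for reference: by the usual shell convention, an exit status above 128 means the process was terminated by a signal, and the signal number is the status minus 128. 137 - 128 = 9, which is SIGKILL, so that attempt was killed rather than exiting on its own; what sent the signal is not visible from the output above. a minimal python sketch of the decoding (decode_exit_status is a hypothetical helper, not part of the LCG tools):

```python
import signal


def decode_exit_status(status):
    """Interpret a shell-style exit status.

    By shell convention, statuses above 128 mean "killed by signal
    (status - 128)"; anything else is treated as a normal exit code.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # number does not map to a signal known on this platform
            name = f"unknown signal {signum}"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(decode_exit_status(137))  # killed by signal 9 (SIGKILL)
```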

i would like to know what is going wrong here, i.e. where the perl script
hangs (my bet is on the sleep($wait_time)), or what can be done about it.
the job has been in this state for more than 12 hours and has already been
retried 6 times. also, where does the output of log_something end up? if
you need more info, let me know.


stijn