Hello all,
I have a problem with our new CE from gLite 3.1.
We use PBSPro with a patch that leaves a job in the 'C' state after
the job has finished, so we had to patch lcgpbs.pm to accept
this state. That worked well on lcg-CE 3.0.
I am trying the same approach on lcg-CE 3.1. When I first submitted a job
via globus-job-run, it finished OK. But when I submit via
glite-wms-job-submit, the job gets stuck in the Running state.
I then put some debugging prints into the jobmanager and restarted
globus-gass-cache-marshal and globus-job-manager-marshal. Now, when I
submit jobs via globus-job-run again, the job gets stuck because the
jobmanager cannot renew the ~dteam001/.lcgjm/pbsqueue.cache file
(I am mapped to dteam001).
This is because ~dteam001/.lcgjm already contains a file
pbsqueue.cache.proc.locked with this content:
1039803243:.lcgjm/pbsqueue.cache.proc.hold.ce2.8559
and the process with PID 8559 is this one:
$ ps -ef | grep 8559
dteam001 8559 1 0 Jul17 ? 00:00:00 perl /tmp/grid_manager_monitor_agent.dteam001.30922.1000 --debug=5 --maxtime=3600s
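
In case it helps anyone reproduce the check, here is a minimal sketch in
Python that reads such a lock file, extracts the PID, and tests whether that
process still exists. It assumes the lock line has the form
NUMBER:PATH, with PATH ending in .HOST.PID; this format is only inferred from
the single example above and is not documented behaviour. The function names
are hypothetical helpers, not part of the jobmanager.

```python
import os
import re


def parse_lock_line(line):
    """Split 'NUMBER:PATH' and pull the trailing PID out of PATH.

    Assumed format (inferred from one example only):
    1039803243:.lcgjm/pbsqueue.cache.proc.hold.ce2.8559
    """
    token, sep, hold_path = line.strip().partition(":")
    if not sep:
        raise ValueError("no ':' separator in lock line: %r" % line)
    match = re.search(r"\.(\d+)$", hold_path)
    if match is None:
        raise ValueError("no trailing PID in hold path: %r" % hold_path)
    return token, hold_path, int(match.group(1))


def pid_alive(pid):
    """Return True if a process with this PID exists.

    Signal 0 performs the existence and permission checks only;
    nothing is delivered to the target process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def describe_lock(lock_file):
    """Read the first line of lock_file and report on the assumed holder."""
    with open(lock_file) as fh:
        token, hold_path, pid = parse_lock_line(fh.readline())
    return {"token": token, "hold_path": hold_path,
            "pid": pid, "alive": pid_alive(pid)}
```

On the CE this would be called as
describe_lock(os.path.expanduser("~dteam001/.lcgjm/pbsqueue.cache.proc.locked")).
It only reports who appears to hold the lock; it does not remove or modify
anything.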
I have read through
/tmp/grid_manager_monitor_agent.dteam001.30922.1000 but I cannot
find the place where it would acquire and hold the lock file.
I would appreciate any hints, thank you.
--
Tomas Kouba
Institute of Physics, Academy of Sciences of the Czech Republic