>>>>> Chris Brew writes:
> I've not worked out the precise cause but running yaim on a worker
> node does a stop and start of the pbs_mom service which seems to
> delete jobs on occasion.
> If anyone has any idea what's going on and how to stop it I'm all
> ears.
If you want to prevent the pbs_mom restart from killing jobs, it should
be sufficient to export `previous' into the environment before running
yaim. This will result in pbs_mom restarting with the `-p' option:
    -p  Specifies the impact on jobs which were in execution when the
        mini-server shut down. On any restart of MOM, the new
        mini-server will not be the parent of any running jobs, MOM
        has lost control of her offspring (not a new situation for a
        mother). With the -p option, Mom will allow the jobs to
        continue to run and monitor them indirectly via polling. The
        -p option is mutually exclusive with the -r option.
At least, that's what we use at the RAL Tier-1 to keep pbs_mom restarts
triggered by Quattor from deleting jobs.
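
To illustrate the mechanism, here is a minimal sketch. It assumes the
pbs_mom init script simply checks whether `previous' is non-empty when
deciding to add `-p'. The function mom_args is a hypothetical stand-in
for that check, not the real script, and the value 1 is only a
placeholder; check the init script on your own nodes for the exact test.

```shell
#!/bin/sh
# Hypothetical stand-in for the option selection in the pbs_mom init
# script. Assumption: -p is added when $previous is set and non-empty.
mom_args() {
    if [ -n "${previous:-}" ]; then
        echo "-p"
    else
        echo ""
    fi
}

# Variable absent: the stand-in selects no -p option.
unset previous
echo "previous unset: '$(mom_args)'"

# Variable exported before the restart (placeholder value).
previous=1
export previous
echo "previous set:   '$(mom_args)'"
```

On a real worker node the equivalent would be to export `previous' in
the shell and then run yaim with your usual arguments.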
Matt