I have noticed the following issues when fsl_sub calls qsub -W on Red Hat with torque-4.2.6.1 and the Moab scheduler:
(1) if an earlier job completes and leaves the grid before all of the jobs that depend on it have been queued, submission halts with a job dependency error
(2) a dependency on an entire array job is interpreted as a dependency on its last element only, causing errors in later steps that expect input from all sub-tasks
As a temporary work-around, I intercept the qsub -W arguments, add -h so that every job is submitted on hold, and list each array element as a separate dependency. When feat exits, I call qrls to release the held jobs. Does anyone have other suggestions for dealing with these problems?
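
For illustration, the dependency-expansion part of such a work-around might look roughly like the sketch below. It is not the actual wrapper: the helper name expand_array_dep, the array ID format "1234[].server", the afterok dependency type and the script name next_step.sh are all assumptions made for the example. The qsub and qrls calls are shown only as comments.

```shell
#!/bin/sh
# Hypothetical helper: expand an array job ID such as "1234[].server"
# (assumed Torque array ID format) into a colon-separated list of its
# elements, e.g. "1234[1].server:1234[2].server:1234[3].server".
expand_array_dep() {
    array_id=$1                  # e.g. 1234[].server
    first=$2                     # first array index
    last=$3                      # last array index
    prefix=${array_id%%\[*}      # text before "[", here 1234
    suffix=${array_id#*\]}       # text after "]", here .server
    deps=
    i=$first
    while [ "$i" -le "$last" ]; do
        # Append one element, inserting ":" between entries.
        deps="${deps:+$deps:}${prefix}[${i}]${suffix}"
        i=$((i + 1))
    done
    printf '%s\n' "$deps"
}

# Intended use (commented out, since it needs a live Torque server):
#   dep=$(expand_array_dep "1234[].server" 1 3)
#   qsub -h -W depend=afterok:"$dep" next_step.sh   # submitted on hold
#   qrls <held job IDs>                             # once feat has exited
```

Submitting everything on hold means no job can finish before its dependents are queued, and naming each element explicitly avoids relying on how the scheduler interprets a whole-array dependency.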
If you can send me a copy of your new torque version of fsl_sub, I would like to try it.
Thanks,
Kathy Pearson
----------------------------------------------------
Hi FSL-ers,
I just want to let you know that I am working on a version of fsl_sub for the PBS Pro (PBS Professional) scheduler. Basically, I tweaked the Torque version of the fsl_sub script provided by Matt Glasser.
The most important modification was to replace the exec statement for array jobs with an eval statement. Otherwise, array jobs with more than one command on a line (separated by semicolons; see e.g. the fslvbm2a script produced by fslvbm_2_template) did not run successfully.
I am currently testing this version of fsl_sub. If anyone is interested in the script, please send me an email and I will pass on the most recent version.
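
The difference can be seen in a small stand-alone sketch (this is not the fsl_sub code itself; the variable names and the example line are made up). When a line is expanded unquoted and run directly, as with exec, the shell only splits it into words, so the semicolon is passed as an ordinary argument to the first command. eval re-parses the line as shell code, so both commands run.

```shell
#!/bin/sh
# One line of a task file holding two commands separated by a semicolon.
line='echo first; echo second'

# Word splitting only: a single echo receives "first;", "echo", "second".
# ("exec $line" splits the line the same way, and also replaces the shell.)
plain_out=$( $line )

# eval parses the string as shell code, so the two commands run in turn.
eval_out=$( eval "$line" )

printf 'without eval: %s\n' "$plain_out"
printf 'with eval:    %s\n' "$eval_out"
```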
Best,
Henk van Steenbergen