On 02/16/2010 04:27 PM, Arnau Bria wrote:
> some examples:
>
> Running job:
> **I'd like these ones to run in torque's tmp dir.
>
>
> [root@td173 cmprd008]# ls home_cream_649483053
> CREAM649483053 CREAM649483053_jobWrapper.sh cream_649483053.proxy StandardError StandardOutput
> [root@td173 cmprd008]# du -sh home_cream_649483053
> 1.3G home_cream_649483053
>
For the running jobs, could you have a look at the stagein directive?
Could you compare it with non-CREAM jobs?
If you feel like experimenting, you can try tweaking this line in
pbs_submit.sh:

    bls_fl_subst_and_accumulate inputsand "@@F_REMOTE@`hostname -f`:@@F_LOCAL" ","

The second parameter is the template for the stagein directive.
@@F_REMOTE is replaced by the name of the file staged on the execution node.
By default, the path is relative to the home directory of the user, but
you can make it an absolute path by modifying the template.
The qsub man page says:
"If TORQUE has been compiled with wordexp support, then variables can be
used in the specified paths. Currently only $PBS_JOBID, $HOME, and
$TMPDIR are supported for stagein."
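To illustrate what a modified template would produce, here is a minimal
sketch that mimics the @@F_REMOTE/@@F_LOCAL substitution for a single file.
expand_stagein is a hypothetical stand-in, not BLAH's
bls_fl_subst_and_accumulate, and the host name and sandbox path are
placeholders. Following qsub's local_file@hostname:remote_file syntax,
@@F_LOCAL is the file on the submission host. Note the single quotes around
the $TMPDIR template: the variable has to reach qsub unexpanded so that
TORQUE (if built with wordexp) expands it on the execution node. Inside the
double-quoted template in pbs_submit.sh that would mean writing \$TMPDIR.

```shell
#!/bin/sh
# Hypothetical stand-in for bls_fl_subst_and_accumulate: expands one template
# for a single (remote name, local path) pair. Not the BLAH implementation.
expand_stagein() {
  template=$1      # template containing @@F_REMOTE / @@F_LOCAL placeholders
  remote_name=$2   # file name as staged on the execution node
  local_path=$3    # path of the file on the submission host
  printf '%s\n' "$template" |
    sed -e "s|@@F_REMOTE|$remote_name|g" -e "s|@@F_LOCAL|$local_path|g"
}

host=ce.example.org   # placeholder for `hostname -f`

# Default template: the staged file lands relative to the user's home directory.
expand_stagein "@@F_REMOTE@$host:@@F_LOCAL" \
  cream_649483053.proxy /placeholder/sandbox/cream_649483053.proxy

# Modified template: $TMPDIR stays literal (single quotes), so qsub,
# not the submitting shell, is the one that sees it.
expand_stagein '$TMPDIR/@@F_REMOTE@'"$host"':@@F_LOCAL' \
  cream_649483053.proxy /placeholder/sandbox/cream_649483053.proxy
```

The second form should put the file under the job's $TMPDIR if TORQUE's
wordexp support behaves as the man page describes; whether the CREAM job
wrapper then finds its files there is exactly what needs testing.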
I'll run a few tests and get back to you with more info asap.
If this solves the issue, we can make that template configurable so that
you don't need to hack the submission script anymore...
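One way a configurable template could look: pbs_submit.sh reads it from a
configuration variable and falls back to the current hard-coded value. This
is a minimal sketch only; the variable blah_pbs_stagein_template and the
function stagein_template are hypothetical names, not existing BLAH options.

```shell
#!/bin/sh
# Hypothetical: return the stagein template for bls_fl_subst_and_accumulate.
# blah_pbs_stagein_template is an illustrative config variable, not a real BLAH option.
stagein_template() {
  if [ -n "${blah_pbs_stagein_template:-}" ]; then
    # Use the administrator-supplied template verbatim (e.g. one starting with $TMPDIR/).
    printf '%s\n' "$blah_pbs_stagein_template"
  else
    # Fall back to the current hard-coded template.
    printf '%s\n' "@@F_REMOTE@$(hostname -f 2>/dev/null || hostname):@@F_LOCAL"
  fi
}

# Example: the template as an administrator might set it in the config file.
blah_pbs_stagein_template='$TMPDIR/@@F_REMOTE@ce.example.org:@@F_LOCAL'
stagein_template
```

The submission script would then call
bls_fl_subst_and_accumulate inputsand "$(stagein_template)" ","; the output
of a command substitution is not expanded again, so $TMPDIR still reaches
qsub literally.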
> non-running jobs:
>
> this exited with 271. cpu_time exceeded.
>
> [root@td173 cmprd008]# ls home_cream_830522492
> CREAM830522492_jobWrapper.sh cream_830522492.proxy
> [root@td173 cmprd008]# du -sh home_cream_830522492
> 44K home_cream_830522492
>
Those files should be removed by torque. The qsub man page states:
"On completion of the job, all staged-in and staged-out files are
removed from the execution system."
It doesn't say "On *successful* completion"...
Obviously, any cleanup mechanism inside the job itself would not work in
this case (as the job is killed). If qsub doesn't do what we expect, you'll
have to rely on the periodic cleanup tool...
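If no such tool is in place yet, a stopgap could be as simple as the
following cron-style sketch. It assumes leftover sandboxes are the
home_cream_* directories shown above, sitting directly under the pool
account's home (cmprd008 in the listing); the function name and the age
threshold are hypothetical. The threshold must exceed the longest allowed
walltime so directories of running jobs are never removed. It relies on the
-mindepth/-maxdepth options of GNU find.

```shell
#!/bin/sh
# Hypothetical periodic cleanup (e.g. run daily from cron on each worker node).
# Removes home_cream_* directories directly under base_dir whose modification
# time is more than max_age_days days old. Pick max_age_days above the
# maximum walltime so sandboxes of running jobs are left alone.
cleanup_stale_sandboxes() {
  base_dir=$1       # the pool account's home directory
  max_age_days=$2   # age threshold in days
  find "$base_dir" -mindepth 1 -maxdepth 1 -type d -name 'home_cream_*' \
       -mtime +"$max_age_days" -exec rm -rf {} +
}
```

For a dry run, replace -exec rm -rf {} + with -print and check the list
before enabling deletion.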
Cheers,
David