Hi Stijn,
Since TMPDIR points to a local (per-node) scratch directory, we have
been applying a patch to the original pbs job manager that selects the
current working directory of the job based on the number of nodes
requested and the job type. The patch is attached to this mail
(it's just a few lines).
What it does:
* if the job is of type "mpi", or if the type is "multiple" and the
number of requested nodes is > 1, the behaviour of the pbs job manager
is unaltered.
* if the job type is "single", or the type is "multiple" and the
job requests 0 or 1 nodes, the following statement is inserted
in the PBS job script, just before the user job is started:
[ x"$TMPDIR" != x"" ] && cd $TMPDIR
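To illustrate, here is a minimal sketch of how that inserted statement behaves in a job script. The `/tmp/demo-scratch` value is a stand-in for whatever per-node location the batch system exports as TMPDIR; the `x"..."` prefixes are the classic guard for old test(1) implementations that choke on empty operands:

```shell
#!/bin/sh
# Stand-in for the per-node scratch path PBS would export
TMPDIR=/tmp/demo-scratch
mkdir -p "$TMPDIR"

# The statement the patch prepends to the job script:
# only cd when TMPDIR is actually set to something non-empty
[ x"$TMPDIR" != x"" ] && cd $TMPDIR

pwd   # → /tmp/demo-scratch
```

If TMPDIR is unset or empty, the `cd` is skipped and the job starts in its usual directory, so jobs that do not get a scratch area are unaffected.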
The patch is applied to the template for the pbs.pm job manager script
in /opt/globus/setup/globus/pbs.in, which then gets translated on
startup into /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm.
So far it has worked fine for all LCG jobs that at NIKHEF also go
through the "old" pbs JM. The jobs don't notice the difference, and
we can use shared home directories for all VOs, provided we also have
a per-node $TMPDIR location on local disk.
Cheers,
DavidG.
Stijn De Weirdt wrote:
> i am looking for a way to set the SCRATCH_DIRECTORY value for jobs
> running on our cluster based on the vo.
> (we have pbs queues with nfs mounted home directories for mpi, but we
> also have jobs that might perform better when they can write directly to
> local disks.)
>
> there's an option for the globus-job-manager called -scratch-dir-base,
> but i don't know if that's the good way to change this. (and i also
> don't know how to make it vo dependent)
>
> or if someone has successfully mixed pbs (nfs) and lcgpbs (local disk)
> queues on their site, that could also be a solution.
>
> many thanks
>
> stijn
The patch
*** pbs.in.orig 2005-05-20 12:56:32.000000000 +0200
--- pbs.in 2005-05-20 12:52:05.000000000 +0200
***************
*** 321,327 ****
}
print JOB "wait\n";
}
! elsif($description->jobtype() eq 'multiple')
{
my $count = $description->count;
my $cmd_script_url ;
--- 321,327 ----
}
print JOB "wait\n";
}
! elsif( ($description->jobtype() eq 'multiple') and
($description->count > 1 ) )
{
my $count = $description->count;
my $cmd_script_url ;
***************
*** 374,379 ****
--- 374,393 ----
}
else
{
+ # this is a simple single-node job that can use $TMPDIR
+ # unless the user has given one explicitly
+ # refer back to JobManager.pm, but currently it seems that
+ # $self->make_scratchdir uses "gram_scratch_" as a component
+ if ( ( $description->directory() =~ /.*gram_scratch_.*/ ) and
+ ( $description->host_count() <= 1 ) and
+ ( $description->count <= 1 )
+ ) {
+ print JOB '# user ended in a scratch directory, reset to TMPDIR'."\n";
+ print JOB '[ x"$TMPDIR" != x"" ] && cd $TMPDIR'."\n";
+ } else {
+ print JOB '# user requested this specific directory'."\n";
+ }
+
print JOB $description->executable(), " $args <",
$description->stdin(), "\n";
}
--
David Groep
** National Institute for Nuclear and High Energy Physics, PDP/Grid group **
** Room: H1.56 Phone: +31 20 5922179, PObox 41882, NL-1009DB Amsterdam NL **