Dear all,
To prevent a further explosion of the workload, we have added to the
generated job script an explicit chdir to a temporary local
scratch directory, whenever the directory as specified in the RSL
refers to $SCRATCH_DIRECTORY /AND/ the RSL job type is "single".
In this way, we (at NIKHEF) now override the default working directory
for jobs and have it honour POSIX 1003.2d. $TMPDIR is local to each
worker node.
This job class is used by the majority of all LHC jobs, and as far
as we can see these continue to behave normally (at least for LHCb, of
which we have a lot now :-) ).
All other job types, including regular "multiple" jobs and MPI, are
not affected by this change; also jobs that specify their directory
explicitly remain unaltered and will run from the specified location.
The following was added to "pbs.pm" for the non-MPI, non-multiple jobs:
    # this is a simple single-node job that can use $TMPDIR
    # unless the user has given one explicitly
    # refer back to JobManager.pm, but currently it seems that
    # $self->make_scratchdir uses "gram_scratch_" as a component
    if ( $description->directory() =~ /.*gram_scratch_.*/ ) {
        $pbs_job_script->print('[ x"$TMPDIR" != x"" ] && cd $TMPDIR'."\n");
    }
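For those who want to check the effect outside the job manager, here is a
minimal sketch of what the generated line does inside the job script. The
function name enter_local_scratch is purely illustrative and does not exist
in pbs.pm. As an assumption on my side, the sketch quotes $TMPDIR and always
returns success; the line actually written by pbs.pm does neither.

```shell
#!/bin/sh
# Hypothetical helper mirroring the line that pbs.pm writes into the job script.
enter_local_scratch() {
    # Only change directory when TMPDIR is set to a non-empty value.
    # Assumption: $TMPDIR is quoted here; the generated line leaves it unquoted.
    [ x"$TMPDIR" != x"" ] && cd "$TMPDIR"
    # Return success in all cases, so an empty TMPDIR is not reported as an error.
    return 0
}

# Demo in subshells, so the caller's working directory is left untouched.
demo_dir=$(mktemp -d)
( TMPDIR="$demo_dir"; enter_local_scratch; pwd )   # prints the scratch directory
( TMPDIR="";          enter_local_scratch; pwd )   # prints the unchanged directory
rmdir "$demo_dir"
```

With an empty or unset TMPDIR the job simply stays in the directory the job
manager put it in, which is the old behaviour.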
Note that the regular directories for stdout/stderr are retained and
written to the path as specified in the PBS directives (usually the
GASS cache area).
For as yet unknown reasons, streaming of stdout/stderr does not work
(it does on all other plain-globus clusters). Any ideas are welcome.
Note that these changes should be transparent to our MPI and "multiple"
users.
Cheers,
DavidG.
--
David Groep
** National Institute for Nuclear and High Energy Physics, PDP/Grid group **
** Room: H1.56 Phone: +31 20 5922179, PObox 41882, NL-1009DB Amsterdam NL **