Hi Pablo,
> I found the place where the ssh is done, it's in
> /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
>
> There's a $remote_shell = '/usr/bin/ssh'; and later on:
>
> hosts=\`cat \$PBS_NODEFILE\`;
> counter=0
> while test \$counter -lt $count; do
> for host in \$hosts; do
> if test \$counter -lt $count; then
> $remote_shell \$host "/bin/sh $cmd_script_name; echo \\\$? > $exit_prefix.\$counter" < $stdin &
> counter=\`expr \$counter + 1\`
> else
> break
> fi
> done
> done
>
>
> This code is inside some chains of 'if's, let me try to display them:
>
> if($description->jobtype() eq 'multiple' && !$cluster)
> {
> [...]
> }
> elsif($description->jobtype() eq 'mpi' ||
> $description->jobtype() eq 'multiple')
> {
> if ($description->jobtype() eq "mpi")
> {
> [...]
> }
> else
> {
> HERE IS THE SSH CODE
Which should not get executed for simple jobs, but see below!
> }
> }
> else
> {
> print JOB $description->executable(), " $args <",
> $description->stdin(), "\n";
> }
> close(JOB);
>
> The last line is what WMS jobs do. So, it looks like the lcg-ce wrapper thinks
> our test jobs are 'multiple' !! It must be set as default somewhere, but
> where? I can't find it.
>
> The other possible jobtype value (besides mpi and multiple) is 'single'. Maybe
> you need to specify that somewhere instead?
'single' should indeed be the default, but it turns out that globus-job-run
submits its jobs with these attributes in the RSL:
("count" = "1" )("job_type" = "multiple")
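With count = 1 and job_type = multiple, the job falls into the ssh branch even
though it needs only one process. So you would need to change the elsif
condition to read, e.g., as follows:
elsif($description->jobtype() eq 'mpi' || $description->count() > 1)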
You would need to apply that change in 2 files:
/opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
/opt/globus/setup/globus/pbs.in
And then:
/etc/init.d/globus-job-manager-marshal restart
/etc/init.d/globus-gma restart
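In case it helps to see the effect of the changed condition, here is a small
stand-alone Perl sketch. It is not the pbs.pm code: branch_original and
branch_patched are hypothetical helpers that only mimic the if/elsif/else chain
quoted above, the branch labels are made up, and I am assuming $cluster is true
on your CE (otherwise the first branch would already catch 'multiple' jobs).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the branch chain as it currently stands in pbs.pm.
# Returns a label for the branch that would run; labels are illustrative only.
sub branch_original {
    my ($jobtype, $count, $cluster) = @_;
    return 'multiple-no-cluster' if $jobtype eq 'multiple' && !$cluster;
    return 'mpi'   if $jobtype eq 'mpi';
    return 'ssh'   if $jobtype eq 'multiple';
    return 'plain';    # the "print JOB ..." branch
}

# The same chain with the proposed condition:
#   jobtype eq 'mpi' || count > 1
sub branch_patched {
    my ($jobtype, $count, $cluster) = @_;
    return 'multiple-no-cluster' if $jobtype eq 'multiple' && !$cluster;
    if ($jobtype eq 'mpi' || $count > 1) {
        return $jobtype eq 'mpi' ? 'mpi' : 'ssh';
    }
    return 'plain';
}

# What globus-job-run sends: ("count" = "1")("job_type" = "multiple").
# The third argument is the assumed value of $cluster.
print branch_original('multiple', 1, 1), "\n";   # prints: ssh
print branch_patched('multiple', 1, 1), "\n";    # prints: plain
print branch_patched('multiple', 4, 1), "\n";    # prints: ssh
```

So with the change, a 'multiple' job with count 1 is written out like a plain
job, while jobs that really ask for more than one process still take the ssh
route.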
I will open a bug about it, but I doubt that an official fix will still be
released for it...