Hi,
Sorry for the late reply. First, thanks for the new torque rpms, I'll
copy them over to the server with the other MPI-related rpms.
>>The point is, I am not sure on exactly *who* is responsible
>>for calling mpirun, in order to spawn the processes of the
>>parallel application.
>
> Neither am I. I thought the pbs.pm script was doing this, but changing the mpirun definition for Globus did not help. The mpirun command is in fact put in somewhere else. Does anyone know where?
This is a somewhat complicated point. The globus (and EDG)
gatekeepers define several different job types: simple jobs,
multiple jobs, and mpi jobs.
For the "mpi" job type, the gatekeeper calls mpirun directly from the
job wrapper generated by pbs.pm. This means that the user must pass a
pre-compiled executable, because mpirun is invoked outside of the
direct control of the user.
In EDG we found this extremely limiting because users often want to do
some setup before calling mpirun (e.g. compiling the executable itself,
as the mpich libraries are often different/incompatible on different
sites).
Because of this, "mpi" jobs submitted through the LCG workload
management system don't use the globus "mpi" job type, but use the
"multiple" job type instead. This means that the user must do some
setup herself and must call mpirun manually.
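
To make that concrete, here is a rough sketch of what such a user job
script might look like. This is only an illustration, not what the LCG
job wrapper actually does. It assumes the script runs inside the batch
allocation with PBS_NODEFILE set by torque, that mpicc and mpirun from
the site's mpich are on the PATH, and that mpirun accepts -np and
-machinefile (other MPI flavours may differ). The names my_app,
my_app.c, build_mpirun_command and run_job are made up for the example.

```python
import os
import subprocess


def build_mpirun_command(nodefile, executable, args=()):
    """Build an mpirun command line from a PBS-style node file.

    The node file is assumed to hold one line per allocated slot,
    so the number of non-empty lines is used as the process count.
    """
    with open(nodefile) as fh:
        slots = [line.strip() for line in fh if line.strip()]
    return ["mpirun", "-np", str(len(slots)),
            "-machinefile", nodefile, executable, *args]


def run_job():
    """Do the site-specific setup, then call mpirun by hand."""
    # Assumption: the batch system exports the node file location.
    nodefile = os.environ["PBS_NODEFILE"]

    # Setup step: compile against the local mpich on this site.
    # my_app.c is a hypothetical source file shipped with the job.
    subprocess.run(["mpicc", "-o", "my_app", "my_app.c"], check=True)

    # Spawn the parallel processes ourselves and return the exit code.
    return subprocess.call(build_mpirun_command(nodefile, "./my_app"))
```

The actual job would simply end with sys.exit(run_job()). Keeping the
command construction in its own function makes it easy to check on a
machine without mpirun installed.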
Cheers.
Cal