Hi,
Several of our users are interested in running MPI jobs on the grid, but
none have managed to get a job to run. It seems that only three sites that
support our VO (fusion) also support mpich jobs. If I submit to
ce1.egee.fr.cgg.com or to our own grid002.jet.efda.org, I get the
following error:
Cannot plan: JobAdapterHelper: invalid value torque for attribute lrms_type (expecting lsf or pbs)
Does this imply we can't do MPI if we run Torque?
If I submit to cluster.pnpi.nw.ru, I get:
Job proxy is expired
even though it is clear that my proxy has not expired.
Is there something obvious we are missing? An example JDL, pulled off the
web, is shown below.
Type = "job";
JobType = "mpich";
NodeNumber = 2;
InputSandbox = "cpi.sh";
Executable = "cpi.sh";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
VirtualOrganisation="fusion";
RetryCount=7;
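For what it's worth, a variant that states the MPICH requirement explicitly
would add something like the line below. This is only a sketch: the
Requirements line is my assumption and not part of the example above, it
assumes the sites publish the MPICH tag in
GlueHostApplicationSoftwareRunTimeEnvironment (which, as far as I
understand, is what CE_RUNTIMEENV feeds), and I have not verified that it
changes either error.
// Assumption: only match CEs that advertise the MPICH runtime tag
Requirements = Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment);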
I (rather naively) thought our site (EFDA-JET, grid002.jet.efda.org)
would support MPICH if we
a) had MPICH installed,
b) had a common home directory across the WNs, and
c) had MPICH in our CE_RUNTIMEENV variable.
This is clearly not the case. Should we be supporting LAM or Open MPI,
or should we not be contemplating running parallel grid jobs at all?
Thanks in advance
Dave