Hello all,
I have been experimenting with support for MPICH-type jobs under LCG
2.3.0. We have been publishing the MPICH tag for HG-01-GRNET, and
job-list-match lists our CE among the candidates for execution when a JDL
with JobType="MPICH" is provided.
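For reference, the helper below prints a minimal JDL of the kind described above. It is a sketch under assumptions: NodeNumber and the executable and sandbox file names are hypothetical, and the Requirements clause assumes the MPICH tag is published in the CE's GlueHostApplicationSoftwareRunTimeEnvironment attribute. Only JobType = "MPICH" comes from this message.

```python
def quote(value: str) -> str:
    """Return a JDL string literal (double-quoted, inner quotes escaped)."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


def mpich_jdl(executable: str, node_number: int, input_files: list[str]) -> str:
    """Render a minimal JDL for an MPICH job.

    All values are illustrative (hypothetical); only JobType = "MPICH"
    is taken from the original message.
    """
    if node_number < 1:
        raise ValueError("node_number must be positive")
    sandbox = ", ".join(quote(f) for f in input_files)
    lines = [
        'Type = "Job";',
        'JobType = "MPICH";',
        f"NodeNumber = {node_number};",
        f"Executable = {quote(executable)};",
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {{{sandbox}}};",
        'OutputSandbox = {"std.out", "std.err"};',
        # Assumption: the CE advertises the MPICH tag as a run-time environment.
        'Requirements = Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment);',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mpich_jdl("cpi", 4, ["cpi"]), end="")
```

The printed text can be saved to a file and passed to job-list-match or to job submission.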
However, job submission fails with:
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://lxn1188.cern.ch:9000/3XP1kH4KzLDepHEbkgWhxg
Current Status: Aborted
Status Reason: Cannot plan: JobAdapterHelper: invalid value torque for attribute lrms_type (expecting lsf or pbs)
reached on: Tue Jan 18 13:56:47 2005
*************************************************************
which seems to be a result of LCG 2.3.0 using Torque instead of PBS: the
CE's lrms_type is "torque", which the JobAdapterHelper does not accept.
I get the same error message when I try to run the job on other CEs
advertising MPICH capability (selecting them explicitly in the JDL).
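The error message suggests that the adapter only accepts a fixed set of lrms_type values for MPICH jobs, so a CE publishing "torque" is rejected even though Torque is derived from OpenPBS. The sketch below mimics that check purely for illustration; it is not the actual JobAdapterHelper code, and the `normalize` option mapping "torque" to "pbs" is a hypothetical workaround, not something LCG 2.3.0 does.

```python
SUPPORTED_LRMS = frozenset({"lsf", "pbs"})

# Hypothetical alias table: treating Torque as "pbs" is an assumption,
# not LCG 2.3.0 behaviour.
LRMS_ALIASES = {"torque": "pbs"}


class PlanningError(Exception):
    """Raised when a job cannot be planned for the given CE."""


def check_lrms_type(lrms_type: str, normalize: bool = False) -> str:
    """Return the LRMS type the adapter would use, or raise PlanningError."""
    value = lrms_type.strip().lower()
    if normalize:
        value = LRMS_ALIASES.get(value, value)
    if value not in SUPPORTED_LRMS:
        raise PlanningError(
            f"invalid value {lrms_type} for attribute lrms_type "
            "(expecting lsf or pbs)"
        )
    return value


if __name__ == "__main__":
    for ce_type in ("pbs", "torque"):
        try:
            print(ce_type, "->", check_lrms_type(ce_type))
        except PlanningError as err:
            print(ce_type, "-> Cannot plan:", err)
```

Without the alias step, "torque" fails exactly as in the bookkeeping output above; with it, the job would be planned as a PBS job.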
Also, while comparing the available options for integrating MPICH
support with PBS/Torque, I came across the following link:
http://www.beowulf.org/archive/2005-January/011535.html
which essentially describes mpiexec as a much better alternative to
mpirun for spawning application instances across worker nodes managed by
PBS. It uses PBS itself to start them, instead of rsh/ssh, which allows for
better monitoring and resource accounting. Does anyone have experience
with that kind of configuration?
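To make the difference concrete: with MPICH's mpirun (assuming the ch_p4 device), the job script usually passes the node list that PBS writes to $PBS_NODEFILE as a machine file, and mpirun starts the remote processes over rsh/ssh. The OSC mpiexec described in the linked post instead uses the PBS task manager (TM) interface and, by default, starts one process per allocated processor. The helper below only builds the two command lines; the program name "./cpi" is hypothetical.

```python
import os
import shlex


def read_nodefile(path: str) -> list[str]:
    """Return the host list PBS/Torque wrote for this job (one line per slot)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def mpirun_command(program: str, nodefile: str) -> list[str]:
    """mpirun: pass the slot count and node list explicitly; the remote
    processes are then started through rsh/ssh, outside PBS's control."""
    slots = len(read_nodefile(nodefile))
    return ["mpirun", "-np", str(slots), "-machinefile", nodefile, program]


def mpiexec_command(program: str) -> list[str]:
    """OSC mpiexec: no machine file needed; it obtains the allocation from
    PBS and spawns the processes through the TM interface."""
    return ["mpiexec", program]


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set; run this inside a PBS job")
    else:
        print(shlex.join(mpirun_command("./cpi", nodefile)))
        print(shlex.join(mpiexec_command("./cpi")))
```

Because the mpiexec processes are children of the PBS daemons rather than of rsh/sshd, their CPU time is attributed to the job, which is the accounting benefit mentioned in the post.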
Thanks in advance.
--
Vangelis Koukis
[log in to unmask]
OpenPGP public key ID:
pub 1024D/1D038E97 2003-07-13 Vangelis Koukis <[log in to unmask]>
Key fingerprint = C5CD E02E 2C78 7C10 8A00 53D8 FBFC 3799 1D03 8E97