Marco Verlato wrote:
> Kostas Georgakopoulos wrote:
>
>> We installed the middleware with yaim, specifically the
>> lcg-CE-torque and lcg-WN-torque packages on the CE and WNs
>> respectively. However, in the site configuration file we have:
>>
>> JOB_MANAGER=lcgpbs
>> CE_BATCH_SYS=torque
>>
>> The equivalent of what you say would be to change JOB_MANAGER and
>> CE_BATCH_SYS to pbs and reconfigure the CE and WNs, right?
>
> No, it is to change only CE_BATCH_SYS to pbs and to reconfigure the
> CE only. Done this way, the INFN solution described at
> http://grid-it.cnaf.infn.it/index.php?mpihowto&type=1 also works
> with non-shared home directories,

And will normal jobs (non-MPI jobs) still work? I ask because Charles
Loomis made exactly that point: if you change the configuration to pbs
and you *don't* have shared home directories, then all jobs will fail.

> once the following is in place on all WNs and the CE:
>
> 1. The file /etc/ssh/sshd_config must contain at least the following
>    lines:
>
>    HostbasedAuthentication yes
>    IgnoreUserKnownHosts yes
>    IgnoreRhosts yes
>
> 2. The file /etc/ssh/ssh_config must contain, in the "Host *"
>    section, the following line:
>
>    HostbasedAuthentication yes
>
> 3. The file /etc/ssh_known_hosts2 must contain the public keys of
>    all the WNs and the CE in the site, and must be replicated on
>    every computer.
>
> 4. The file /etc/ssh/shosts.equiv must contain the list of the
>    hostnames of the WNs and the CE.
>
> 5. The ssh daemon has to be restarted: /sbin/service sshd restart
>
> With this in place, the script will copy the whole job subdirectory
> from the WN where the job is executed to all the other nodes in the
> set chosen for the job.
>
> best regards,
> Marco
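Just to be sure I understand the first change: on our CE that would
mean editing only the one line in site-info.def and then rerunning
yaim for the CE, roughly like the sketch below. The site-info.def
path and the configure_node node type are guesses from our
lcg-CE-torque install, so treat this as untested:

    # in site-info.def on the CE (only CE_BATCH_SYS changes):
    JOB_MANAGER=lcgpbs     # stays as it is
    CE_BATCH_SYS=pbs       # was: torque

    # then reconfigure the CE only (node type name may differ
    # between yaim releases; path to site-info.def is ours):
    cd /opt/lcg/yaim/scripts
    ./configure_node /root/site-info.def CE_torque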
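And for steps 1-5, something like the following, run once from the
CE, should roll the ssh setup out to every node. The wn001/wn002
hostnames are placeholders for our WNs and the whole script is an
untested sketch of Marco's steps, nothing more:

    #!/bin/sh
    # Untested sketch of steps 1-5; wn001/wn002 are placeholder
    # hostnames standing in for our real WNs.
    NODES="alexander.it.uom.gr wn001.it.uom.gr wn002.it.uom.gr"

    # steps 3 and 4: collect every host key and hostname once
    > /tmp/ssh_known_hosts2
    > /tmp/shosts.equiv
    for h in $NODES ; do
        echo $h >> /tmp/shosts.equiv
        ssh-keyscan -t rsa,dsa $h >> /tmp/ssh_known_hosts2
    done

    # steps 1, 2 and 5: replicate the files, extend the ssh
    # configs and restart sshd on every node
    for h in $NODES ; do
        scp /tmp/ssh_known_hosts2 $h:/etc/ssh_known_hosts2
        scp /tmp/shosts.equiv $h:/etc/ssh/shosts.equiv
        ssh $h 'printf "HostbasedAuthentication yes\nIgnoreUserKnownHosts yes\nIgnoreRhosts yes\n" >> /etc/ssh/sshd_config;
                printf "Host *\n  HostbasedAuthentication yes\n" >> /etc/ssh/ssh_config;
                /sbin/service sshd restart'
    done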
>> best regards,
>> Kostas Georgakopoulos - University of Macedonia
>>
>> Marco Verlato wrote:
>>
>>> Hi Kostas,
>>> in case it helps: in the Italian Grid we found that MPI didn't
>>> work with torque if the CE GRIS published
>>> GlueCEInfoLRMSType=torque, as is the case for your
>>> alexander.it.uom.gr CE. After putting GlueCEInfoLRMSType=pbs, our
>>> MPI implementation
>>> (http://grid-it.cnaf.infn.it/index.php?mpihowto&type=1) worked.
>>>
>>> best regards,
>>> Marco
>>>
>>> Kostas Georgakopoulos wrote:
>>>
>>>> Hi all,
>>>>
>>>> I configured our site (GR-02-UoM) for MPI support following the
>>>> instructions at
>>>> http://goc.grid.sinica.edu.tw/gocwiki/MPI_Support_with_Torque
>>>> (torque is the job manager for us), and everything seems OK.
>>>> However, when I try executing the test job from
>>>> http://quattor.web.lal.in2p3.fr/packages/mpi
>>>> the job gets stuck on one of the workers until the proxy
>>>> certificate expires. The command used to submit the job was:
>>>>
>>>> edg-job-submit --vo dteam --lrms pbs -r
>>>> alexander.it.uom.gr:2119/jobmanager-lcgpbs-dteam MPItest.jdl
>>>>
>>>> Does anyone have any idea what the problem might be? (I include
>>>> the files below.)
>>>>
>>>> Best regards
>>>> Kostas Georgakopoulos
>>>> University of Macedonia
>>>>
>>>> MPItest.jdl:
>>>>
>>>> Type = "Job";
>>>> JobType = "MPICH";
>>>> NodeNumber = 8;
>>>> Executable = "MPItest.sh";
>>>> Arguments = "MPItest";
>>>> StdOutput = "test.out";
>>>> StdError = "test.err";
>>>> InputSandbox = {"MPItest.sh","MPItest.c"};
>>>> OutputSandbox = {"test.err","test.out","mpiexec.out"};
>>>>
>>>> MPItest.sh:
>>>>
>>>> #!/bin/sh -x
>>>>
>>>> # the binary to execute
>>>> EXE=$1
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Running on: $HOSTNAME"
>>>> echo "As: " `whoami`
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Compiling binary: $EXE"
>>>> echo mpicc -o ${EXE} ${EXE}.c
>>>> mpicc -o ${EXE} ${EXE}.c
>>>> echo "*************************************"
>>>>
>>>> # pick up the node file from the local batch system, if any
>>>> if [ "x$PBS_NODEFILE" != "x" ] ; then
>>>>   echo "PBS Nodefile: $PBS_NODEFILE"
>>>>   HOST_NODEFILE=$PBS_NODEFILE
>>>> fi
>>>>
>>>> if [ "x$LSB_HOSTS" != "x" ] ; then
>>>>   echo "LSF Hosts: $LSB_HOSTS"
>>>>   HOST_NODEFILE=`pwd`/lsf_nodefile.$$
>>>>   for host in ${LSB_HOSTS}
>>>>   do
>>>>     echo $host >> ${HOST_NODEFILE}
>>>>   done
>>>> fi
>>>>
>>>> if [ "x$HOST_NODEFILE" = "x" ]; then
>>>>   echo "No hosts file defined. Exiting..."
>>>>   exit
>>>> fi
>>>>
>>>> echo "***********************************************************************"
>>>> CPU_NEEDED=`cat $HOST_NODEFILE | wc -l`
>>>> echo "Node count: $CPU_NEEDED"
>>>> echo "Nodes in $HOST_NODEFILE: "
>>>> cat $HOST_NODEFILE
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Checking ssh for each node:"
>>>> NODES=`cat $HOST_NODEFILE`
>>>> for host in ${NODES}
>>>> do
>>>>   echo "Checking $host..."
>>>>   ssh $host hostname
>>>> done
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Executing $EXE with mpiexec"
>>>> chmod 755 $EXE
>>>> mpiexec `pwd`/$EXE > mpiexec.out 2>&1
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Executing $EXE with mpirun"
>>>> chmod 755 $EXE
>>>> mpirun -np $CPU_NEEDED -machinefile $HOST_NODEFILE `pwd`/$EXE
>>>> echo "***********************************************************************"
>>>>
>>>> MPItest.c:
>>>>
>>>> /* hello.c
>>>>  *
>>>>  * Simple "Hello World" program in MPI.
>>>>  */
>>>>
>>>> #include "mpi.h"
>>>> #include <stdio.h>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>   int numprocs;   /* Number of processors */
>>>>   int procnum;    /* Processor number */
>>>>
>>>>   /* Initialize MPI */
>>>>   MPI_Init(&argc, &argv);
>>>>   /* Find this processor number */
>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
>>>>   /* Find the number of processors */
>>>>   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>>>>   printf("Hello world! from processor %d out of %d\n", procnum,
>>>>          numprocs);
>>>>   /* Shut down MPI */
>>>>   MPI_Finalize();
>>>>   return 0;
>>>> }
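By the way, to double-check what our CE GRIS actually publishes for
the LRMS type (Marco's original point), a plain LDAP query should be
enough; this assumes the GRIS listens on the usual port 2135 with the
standard base DN:

    ldapsearch -x -h alexander.it.uom.gr -p 2135 \
        -b "mds-vo-name=local,o=grid" GlueCEInfoLRMSType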