Marco Verlato wrote:

> Kostas Georgakopoulos wrote:
>
>> We installed the middleware with yaim, specifically the
>> lcg-CE-torque and lcg-WN-torque packages on the CE and WNs
>> respectively. However, in the site configuration file we have:
>>
>> JOB_MANAGER=lcgpbs
>> CE_BATCH_SYS=torque
>>
>> the equivalent of what you say would be to change JOB_MANAGER
>> and CE_BATCH_SYS to pbs and reconfigure the CE and WNs, right?
>
>
> No, you need to change only CE_BATCH_SYS to pbs and reconfigure the CE only.
> Done this way, the INFN solution described at
> http://grid-it.cnaf.infn.it/index.php?mpihowto&type=1 also works with
> non-shared home directories,

And will normal (non-MPI) jobs still work? Charles Loomis made exactly
that point: if you change the configuration to pbs and you *don't*
have shared home directories, then all jobs will fail.
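
For concreteness, if I understand this correctly, on our site the change would amount to something like the following (a sketch only: JOB_MANAGER stays as it is, and the yaim script path and node type are assumptions based on our lcg-CE-torque install, so they may differ elsewhere):

# site-info.def (only CE_BATCH_SYS changes)
JOB_MANAGER=lcgpbs
CE_BATCH_SYS=pbs

# then reconfigure the CE only, with the same site-info.def and node type
# used at install time (script location assumed from a standard LCG-2 yaim install)
/opt/lcg/yaim/scripts/configure_node site-info.def lcg-CE-torque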

> once the following is done in all WNs and the CE:
>
> 1. The file /etc/ssh/sshd_config must contain at least the following 
> lines:
>
> HostbasedAuthentication yes
> IgnoreUserKnownHosts yes
> IgnoreRhosts yes
>
> 2. The file /etc/ssh/ssh_config must contain, in the "Host *" section,
> the following line:
>
> HostbasedAuthentication yes
>
> 3. The file /etc/ssh_known_hosts2 must contain the public keys of all
>     the WNs and the CE in the site, and must be replicated on every computer.
>
> 4. The file /etc/ssh/shosts.equiv must contain the list of the
>     hostnames of the WNs and the CE.
>
> 5. The ssh daemon has to be restarted:     /sbin/service sshd restart
>
> and the script will copy the whole job subdirectory from the WN where the
> job is executed to all the other WNs in the set chosen for the job.
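
For anyone else setting this up, on one of our nodes steps 1-5 would look roughly like this (a sketch only: the hostnames are made-up examples, and building /etc/ssh_known_hosts2 with ssh-keyscan is my own assumption, not part of Marco's instructions):

# /etc/ssh/sshd_config (additions)
HostbasedAuthentication yes
IgnoreUserKnownHosts yes
IgnoreRhosts yes

# /etc/ssh/ssh_config, inside the "Host *" section
HostbasedAuthentication yes

# /etc/ssh/shosts.equiv -- one hostname per line (example names)
ce.example.org
wn01.example.org
wn02.example.org

# gather the host public keys of the CE and all WNs into /etc/ssh_known_hosts2,
# then replicate that file to every node
ssh-keyscan -t rsa ce.example.org wn01.example.org wn02.example.org > /etc/ssh_known_hosts2

# restart the ssh daemon
/sbin/service sshd restart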
>
> best regards,
> Marco
>
>>
>> best regards,
>> Kostas Georgakopoulos - University of Macedonia
>>
>> Marco Verlato wrote:
>>
>>> Hi Kostas,
>>> if it can help, in the Italian Grid we found that MPI didn't work with
>>> torque if the CE GRIS published GlueCEInfoLRMSType=torque, as is the
>>> case for your alexander.it.uom.gr CE. After setting
>>> GlueCEInfoLRMSType=pbs, our MPI implementation
>>> (http://grid-it.cnaf.infn.it/index.php?mpihowto&type=1) worked.
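
(To check what a CE GRIS is currently publishing, a query along these lines should work, assuming the GRIS listens on the usual port 2135 with base mds-vo-name=local,o=grid:)

ldapsearch -x -h alexander.it.uom.gr -p 2135 -b "mds-vo-name=local,o=grid" \
  '(objectClass=GlueCE)' GlueCEInfoLRMSType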
>>>
>>> best regards,
>>> Marco
>>>
>>> Kostas Georgakopoulos wrote:
>>>
>>>>  Hi all,
>>>>
>>>> I configured our site (GR-02-UoM) for MPI support following the
>>>> instructions at
>>>> http://goc.grid.sinica.edu.tw/gocwiki/MPI_Support_with_Torque
>>>> (Torque is the job manager for us) and everything seems to be OK.
>>>> However, when I tried executing the test job from
>>>> http://quattor.web.lal.in2p3.fr/packages/mpi, the job got stuck on one
>>>> of the workers until the proxy certificate expired. The command used
>>>> to submit the job was:
>>>>
>>>> edg-job-submit --vo dteam --lrms pbs -r 
>>>> alexander.it.uom.gr:2119/jobmanager-lcgpbs-dteam MPItest.jdl
>>>>
>>>> Does anyone have any idea what the problem might be? (I include the
>>>> files below.)
>>>>
>>>> Best regards
>>>> Kostas Georgakopoulos
>>>> University of Macedonia
>>>>
>>>> MPItest.jdl:
>>>>
>>>> Type = "Job";
>>>> JobType = "MPICH";
>>>> NodeNumber = 8;
>>>> Executable = "MPItest.sh";
>>>> Arguments = "MPItest";
>>>> StdOutput = "test.out";
>>>> StdError = "test.err";
>>>> InputSandbox = {"MPItest.sh","MPItest.c"};
>>>> OutputSandbox = {"test.err","test.out","mpiexec.out"};
>>>>
>>>> MPItest.sh:
>>>>
>>>> #!/bin/sh -x
>>>>
>>>> # the binary to execute
>>>> EXE=$1
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Running on: $HOSTNAME"
>>>> echo "As:       " `whoami`
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Compiling binary: $EXE"
>>>> echo mpicc -o ${EXE} ${EXE}.c
>>>> mpicc -o ${EXE} ${EXE}.c
>>>> echo "*************************************"
>>>>
>>>> if [ "x$PBS_NODEFILE" != "x" ] ; then
>>>>   echo "PBS Nodefile: $PBS_NODEFILE"
>>>>   HOST_NODEFILE=$PBS_NODEFILE
>>>> fi
>>>>
>>>> if [ "x$LSB_HOSTS" != "x" ] ; then
>>>>   echo "LSF Hosts: $LSB_HOSTS"
>>>>   HOST_NODEFILE=`pwd`/lsf_nodefile.$$
>>>>   for host in ${LSB_HOSTS}
>>>>   do
>>>>     echo $host >> ${HOST_NODEFILE}
>>>>   done
>>>> fi
>>>>
>>>> if [ "x$HOST_NODEFILE" = "x" ]; then
>>>>   echo "No hosts file defined.  Exiting..."
>>>>   exit
>>>> fi
>>>>
>>>> echo "***********************************************************************"
>>>> CPU_NEEDED=`cat $HOST_NODEFILE | wc -l`
>>>> echo "Node count: $CPU_NEEDED"
>>>> echo "Nodes in $HOST_NODEFILE: "
>>>> cat $HOST_NODEFILE
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> CPU_NEEDED=`cat $HOST_NODEFILE | wc -l`
>>>> echo "Checking ssh for each node:"
>>>> NODES=`cat $HOST_NODEFILE`
>>>> for host in ${NODES}
>>>> do
>>>>   echo "Checking $host..."
>>>>   ssh $host hostname
>>>> done
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Executing $EXE with mpiexec"
>>>> chmod 755 $EXE
>>>> mpiexec `pwd`/$EXE > mpiexec.out 2>&1
>>>> echo "***********************************************************************"
>>>>
>>>> echo "***********************************************************************"
>>>> echo "Executing $EXE with mpirun"
>>>> chmod 755 $EXE
>>>> mpirun -np $CPU_NEEDED -machinefile $HOST_NODEFILE `pwd`/$EXE
>>>> echo "***********************************************************************"
>>>>
>>>> MPItest.c:
>>>>
>>>> /*  hello.c
>>>> *
>>>> *  Simple "Hello World" program in MPI.
>>>> *
>>>> */
>>>>
>>>> #include "mpi.h"
>>>> #include <stdio.h>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     int numprocs;  /* Number of processors */
>>>>     int procnum;   /* Processor number */
>>>>
>>>>     /* Initialize MPI */
>>>>     MPI_Init(&argc, &argv);
>>>>     /* Find this processor number */
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
>>>>     /* Find the number of processors */
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>>>>     printf("Hello world! from processor %d out of %d\n", procnum, numprocs);
>>>>     /* Shut down MPI */
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }