Hello,
I'm trying to get mpiexec running on my site (LAL). I've managed to get
it compiled and running, but the simplest hello world job returns
incorrect output. Each process just prints its rank and the total
number of processes. For mpirun I get:
Hello world! from processor 4 out of 5
Hello world! from processor 2 out of 5
Hello world! from processor 3 out of 5
Hello world! from processor 1 out of 5
Hello world! from processor 0 out of 5
Whereas for mpiexec I get:
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
The correct number of processes is started, but each one reports the
wrong rank and size. If I run mpiexec with the verbose flag, it does
seem to connect to the correct machines and start the processes there.
Any help with this would be appreciated.
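For reference, the difference between the two outputs can be stated as a simple check: in a correctly launched job every line reports the same size N, and each rank from 0 to N-1 appears exactly once. The mpiexec output fails this because all five processes report rank 0 of 1, which means each process's MPI_COMM_WORLD contains only itself, i.e. each behaves like a separate single-process job. Below is a minimal C sketch of that check; the function name hello_output_is_consistent and the MAX_RANKS limit are hypothetical, not part of MPICH or mpiexec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, arbitrary upper bound on job size for this sketch. */
#define MAX_RANKS 1024

/* Returns true if the "Hello world!" lines form one consistent MPI job:
 * every line reports the same size N, there are exactly N lines, and
 * each rank 0..N-1 appears exactly once. */
bool hello_output_is_consistent(const char *lines[], int nlines)
{
    bool seen[MAX_RANKS] = { false };
    int expected_size = -1;

    assert(lines != NULL || nlines == 0);

    for (int i = 0; i < nlines; i++) {
        int rank, size;
        if (sscanf(lines[i], "Hello world! from processor %d out of %d",
                   &rank, &size) != 2)
            return false;               /* not a hello line */
        if (expected_size < 0)
            expected_size = size;       /* first line fixes N */
        if (size != expected_size || size > MAX_RANKS)
            return false;               /* processes disagree on N */
        if (rank < 0 || rank >= size || seen[rank])
            return false;               /* out of range or duplicate rank */
        seen[rank] = true;
    }
    /* N lines with N distinct in-range ranks means every rank appeared once. */
    return nlines > 0 && nlines == expected_size;
}
```

Called with the five mpirun lines above it returns true; with the five mpiexec lines it returns false at the second line, because rank 0 is reported twice.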
Cal
P.S.
Both MPICH and mpiexec were compiled without shared memory
communications on SMP nodes.
The code for the job is:
/* hello.c
*
* Simple "Hello World" program in MPI.
*
*/
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int numprocs; /* Number of processors */
    int procnum;  /* Processor number */

    /* Initialize MPI */
    MPI_Init(&argc, &argv);

    /* Find this processor number */
    MPI_Comm_rank(MPI_COMM_WORLD, &procnum);

    /* Find the number of processors */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    printf("Hello world! from processor %d out of %d\n", procnum, numprocs);

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}
The mpiexec verbose output is:
resolve_exe: prefixing dot to executable: "./MPItest"
node 0: name = grid18.lal.in2p3.fr, mpname = grid18.lal.in2p3.fr, cpu = 1
node 1: name = grid18.lal.in2p3.fr, mpname = grid18.lal.in2p3.fr, cpu = 0
node 2: name = grid17.lal.in2p3.fr, mpname = grid17.lal.in2p3.fr, cpu = 1
node 3: name = grid17.lal.in2p3.fr, mpname = grid17.lal.in2p3.fr, cpu = 0
node 4: name = grid16.lal.in2p3.fr, mpname = grid16.lal.in2p3.fr, cpu = 0
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
Hello world! from processor 0 out of 1
wait_one_task_start: evt = 2, task 0 host grid18.lal.in2p3.fr
wait_one_task_start: evt = 3, task 1 host grid18.lal.in2p3.fr
wait_one_task_start: evt = 4, task 2 host grid17.lal.in2p3.fr
wait_one_task_start: evt = 5, task 3 host grid17.lal.in2p3.fr
wait_one_task_start: evt = 6, task 4 host grid16.lal.in2p3.fr
All 5 tasks started.
read_gm_startup_ports: waiting for info
wait_tasks: waiting for grid18.lal.in2p3.fr/1 grid18.lal.in2p3.fr/0 grid17.lal.in2p3.fr/1 grid17.lal.in2p3.fr/0 grid16.lal.in2p3.fr/0
wait_tasks: numspawned = 5, got evt 7 for tid 2 host grid18.lal.in2p3.fr status 0
wait_tasks: waiting for grid18.lal.in2p3.fr/0 grid17.lal.in2p3.fr/1 grid17.lal.in2p3.fr/0 grid16.lal.in2p3.fr/0
wait_tasks: numspawned = 4, got evt 8 for tid 3 host grid18.lal.in2p3.fr status 0
wait_tasks: waiting for grid17.lal.in2p3.fr/1 grid17.lal.in2p3.fr/0 grid16.lal.in2p3.fr/0
wait_tasks: numspawned = 3, got evt 9 for tid 4 host grid17.lal.in2p3.fr status 0
wait_tasks: waiting for grid17.lal.in2p3.fr/0 grid16.lal.in2p3.fr/0
wait_tasks: numspawned = 2, got evt 10 for tid 5 host grid17.lal.in2p3.fr status 0
wait_tasks: waiting for grid16.lal.in2p3.fr/0
wait_tasks: numspawned = 1, got evt 11 for tid 6 host grid16.lal.in2p3.fr status 0