Hello Cal,
mpiexec is trying to use the Myrinet network, which you probably don't have at your site. You can either call it with the option "-comm p4", or recompile it with the configure option "--with-default-comm=mpich-p4".
Also make sure that shared memory support is disabled (as you seem to have done already), because otherwise it won't work either (at least not on SMP nodes). Enabling shared memory support is not very useful within LCG anyway, because the default MPICH comes without shared memory support and as a static library, so most MPI programs running at your site will not perform shared memory communication.
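Concretely, the two options would look roughly like this (the executable name, process count, and install prefix are just examples; only "-comm p4" and "--with-default-comm=mpich-p4" are the actual mpiexec options):

```shell
# Option 1: select the p4 (TCP) communication method at run time
mpiexec -comm p4 -n 5 ./MPItest

# Option 2: rebuild mpiexec so p4 is the default communication method
# (run from the mpiexec source directory; prefix is illustrative)
./configure --with-default-comm=mpich-p4 --prefix=/opt/mpiexec
make
make install
```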
Regards,
Fokke Dijkstra
LHC Computer Grid - Rollout wrote:
> Hello,
>
> I'm trying to get mpiexec running on my site (LAL). I've
> managed to get it compiled and running, but the simplest
> hello world job returns incorrect output. Each process just
> prints its rank and the total number of processes.
> For mpirun I get:
>
> Hello world! from processor 4 out of 5
> Hello world! from processor 2 out of 5
> Hello world! from processor 3 out of 5
> Hello world! from processor 1 out of 5
> Hello world! from processor 0 out of 5
>
> Whereas for mpiexec I get:
>
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
>
> The correct number of processes, but not the correct
> information. If I use mpiexec with the verbose flag, it does
> seem to be connecting to and starting the processes on the
> correct machines. Any help with this would be appreciated.
>
> Cal
>
>
> P.S.
>
> Both MPICH and mpiexec were compiled without shared memory
> communications on SMP nodes.
>
> The code for the job is:
>
> /* hello.c
>  *
>  * Simple "Hello World" program in MPI.
>  */
>
> #include "mpi.h"
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>     int numprocs;   /* Number of processors */
>     int procnum;    /* Processor number */
>
>     /* Initialize MPI */
>     MPI_Init(&argc, &argv);
>
>     /* Find this processor number */
>     MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
>
>     /* Find the number of processors */
>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>
>     printf("Hello world! from processor %d out of %d\n",
>            procnum, numprocs);
>
>     /* Shut down MPI */
>     MPI_Finalize();
>     return 0;
> }
>
>
> The mpiexec verbose output is:
>
> resolve_exe: prefixing dot to executable: "./MPItest"
> node 0: name = grid18.lal.in2p3.fr, mpname = grid18.lal.in2p3.fr, cpu = 1
> node 1: name = grid18.lal.in2p3.fr, mpname = grid18.lal.in2p3.fr, cpu = 0
> node 2: name = grid17.lal.in2p3.fr, mpname = grid17.lal.in2p3.fr, cpu = 1
> node 3: name = grid17.lal.in2p3.fr, mpname = grid17.lal.in2p3.fr, cpu = 0
> node 4: name = grid16.lal.in2p3.fr, mpname = grid16.lal.in2p3.fr, cpu = 0
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
> Hello world! from processor 0 out of 1
> wait_one_task_start: evt = 2, task 0 host grid18.lal.in2p3.fr
> wait_one_task_start: evt = 3, task 1 host grid18.lal.in2p3.fr
> wait_one_task_start: evt = 4, task 2 host grid17.lal.in2p3.fr
> wait_one_task_start: evt = 5, task 3 host grid17.lal.in2p3.fr
> wait_one_task_start: evt = 6, task 4 host grid16.lal.in2p3.fr
> All 5 tasks started.
> read_gm_startup_ports: waiting for info
> wait_tasks: waiting for grid18.lal.in2p3.fr/1 grid18.lal.in2p3.fr/0
> grid17.lal.in2p3.fr/1 grid17.lal.in2p3.fr/0 grid16.lal.in2p3.fr/0
> wait_tasks: numspawned = 5, got evt 7 for tid 2 host grid18.lal.in2p3.fr status 0
> wait_tasks: waiting for grid18.lal.in2p3.fr/0 grid17.lal.in2p3.fr/1
> grid17.lal.in2p3.fr/0 grid16.lal.in2p3.fr/0
> wait_tasks: numspawned = 4, got evt 8 for tid 3 host grid18.lal.in2p3.fr status 0
> wait_tasks: waiting for grid17.lal.in2p3.fr/1 grid17.lal.in2p3.fr/0
> grid16.lal.in2p3.fr/0
> wait_tasks: numspawned = 3, got evt 9 for tid 4 host grid17.lal.in2p3.fr status 0
> wait_tasks: waiting for grid17.lal.in2p3.fr/0 grid16.lal.in2p3.fr/0
> wait_tasks: numspawned = 2, got evt 10 for tid 5 host grid17.lal.in2p3.fr status 0
> wait_tasks: waiting for grid16.lal.in2p3.fr/0
> wait_tasks: numspawned = 1, got evt 11 for tid 6 host grid16.lal.in2p3.fr status 0
--------
Fokke Dijkstra
High Performance Computing
SARA - Reken- en Netwerkdiensten http://www.sara.nl
Tel. +31 20 592 8004 Fax. +31 20 668 3167