Hi,
Over the next couple of weeks, Richard de Jong and I will be working at
CERN on integrating MPI functionality into the gLite deployment
procedure. We're both new to this list and have some questions for
everyone with experience, knowledge and opinions on using the grid for
MPI (both in the LCG-2.x days and nowadays with gLite-3.0). Any response
is very much appreciated and will be taken seriously; don't feel
obligated to answer every question. We work closely with Louis Poncet
and Maarten Litmaath of the IT Grid Deployment team at CERN.
==========================
Questions regarding MPI
==========================
1. What is the main difference between MPICH2 and OpenMPI, and which is
your (end-users') preference?
What I know so far:
- OpenMPI inherited fault-tolerance from FT-MPI, i.e. it tolerates the
loss of a WN by having another WN (re)execute that part without loss
of data (MPICH2 doesn't have this);
- MPICH2 has already been demonstrated in (heterogeneous) grid
environments, while OpenMPI still explicitly lists 'heterogeneity'
on its to-do list;
- both have only preliminary support for thread-safety
(MPI_THREAD_MULTIPLE);
- (...please add your ideas here...).
2. To enable MPI after a new LCG-2.x or gLite-3.0 deployment, what
exact steps, if any, do you currently take on WNs, CEs, etc.? (Please
state which middleware you're using.)
3. Would it be acceptable (to your end-users) if the grid supported MPI
only with non-shared home directories? (The concern with shared home
directories is scalability and reliability.)
4. Would it be acceptable (to your end-users) if the grid supported
only one or two specific implementations of MPI? And what's the state
of MPI-1.1 vs. MPI-2?
5. What's your opinion on having users compile their MPI programs at
the UI, versus having the WNs compile them?
6. Should we (and how?) anticipate MS-MPI on Microsoft's pending HPC
server products?
===========================
Questions regarding YAIM
===========================
7. How would you like YAIM to behave when setting up MPI? (e.g. MPI
opt-in vs. opt-out at installation, and whether or how it should change
the OS configuration of the WNs; think "HostbasedAuthentication")
8. Do you have suggestions for improvements to YAIM?
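To make the "HostbasedAuthentication" example in question 7 concrete:
below is a minimal sketch of the OpenSSH settings we have in mind, not
a tested YAIM recipe. The host names are hypothetical, and whether YAIM
should write any of this at all is exactly what we are asking.

```
# /etc/ssh/sshd_config on each WN (server side)
HostbasedAuthentication yes

# /etc/ssh/ssh_config on each WN (client side)
Host *
    HostbasedAuthentication yes
    EnableSSHKeysign yes

# /etc/ssh/shosts.equiv on each WN: hosts trusted for host-based login
# (hypothetical names)
wn01.example.org
wn02.example.org
```

Each WN would also need the other WNs' public host keys in
/etc/ssh/ssh_known_hosts.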
That's it. Other comments and pointers are welcome, of course!
Friendly regards,
Matthijs Koot & Richard de Jong