Dear Ambroise,

Thank you for your feedback to the list!

I completely agree with you that this may be a tricky thing to achieve 
on some clusters. However, it has nothing to do with the relion code 
itself: it would be exactly the same for any MPI programme (e.g. a 
hello-world one you could write yourself). It is a consequence of how 
the MPI standard is implemented, and of how that implementation 
communicates with your job allocation system (i.e. the queue). It is 
not an easy problem, as the hardware for parallel computing is 
extremely diverse.

Best,
Sjors

On 06/13/2014 10:20 AM, Ambroise Desfosses wrote:
> Dear Sjors, dear all,
>
> thanks for your answers. It appeared that the problem mainly came from the way our cluster system (LSF) was setting up the environment for MPI (which I guess relion somehow parses as well?). We managed to finally reserve the right number of cpus and provide the correct list of hosts to mpirun by playing with internal LSF environment variables related to MPI usage.
> What may be useful for general use of Relion on various cluster systems would be to know exactly how Relion reads the MPI environment and makes use of it to distribute the master and the slave jobs. Also, it proved relatively difficult to set up the parallelization so that several MPI jobs end up on the same host, which can be useful for hosts with many cpus. We have e.g. 80-cpu machines, and there it seems that using 80 threads is far too many for most of our Relion usage, so it would be nice to send e.g. 4 MPI jobs with 20 threads each to those machines. We found a kind of solution for this, but it is very specific to our cluster and not ideal at all.
> Again, thanks for your help, and for users working on an LSF cluster: do not hesitate to contact me with questions or other suggestions.
>
> Cheers,
> Ambroise Desfosses

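P.S. On your point about running several MPI processes with many 
threads each: relion simply starts threads inside each MPI rank, so any 
toy hybrid program exposes the same placement behaviour. A minimal 
sketch, assuming pthreads (NTHREADS here is only a stand-in for the 
per-rank thread count you would give relion):

    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4   /* stand-in for the per-rank thread count */

    static int rank;

    /* Each thread just reports which rank it belongs to. */
    static void *work(void *arg)
    {
        long tid = (long) arg;
        printf("rank %d, thread %ld\n", rank, tid);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        pthread_t threads[NTHREADS];
        int provided;

        /* MPI_THREAD_FUNNELED: only the main thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&threads[t], NULL, work, (void *) t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(threads[t], NULL);

        MPI_Finalize();
        return 0;
    }

Launching four copies of this (e.g. with mpirun -n 4) with NTHREADS set 
to 20 mimics the 4x20 layout you describe; whether those four ranks 
share one 80-cpu host is again decided by the queue and the MPI 
implementation, not by the program itself.
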
-- 
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres