Hi,
> 64 vCPUs, 8 GPUs.
I recommend using 1 MPI process per GPU. Thus, with 3 nodes I would
use 25 MPI x 8 threads (8 GPUs x 3 nodes = 24 working processes, plus one).
Note that the master process (rank 0) does no real calculation in
classification and refinement, so I often over-subscribe by one.
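For illustration only, such a Class3D/Refine3D job might be launched roughly
like this (the file names and most other options are placeholders, not your
actual job):

    mpirun -n 25 relion_refine_mpi \
        --i Select/job012/particles.star --o Class3D/job013/run \
        --j 8 --gpu "" --pool 30 --dont_combine_weights_via_disc

Leaving --gpu empty lets Relion spread the working ranks over all visible
GPUs; an explicit, colon-separated list (e.g. --gpu "0:1:2:3:4:5:6:7") pins
ranks to particular devices instead.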
That being said, Class2D/3D/Refine3D do not scale very
well over so many GPUs. For most tasks, using 4 GPUs per job is best.
In contrast, MotionCorr, CtfRefine and Bayesian Polishing scale very well
with the number of cores, as long as I/O is not limiting.
For these tasks, I recommend more MPI processes than threads.
With 64 cores, I recommend 32 MPI x 2 threads or 16 MPI x 4 threads
(provided you have enough memory).
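For example, Relion's own motion correction implementation could be run with
something like this (again just a sketch; the input STAR file and any
dose-weighting options are placeholders):

    mpirun -n 32 relion_run_motioncorr_mpi \
        --i movies.star --o MotionCorr/job002/ --use_own --j 2

Here 32 MPI x 2 threads fills the 64 cores of one node; whether 32 x 2 or
16 x 4 is faster depends mostly on memory and I/O.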
Did you look at our benchmark page?
https://www3.mrc-lmb.cam.ac.uk/relion/index.php?title=Benchmarks_%26_computer_hardware
Although this page is a bit outdated (it mixes results from 2.1 and 3.0),
it will give you a rough idea of what to expect and which parameters to use.
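One more practical point about the submission script: the slot count you
request from the parallel environment should be #mpi * #threads (as your
template suggests), so that the scheduler reserves the cores the threads
will use, e.g.

    #$ -pe queue_name 64    # 32 MPI x 2 threads = 64 slots (one 64-vCPU node)

For the 25 MPI x 8 threads example the arithmetic gives 200 slots, which
over-subscribes 3 x 64 cores by one rank; whether your PE accepts that
depends on how it is configured.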
Best regards,
Takanori Nakane
> Hi all,
>
> I am a cloud developer who has been asked to build an elastic HPC cluster for
> Relion. I am now testing the performance of Relion on this newly built
> cluster. My test involves running Class3D classification on ~2.1 TB of data.
> I ran the test with three different configurations:
>
> 1. 192 MPI, 1 thread (3 nodes)
> 2. 384 MPI, 1 thread (6 nodes)
> 3. 576 MPI, 1 thread (9 nodes)
>
> Contrary to my expectation, the test with 9 nodes was the slowest, taking
> ~12 hours, while the test with 3 nodes took 3.1 hours to complete. The test
> with 6 nodes took ~4 hours. Clearly, I am missing some key aspect of the
> job-submission parameters. Can someone please help me achieve the best
> performance? How do we know the appropriate configuration to submit the
> jobs? Can someone explain to me how Relion maps the processes to the
> hardware?
>
> Hardware description of nodes: 64 vCPUs, 8 GPUs.
> Scheduler Used: Sun Grid Engine (SGE)
> Template used to submit the job to the cluster:
>
> #!/bin/tcsh
> #$ -pe queue_name #mpi * #threads
> #$ -cwd
> #$ -S /bin/tcsh
> #$ -e XXXerrfileXXX
> #$ -o XXXoutfileXXX
> #$ -V
>
> mpiexec -n #mpi XXXcommandXXX
>