Hi Nicolas,
Just to clarify, is 8 MPIs a typo? I see you ran 3 and 5 MPIs on some
runs with 2 and 4 GPUs, presumably to accommodate the master rank. So I
would expect you ran 9 MPIs on some of the 8-GPU runs, right? One reason
I'm asking is that I would expect performance to increase between these
runs, e.g.:
8 GPUs, 5 MPIs, 12 threads: 4h15
8 GPUs, 8 MPIs, 6 threads: 8h47
But you see a detrimental loss of performance. If you did actually run 8
MPIs, that might be why. Running
8 GPUs, 9 MPIs, 6 threads: ?
should be interesting, and more relevant for performance, in that case.
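Concretely, something along these lines should do it (just a sketch: the
input/output names are placeholders, and you would of course keep the
rest of the options from your original job):

    mpirun -n 9 `which relion_refine_mpi` --i particles.star --o Refine3D/run_9mpi \
        --j 6 --gpu --preread_images --dont_combine_weights_via_disc [rest of your options]

With -n 9 the extra rank is the master, which does no GPU work, so the 8
remaining ranks map one-to-one onto the 8 cards.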
Also, did you specify --gpu without any device numbers in all cases? If
you did specify GPU indices, performance is fairly sensitive to how well
you mapped the ranks and threads to GPUs. This is why we typically
advise to *not* specify which GPUs to use and let relion figure it out
on its own, unless you want/need to specify something in particular.
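In other words, the bare flag is the recommended form, and the explicit
form is only for when you really want to pin ranks to cards. If I
remember the syntax correctly, the mapping argument is a colon-separated
list with one field per non-master rank (the device numbers below are
just an illustration for your 8-GPU / 9-MPI case):

    # let relion distribute ranks over all visible devices (recommended)
    mpirun -n 9 `which relion_refine_mpi` [options] --j 6 --gpu

    # pin each of the 8 non-master ranks to its own card
    mpirun -n 9 `which relion_refine_mpi` [options] --j 6 --gpu 0:1:2:3:4:5:6:7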
Thanks for sharing!
/Björn
On 05/04/2017 10:13 PM, Coudray, Nicolas wrote:
> Hi all,
>
>
> We have been running and testing Relion 2.0 on our 8 GPU nodes to try to figure out the optimal parameters. We thought these results might be interesting to share, and we are looking for any suggestions / comments / similar tests that you could provide.
>
>
>
> Our configuration is:
> 8-GPU node, 48 slots, TITAN X cards (Pascal, 11 GB on-card memory), 750 GB RAM, Relion compiled with gcc 4.8.5, kernel 3.10, CentOS 7.3, 4 TB hard drive
>
> At first, we only varied the number of GPUs, MPIs and threads, leaving the other disk access options constant (no parallel disc I/O, particles pre-read into RAM, no "combine iterations"). The results for each type of job are as follows:
>
> *** 2D Classification (265k particles of 280x280 pixels, 5 rounds, 50 classes):
> 2 GPUs, 3 MPIs, 24 threads: 14h04
> 4 GPUs, 3 MPIs, 24 threads: 5h23
> 4 GPUs, 5 MPIs, 12 threads: 13h28
> 8 GPUs, 3 MPIs, 24 threads: 3h14
> 8 GPUs, 5 MPIs, 12 threads: 5h10
> 8 GPUs, 8 MPIs, 6 threads: 13h28
>
>
> *** 3D Classification (226k particles of 280x280 pixels, 5 rounds, 5 classes):
> 2 GPUs, 3 MPIs, 24 threads: 15h17
> 4 GPUs, 3 MPIs, 24 threads: 5h53
> 4 GPUs, 5 MPIs, 12 threads: 8h11
> 8 GPUs, 3 MPIs, 24 threads: 2h48
> 8 GPUs, 5 MPIs, 12 threads: 3h16
> 8 GPUs, 8 MPIs, 6 threads: 4h37
>
>
> *** 3D Refinement (116k particles of 280x280 pixels):
> 2 GPUs, 3 MPIs, 24 threads: 12h07
> 4 GPUs, 3 MPIs, 24 threads: 4h54
> 4 GPUs, 5 MPIs, 12 threads: 9h12
> 8 GPUs, 3 MPIs, 24 threads: 4h57
> 8 GPUs, 5 MPIs, 12 threads: 4h15
> 8 GPUs, 8 MPIs, 6 threads: 8h47
>
>
> *** Auto-picking (on 2600 micrographs, generating around 750k particles):
> 0 GPU , 48 threads: 79 min
> 2 GPUs, 2 threads: 52 min
> 2 GPUs, 48 threads: error (code 11)
> 4 GPUs, 4 threads: 27 min
> 4 GPUs, 48 threads: 8 min
> 8 GPUs, 8 threads: 19 min
> 8 GPUs, 48 threads: 12 min
>
>
>
> Does anyone have particular suggestions / feedback?
>
> We are using these tests to guide the expansion of our centralized GPU capability and any comment is greatly welcome.
>
>
> Thanks,
> Best,
>
> Nicolas Coudray
> New York University
>
>
>