Hi, Nicolas

Here is our test script (no change from RELION2 website really):

mpirun -np 9 relion_refine_mpi --i Particles/shiny_2sets.star \
  --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf \
  --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 \
  --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 \
  --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 \
  --o class3d --j 4 --dont_combine_weights_via_disc --gpu --pool 100


The only issue I saw is that you set --preread_images and there is no --gpu
flag listed (I assume you already set that). Possibly this is due to I/O.

We run the benchmark directly off local SSD scratch space. Do you have
local SSDs? You will need at least two SSDs (in RAID 0) to feed the GPUs.
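
In case it is useful, here is roughly how we stage the data before running
(a minimal sketch only; the scratch path and the use of rsync are
illustrative, not our exact setup):

# copy the benchmark particles onto the local SSD (RAID 0) scratch first
rsync -a Particles/ /scratch/relion_bench/Particles/
cd /scratch/relion_bench
# then launch the same relion_refine_mpi command as above from this directory

Running from the local copy keeps the particle reads off the shared
filesystem, which is usually what starves the GPUs.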
Hope it helps!

Best regards,

-Clara


On Fri, May 12, 2017 at 8:24 AM, Coudray, Nicolas <[log in to unmask]> wrote:

> Hi,
>
>   To follow up on the tests of Relion on our 8-GPU Titan X Pascal node,
> below are the results on the benchmark data set (all done with
> "--dont_combine_weights_via_disc --no_parallel_disc_io --preread_images
> --pool 100"):
>
> *2D classification:*
> * 4 GPUs,  5 MPIs,  6 threads:  9h23
> * 4 GPUs,  5 MPIs, 12 threads:  9h27
> * 4 GPUs,  9 MPIs,  6 threads:  7h10
> * 4 GPUs, 12 MPIs,  3 threads:  6h34
>
> * 8 GPUs,  5 MPIs, 12 threads:  5h36
> * 8 GPUs,  9 MPIs,  6 threads:  5h17
> * 8 GPUs, 17 MPIs,  3 threads:  6h26
>
> *3D classification:*
> * 4 GPUs,  5 MPIs,  6 threads:  3h36
> * 4 GPUs,  5 MPIs, 12 threads:  3h40
> * 4 GPUs,  9 MPIs,  6 threads:  2h56
> * 4 GPUs, 12 MPIs,  3 threads:  3h01
>
> * 8 GPUs,  5 MPIs, 12 threads:  2h51
> * 8 GPUs,  9 MPIs,  6 threads:  2h53
> * 8 GPUs, 17 MPIs,  3 threads:  3h26
>
>
>
> The impact of the MPI/thread combination is quite different from what I
> expected (little gain on 8 GPUs when moving from 5 MPIs + 12 threads to
> 9 MPIs + 6 threads, for example). If you have suggestions or comments that
> would improve performance, please let us know.
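>
> For what it's worth, my naive expectation came from this kind of
> arithmetic (just a sanity check, assuming one slave rank per GPU):
>
> NGPU=8; NCORES=48
> NMPI=$((NGPU + 1))           # 1 master rank + 1 slave per GPU = 9 MPIs
> NTHREADS=$((NCORES / NGPU))  # 48 logical cores / 8 slaves = 6 threads each
>
> which is why I expected 9 MPIs + 6 threads to be the clear sweet spot on
> 8 GPUs.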
>
>
> @Dr Clara Cai: in one of your previous messages, you mentioned that your 3D
> classification on that benchmark was completed in 67 min on your 8-GPU
> machine. That's impressive and quite a difference from our results. I would
> be interested in knowing more about your setup and the specifications of
> your GPUs, to figure out where the differences with our machine come from.
>
> Thanks in advance,
> Best,
> Nicolas
>
>
>
>
>
>
>
>
> ------------------------------
> *From:* Collaborative Computational Project in Electron cryo-Microscopy [
> [log in to unmask]] on behalf of Dr. Clara Cai [
> [log in to unmask]]
> *Sent:* Friday, May 05, 2017 3:21 PM
>
> *To:* [log in to unmask]
> *Subject:* Re: [ccpem] Relion - Tests on a 8 GPU node
>
> Dear Weiwei
>
> The attached paper is from 2002. Intel Hyper-Threading and Turbo Boost
> work quite efficiently as long as there is no oversubscription. We have
> benchmarked RELION2 with and without HT and saw very similar results. We'd
> argue that there is no point in disabling HT, as long as you understand
> that the system has HT turned on and that some of the cores are virtual
> cores.
>
> As to your question about which CPUs to choose: as long as you have at
> least 2 physical cores per GPU, the decision is really about your budget.
> With the current CPU pricing model you pay a lot for the extra 5-10%
> performance of the higher-end models, and with most of the processing on
> the GPUs you will see only a minimal boost in overall RELION performance.
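>
> As a concrete example of that rule of thumb: an 8-GPU node wants at least
> 8 x 2 = 16 physical cores, which a typical dual-socket board with two
> 8-core (or larger) CPUs covers comfortably.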
>
> Best regards,
>
> -Clara
>
> Dr. Clara Cai
> SingleParticle.com
> Turnkey GPU workstations/clusters for cryoEM
>
> On Fri, May 5, 2017 at 10:01 AM, Weiwei Wang <[log in to unmask]> wrote:
>
>> Hi All,
>>
>>
>> I noticed that hyper-threading is enabled in Nicolas' configuration.
>> Attached is a discussion on the use of hyper-threading in HPC (it appears
>> to be from Dell; found via Google). I wonder whether, when running Relion2
>> with GPUs, hyper-threading makes any difference, perhaps in overlapping
>> CPU/GPU computation and data transfer? And a related question: how much
>> CPU power is needed so that the CPU is not a bottleneck when running with
>> 8 fast GPUs like Titans or 1080s (some double-precision calculations are
>> still performed on the CPU, right?). Thanks a lot for any suggestions!
>>
>>
>> Best,
>>
>> Weiwei
>>
>>
>> ------------------------------
>> *From:* Collaborative Computational Project in Electron cryo-Microscopy <
>> [log in to unmask]> on behalf of Ali Siavosh-Haghighi <
>> [log in to unmask]>
>> *Sent:* Friday, May 5, 2017 12:01 PM
>> *To:* [log in to unmask]
>>
>> *Subject:* Re: [ccpem] Relion - Tests on a 8 GPU node
>>
>> Hi All,
>> At the same time I should add that all the memory on the cards fills up
>> (11 GB per card, for both the 8-GPU and 4-GPU assignments).
>> ===========================================
>> Ali  Siavosh-Haghighi, Ph.D.
>> HPC System Administrator
>> High Performance Computing Facility
>> Medical Center Information Technologies
>> NYU Langone Medical Center
>> Phone: (646) 501-2907
>>  http://wiki.hpc.med.nyu.edu/
>> ===========================================
>>
>>
>> On May 5, 2017, at 11:36 AM, Coudray, Nicolas <[log in to unmask]>
>> wrote:
>>
>> Hi,
>>
>> Thank you all for your feedback!
>>
>> We will let you know the results on the benchmark dataset as soon as possible.
>>
>>
>> Regarding the number of MPIs used, there was indeed a typo: I did use
>> "8 GPUs, 9 MPIs and 6 threads" in the last run of each job (except
>> auto-picking, where I used 8 MPIs).
>>
>> As for the mapping, this is what I did:
>> for 2 GPUs, 3 MPIs, 24 threads: --gpu "0:1"
>> for 4 GPUs, 3 MPIs, 24 threads: --gpu "0,1:2,3"
>> for 4 GPUs, 5 MPIs, 24 threads: --gpu "0:1:2:3"
>> for 8 GPUs:                     --gpu ""
>>
>> So for all the tests with 8 GPUs I let Relion figure it out, and the loss
>> of performance when going from 5 to 9 MPIs on 8 GPUs is therefore puzzling
>> to me.
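>>
>> (If I were to spell out the 8-GPU, 9-MPI case explicitly, I assume by
>> analogy with the 4-GPU syntax above that it would be
>> --gpu "0:1:2:3:4:5:6:7", i.e. one slave rank per card, but I have only
>> used the empty --gpu "" for those runs.)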
>>
>>
>> We also noticed that for some jobs the GPUs do not reach 100% utilization.
>> Regarding this, Bharat, you mentioned that you often run 2 MPI processes
>> with at least 2 threads per GPU. Does that only increase the GPU
>> utilization percentage, or do you also see a corresponding speed
>> improvement?
>>
>>
>>
>>
>> BTW, regarding the CPUs we've been using, these are the specifications:
>> CPU(s):                48
>> On-line CPU(s) list:   0-47
>> Thread(s) per core:    2
>> Core(s) per socket:    12
>> Socket(s):             2
>> NUMA node(s):          2
>> Vendor ID:             GenuineIntel
>> CPU family:            6
>> Model:                 79
>> Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
>> Stepping:              1
>> CPU MHz:               2499.921
>> BogoMIPS:              4405.38
>> NUMA node0 CPU(s):     0-11,24-35
>> NUMA node1 CPU(s):     12-23,36-47
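>>
>> (In case it is relevant, nvidia-smi topo -m should also show how the 8
>> cards are attached to these two NUMA nodes, though we have not looked
>> into that yet.)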
>>
>> Thanks,
>> Best
>> Nicolas
>>
>>
>>
>>
>> ________________________________________
>> From: Bjoern Forsberg [[log in to unmask]]
>> Sent: Friday, May 05, 2017 4:51 AM
>> To: Coudray, Nicolas; [log in to unmask] <[log in to unmask]>
>> Subject: Re: [ccpem] Relion - Tests on a 8 GPU node
>>
>> Hi Nicolas,
>>
>> Just to clarify, is 8 MPIs a typo? I see you ran 3 and 5 MPIs on some
>> runs with 2 and 4 GPUs, presumably to accommodate the master rank. So I
>> would expect that you ran 9 MPIs on some of the 8-GPU runs, right? One
>> reason I'm asking is that I would expect performance to increase between
>> these runs, e.g.:
>>
>> 8 GPUs, 5 MPIs, 12 threads:   4h15
>> 8 GPUs, 8 MPIs,  6 threads:   8h47
>>
>> But you see a detrimental loss of performance. If you did actually run 8
>> MPIs, that might be why. Running
>>
>> 8 GPUs, 9 MPIs,  6 threads:   ?
>>
>> should be interesting, and more relevant for performance, in that case.
>>
>> Also, did you specify --gpu without any device numbers in all cases? If
>> you did specify GPU indices, performance is fairly sensitive to how well
>> you mapped the ranks and threads to GPUs. This is why we typically advise
>> *not* specifying which GPUs to use and letting relion figure it out on
>> its own, unless you want or need to specify something in particular.
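>>
>> (Concretely, for your 8-GPU node that would mean something like
>> "mpirun -np 9 relion_refine_mpi [your usual arguments] --j 6 --gpu",
>> with nothing after --gpu, so relion spreads the 8 slave ranks over the
>> 8 cards by itself; only pass explicit indices such as --gpu "0:1:2:3"
>> if you need a particular mapping.)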
>>
>> Thanks for sharing!
>>
>> /Björn
>>
>> On 05/04/2017 10:13 PM, Coudray, Nicolas wrote:
>>
>> Hi all,
>>
>>
>> We have been running and testing Relion 2.0 on our 8 GPU nodes to try to
>> figure out the optimal parameters. We thought these results might be
>> interesting to share, and we are looking for any suggestions / comments /
>> similar tests that you could provide.
>>
>>
>>
>> Our configuration is:
>> an 8-GPU node, 48 slots, TITAN X (Pascal) cards with 11 GB of on-card
>> memory, 750 GB of RAM, Relion compiled with gcc 4.8.5, kernel 3.10,
>> CentOS 7.3, and a 4 TB SSD hard drive.
>>
>> At first, we only varied the number of GPUs, MPIs and threads, leaving
>> the other disk access options constant (no parallel disc I/O, particles
>> pre-read into RAM, no "combine iterations"). The results for each type
>> of job are as follows:
>>
>> *** 2D Classification (265k particles of 280x280 pixels, 5 rounds, 50
>> classes):
>> 2 GPUs, 3 MPIs, 24 threads: 14h04
>> 4 GPUs, 3 MPIs, 24 threads:   5h23
>> 4 GPUs, 5 MPIs, 12 threads: 13h28
>> 8 GPUs, 3 MPIs, 24 threads:   3h14
>> 8 GPUs, 5 MPIs, 12 threads:   5h10
>> 8 GPUs, 8 MPIs,   6 threads: 13h28
>>
>>
>> *** 3D Classification (226k particles  of 280x280 pixels, 5 rounds, 5
>> classes):
>> 2 GPUs, 3 MPIs, 24 threads: 15h17
>> 4 GPUs, 3 MPIs, 24 threads:   5h53
>> 4 GPUs, 5 MPIs, 12 threads:   8h11
>> 8 GPUs, 3 MPIs, 24 threads:   2h48
>> 8 GPUs, 5 MPIs, 12 threads:   3h16
>> 8 GPUs, 8 MPIs,   6 threads:   4h37
>>
>>
>> *** 3D Refinement (116k particles of 280x280 pixels):
>> 2 GPUs, 3 MPIs, 24 threads: 12h07
>> 4 GPUs, 3 MPIs, 24 threads:   4h54
>> 4 GPUs, 5 MPIs, 12 threads:   9h12
>> 8 GPUs, 3 MPIs, 24 threads:   4h57
>> 8 GPUs, 5 MPIs, 12 threads:   4h15
>> 8 GPUs, 8 MPIs,   6 threads:   8h47
>>
>>
>> *** Auto-picking (on 2600 micrographs, generating around 750 k particles):
>> 0 GPU , 48 threads: 79 min
>> 2 GPUs,   2 threads: 52 min
>> 2 GPUs, 48 threads: error (code 11)
>> 4 GPUs,   4 threads: 27 min
>> 4 GPUs, 48 threads: 8 min
>> 8 GPUs,   8 threads: 19 min
>> 8 GPUs, 48 threads: 12 min
>>
>>
>>
>> Does anyone have particular suggestions or feedback?
>>
>> We are using these tests to guide the expansion of our centralized GPU
>> capability, and any comments are greatly welcome.
>>
>>
>> Thanks,
>> Best,
>>
>> Nicolas Coudray
>> New York University
>>
>>
>>
>>
>>
>>
>>
>>
>
>