CCPEM Archives (CCPEM@JISCMAIL.AC.UK)

Subject: Re: Relion - Tests on a 8 GPU node
From: Bharat Reddy <[log in to unmask]>
Reply-To: Bharat Reddy <[log in to unmask]>
Date: Fri, 12 May 2017 18:34:08 -0500
Content-Type: text/plain
Parts/Attachments: text/plain (542 lines)

Hi Clara,

Are you guys running one MPI process per GPU in your example below? If
so, my point is that if you have enough GPU memory to spare, you should
try running more than one MPI process per GPU to get the most out of
your hardware. Do you guys have access to 16/18/20/22/24-core CPUs to
test your 8-GPU nodes with? I am curious how much faster it would run
if you ran two (or more, depending on the CPU) MPI processes per GPU.
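
For example, on an 8-GPU node that would mean 16 worker ranks plus the
master, i.e. 17 MPI processes in total. A minimal sketch of such a run,
with the input and output names as placeholders and the rest of the
usual classification options omitted:

mpirun -n 17 relion_refine_mpi --i particles.star --o Class3D/run_2perGPU \
    --gpu --j 2 --pool 100 --preread_images --no_parallel_disc_io \
    --dont_combine_weights_via_disc

With --gpu given no explicit mapping, relion should spread the 16
worker ranks over the 8 cards (two per GPU) on its own.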

That being said, your setup is already running the standard relion
benchmark pretty quickly. However, we are collecting millions of
particles in some of our datasets, and while small potential
performance gains might not mean much in benchmarks, they can add up to
many hours of computing time with our non-ideal data sets.

Cheers,
BR

On Fri, 2017-05-12 at 15:44 -0700, Dr. Clara Cai wrote:
> Dear Bharat
> 
> In our testing with an 8-GPU machine, dual 12-core CPUs are actually
> okay for balancing the load for 8x GTX 1080 Ti (below are our
> iteration results for 3D classification). Your concern about physical
> core count is definitely valid, but we believe the workload still
> balances because some threads are doing I/O or waiting for GPU
> results, and this is a case where Intel Hyperthreading does work
> reasonably well.
> 
> The threading parameter j could be reduced to 3 for a better match
> with the physical core count of 24, but we did not see any improvement
> over using j=4. Of course, setting j too high is not recommended, as
> it oversubscribes the CPUs and incurs a penalty.
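> 
> For example, with one worker rank per GPU plus the master rank, that
> is 9 MPI processes: 8 worker ranks x j=3 = 24 threads, matching the 24
> physical cores, whereas j=4 gives 32 threads, which spill onto the
> hyperthreaded logical cores.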
> 
> Best regards, 
> 
> -Clara
> 
> 
> 8x GTX1080Ti GPU server
> Apr 20 22:36 timer_start
> Apr 20 22:39 class3d_it000_model.star
> Apr 20 22:42 class3d_it001_model.star
> Apr 20 22:44 class3d_it002_model.star
> Apr 20 22:46 class3d_it003_model.star
> Apr 20 22:48 class3d_it004_model.star
> Apr 20 22:50 class3d_it005_model.star
> Apr 20 22:52 class3d_it006_model.star
> Apr 20 22:55 class3d_it007_model.star
> Apr 20 22:57 class3d_it008_model.star
> Apr 20 22:59 class3d_it009_model.star
> Apr 20 23:02 class3d_it010_model.star
> Apr 20 23:04 class3d_it011_model.star
> Apr 20 23:07 class3d_it012_model.star
> Apr 20 23:09 class3d_it013_model.star
> Apr 20 23:12 class3d_it014_model.star
> Apr 20 23:15 class3d_it015_model.star
> Apr 20 23:17 class3d_it016_model.star
> Apr 20 23:20 class3d_it017_model.star
> Apr 20 23:23 class3d_it018_model.star
> Apr 20 23:26 class3d_it019_model.star
> Apr 20 23:29 class3d_it020_model.star
> Apr 20 23:31 class3d_it021_model.star
> Apr 20 23:34 class3d_it022_model.star
> Apr 20 23:37 class3d_it023_model.star
> Apr 20 23:40 class3d_it024_model.star
> Apr 20 23:43 class3d_it025_model.star
> 
> On Fri, May 12, 2017 at 3:09 PM, Bharat Reddy
> <000009a7465b91d2-dmarc[log in to unmask]> wrote:
> > Hi Nicolas,
> > 
> > Your results are rather slow if you are using the official relion
> > benchmark dataset. Please confirm that you are using the official
> > relion benchmark dataset, and also, as Clara mentioned, that you are
> > using the --gpu option in your command.
> > 
> > You said in your previous email that you are using two E5-2650 v4
> > CPUs, which means you only have 24 real cores. Often, regardless of
> > the number of threads you request, relion only uses the power of
> > about two threads per MPI process (running the program `top` rarely
> > shows more than 200% CPU usage per relion MPI process). Given this
> > observation that only about 2 cores are needed per MPI process, this
> > is likely why you see a slowdown on your 8-GPU jobs between 9 and 17
> > MPI processes: with 17 MPI jobs you are requesting more threads than
> > you have cores, and there is a performance penalty for switching a
> > process from core to core, despite the nominal capacity of 48
> > threads with hyperthreading. Hyperthreading is like sprinkles on a
> > cake: they do not make the cake taste much better, but they do make
> > it look nicer. So while running two (or more) MPI processes per GPU
> > will increase GPU utilization, if you don't have the CPU cores to
> > run those processes, you will often get a net overall decrease in
> > performance.
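> > 
> > A quick way to check this while a job is running is with standard
> > tools (the nvidia-smi query below is just one way to watch GPU load
> > and memory):
> > 
> >     top      # per-process %CPU; relion ranks rarely go much above 200%
> >     nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 5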
> > 
> > My recommendation would be to upgrade the CPUs on your 8-GPU
> > system to 16 cores so you have at least 4 cores per GPU. You can see
> > the benefit of this in your 5- vs 9-MPI runs when using only 4 GPUs:
> > with 9 MPI processes you are likely using 4 cores/GPU and coming
> > close to taxing your GPUs at 100% (especially in 3D classification).
> > That being said, 16-core CPUs are a significant cost increase, which
> > is why we settled on 4-GPU nodes, as 8-core CPUs can be found at
> > 1/4th the cost.
> > 
> > Cheers,
> > BR
> > 
> > On Fri, 2017-05-12 at 15:24 +0000, Coudray, Nicolas wrote:
> > > Hi,
> > >
> > > To follow up on the tests of Relion on our 8-GPU Titan X Pascal
> > > node, below are the results on the benchmark data set (all done
> > > with "--dont_combine_weights_via_disc --no_parallel_disc_io
> > > --preread_images --pool 100"):
> > >
> > > 2D classification: 
> > > * 4GPUs, 5 MPIs, 6 threads:  9h23
> > > * 4GPUs, 5 MPIs, 12 threads:  9h27
> > > * 4GPUs, 9 MPIs, 6 threads:    7h10
> > > * 4GPUs, 12 MPIs, 3 threads:  6h34
> > >
> > > * 8GPUs, 5 MPIs, 12 threads:  5h36
> > > * 8GPUs, 9 MPIs, 6 threads:    5h17
> > > * 8GPUs, 17 MPIs, 3 threads:  6h26
> > >
> > > 3D classification:
> > > * 4GPUs, 5 MPIs, 6 threads:    3h36
> > > * 4GPUs, 5 MPIs, 12 threads:  3h40
> > > * 4GPUs, 9 MPIs, 6 threads:    2h56
> > > * 4GPUs, 12 MPIs, 3 threads:  3h01
> > >
> > > * 8GPUs, 5 MPIs, 12 threads:  2h51
> > > * 8GPUs, 9 MPIs, 6 threads:    2h53
> > > * 8GPUs, 17 MPIs, 3 threads:  3h26
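> > >
> > > For reference, the 8-GPU, 9-MPI, 6-thread rows above correspond to
> > > a command along these lines, with input and output names as
> > > placeholders and the remaining classification options as usual:
> > >
> > > mpirun -n 9 relion_refine_mpi --i particles.star --o Class3D/run1 \
> > >     --gpu --j 6 --pool 100 --preread_images --no_parallel_disc_io \
> > >     --dont_combine_weights_via_disc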
> > >
> > >
> > >
> > > The impact of the MPI/thread combination is quite different from
> > > what I expected (for example, little gain on 8 GPUs when moving
> > > from 5 MPIs + 12 threads to 9 MPIs + 6 threads). If you have
> > > suggestions or comments that would improve performance, please let
> > > us know.
> > >
> > >
> > > @Dr Clara Cai: in one of your previous messages, you mentioned
> > > that your 3D classification on that benchmark was completed in 67
> > > min on your 8-GPU machine. That's impressive and quite a
> > > difference from our results. I would be interested in knowing more
> > > about your settings and the specifications of your GPUs to figure
> > > out the differences with our machine.
> > >
> > > Thanks in advance,
> > > Best,
> > > Nicolas
> > >
> > > From: Collaborative Computational Project in Electron
> > > cryo-Microscopy [[log in to unmask]] on behalf of Dr. Clara Cai
> > > [marketing@SINGLEPARTICLE.COM]
> > > Sent: Friday, May 05, 2017 3:21 PM
> > > To: [log in to unmask]
> > > Subject: Re: [ccpem] Relion - Tests on a 8 GPU node
> > >
> > > Dear Weiwei
> > >
> > > The attached paper is from 2002. Intel Hyper-threading and Turbo
> > > Boost work quite efficiently as long as there is no
> > > oversubscribing. We have benchmarked RELION2 with and without HT,
> > > and saw very close results. We'd argue that there is no point in
> > > disabling HT as long as you understand that the system has HT
> > > turned on and some of the cores are virtual cores.
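> > >
> > > A quick way to confirm is the lscpu output: "Thread(s) per core:
> > > 2" means HT is enabled, and "Core(s) per socket" times "Socket(s)"
> > > gives the physical core count.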
> > >
> > > As to your question about which CPUs to choose, as long as you
> > > have at least 2 physical cores per GPU, the decision is really
> > > about your budget. Given CPU pricing, you will need to pay a lot
> > > for the extra 5-10% performance of higher-end models, and with
> > > most of the processing on GPUs, you will see a minimal boost in
> > > overall RELION performance.
> > >
> > > Best regards, 
> > >
> > > -Clara
> > >
> > > Dr. Clara Cai
> > > SingleParticle.com
> > > Turnkey GPU workstations/clusters for cryoEM
> > >
> > > On Fri, May 5, 2017 at 10:01 AM, Weiwei Wang
> > > <[log in to unmask]eller.edu> wrote:
> > > > Hi All,
> > > >
> > > > I noticed that hyper-threading is enabled in Nicolas'
> > > > configuration. Attached is a discussion on the use of
> > > > hyper-threading in HPC (it appears to be by Dell, found via
> > > > Google). I wonder whether, when running Relion2 with GPUs,
> > > > hyper-threading would make any difference, maybe in optimizing
> > > > CPU/GPU calculation and data transfer? And a related question:
> > > > how much CPU power is needed to avoid being a bottleneck when
> > > > running with 8 fast GPUs like Titans or 1080s (some double-float
> > > > calculations are still performed on the CPU, right?). Thanks a
> > > > lot for any suggestions!
> > > >
> > > > Best,
> > > > Weiwei
> > > >
> > > > From: Collaborative Computational Project in Electron
> > > > cryo-Microscopy <[log in to unmask]> on behalf of Ali
> > > > Siavosh-Haghighi <[log in to unmask]>
> > > > Sent: Friday, May 5, 2017 12:01 PM
> > > > To: [log in to unmask]
> > > >
> > > > Subject: Re: [ccpem] Relion - Tests on a 8 GPU node
> > > >  
> > > > Hi All,
> > > > At the same time, I should add that all the memory on the cards
> > > > fills up (11 GB per card, for either 8-GPU or 4-GPU
> > > > assignments).
> > > > ===========================================
> > > > Ali  Siavosh-Haghighi, Ph.D.
> > > > HPC System Administrator
> > > > High Performance Computing Facility
> > > > Medical Center Information Technologies
> > > > NYU Langone Medical Center
> > > > Phone: (646) 501-2907
> > > >  http://wiki.hpc.med.nyu.edu/
> > > > ===========================================
> > > >
> > > >
> > > > > On May 5, 2017, at 11:36 AM, Coudray, Nicolas
> > > > > <Nicolas.Coudray@nyumc.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Thank you all for your feedback!
> > > > >
> > > > > We will let you know the results on the benchmark dataset
> > > > > asap.
> > > > >
> > > > >
> > > > > Regarding the number of MPIs used, there was indeed a typo,
> > > > > and I did use "8 GPUs, 9 MPIs and 6 threads" in the last run
> > > > > of each job (except auto-picking, where I used 8 MPIs).
> > > > >
> > > > > As for the mapping, this is what I did:
> > > > > for 2 GPUs, 3 MPIs, 24 threads: --gpu "0:1"
> > > > > for 4 GPUs, 3 MPIs, 24 threads: --gpu "0,1:2,3"
> > > > > for 4 GPUs, 5 MPIs, 24 threads: --gpu "0:1:2:3"
> > > > > for 8 GPUs:                     --gpu ""
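> > > > >
> > > > > If I later try two ranks per GPU on all 8 GPUs, i.e. 17 MPIs,
> > > > > my understanding from the pattern above is that the explicit
> > > > > mapping would be:
> > > > > --gpu "0:1:2:3:4:5:6:7:0:1:2:3:4:5:6:7"
> > > > > with one colon-separated field per non-master rank.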
> > > > >
> > > > > So for all the tests with 8 GPUs, I let Relion figure it out,
> > > > > which is why the loss of performance when going from 5 to 9
> > > > > MPIs on 8 GPUs is puzzling to me.
> > > > >
> > > > >
> > > > > We also noticed that for some jobs the GPUs are not reaching
> > > > > 100% utilization. Regarding this, Bharat, you mentioned that
> > > > > you often run 2 MPI processes with at least 2 threads per GPU.
> > > > > Does that only increase the % GPU utilization, or do you also
> > > > > see a corresponding speed improvement?
> > > > >
> > > > > BTW, regarding the CPUs we've been using, these are the
> > > > > specifications:
> > > > > CPU(s):                48
> > > > > On-line CPU(s) list:   0-47
> > > > > Thread(s) per core:    2
> > > > > Core(s) per socket:    12
> > > > > Socket(s):             2
> > > > > NUMA node(s):          2
> > > > > Vendor ID:             GenuineIntel
> > > > > CPU family:            6
> > > > > Model:                 79
> > > > > Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
> > > > > Stepping:              1
> > > > > CPU MHz:               2499.921
> > > > > BogoMIPS:              4405.38
> > > > > NUMA node0 CPU(s): 0-11,24-35
> > > > > NUMA node1 CPU(s): 12-23,36-47
> > > > >
> > > > > Thanks,
> > > > > Best
> > > > > Nicolas
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > ________________________________________
> > > > > From: Bjoern Forsberg [[log in to unmask]]
> > > > > Sent: Friday, May 05, 2017 4:51 AM
> > > > > To: Coudray, Nicolas; [log in to unmask]
> > > > > Subject: Re: [ccpem] Relion - Tests on a 8 GPU node
> > > > >
> > > > > Hi Nicolas,
> > > > >
> > > > > Just to clarify, is 8 MPIs a typo? I see you ran 3 and 5 MPIs
> > > > > on some runs with 2 and 4 GPUs, presumably to accommodate the
> > > > > master rank. So I would expect you ran 9 MPIs on some of the
> > > > > 8-GPU runs, right? One reason I'm asking is that I would
> > > > > expect performance to increase between these runs, e.g.:
> > > > >
> > > > > 8 GPUs, 5 MPIs, 12 threads:   4h15
> > > > > 8 GPUs, 8 MPIs,  6 threads:   8h47
> > > > >
> > > > > But you see a detrimental loss of performance. If you did
> > > > > actually run 8 MPIs, that might be why. Running
> > > > >
> > > > > 8 GPUs, 9 MPIs,  6 threads:   ?
> > > > >
> > > > > should be interesting, and more relevant for performance, in
> > > > > that case.
> > > > >
> > > > > Also, did you specify --gpu without any device numbers in all
> > > > > cases? If you did specify GPU indices, performance is fairly
> > > > > sensitive to how well you mapped the ranks and threads to the
> > > > > GPUs. This is why we typically advise *not* specifying which
> > > > > GPUs to use and letting relion figure it out on its own,
> > > > > unless you want/need to specify something in particular.
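> > > > >
> > > > > That is, just "--gpu" with no index string after it, rather
> > > > > than an explicit mapping such as "--gpu 0:1:2:3".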
> > > > >
> > > > > Thanks for sharing!
> > > > >
> > > > > /Björn
> > > > >
> > > > > On 05/04/2017 10:13 PM, Coudray, Nicolas wrote:
> > > > > > Hi all,
> > > > > >
> > > > > >
> > > > > > We have been running and testing Relion 2.0 on our 8-GPU
> > > > > > nodes to try to figure out the optimal parameters. We
> > > > > > thought these results might be interesting to share, and we
> > > > > > are looking for any suggestions / comments / similar tests
> > > > > > that you could provide.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Our configuration is:
> > > > > > 8-GPU node, 48 slots, TITAN X (Pascal) GPUs, 750 GB RAM,
> > > > > > 11 GB on-card memory, Relion compiled with gcc 4.8.5, kernel
> > > > > > 3.10, CentOS 7.3, sd hard drive of 4 TB
> > > > > >
> > > > > > At first, we only varied the number of GPUs, MPIs and
> > > > > > threads, leaving the other disk access options constant (no
> > > > > > parallel disc I/O, particles pre-read into RAM, no "combine
> > > > > > iterations"). The results for each type of job are as
> > > > > > follows:
> > > > > >
> > > > > > *** 2D Classification (265k particles of 280x280 pixels, 5
> > > > > > rounds, 50 classes):
> > > > > > 2 GPUs, 3 MPIs, 24 threads: 14h04
> > > > > > 4 GPUs, 3 MPIs, 24 threads:   5h23
> > > > > > 4 GPUs, 5 MPIs, 12 threads: 13h28
> > > > > > 8 GPUs, 3 MPIs, 24 threads:   3h14
> > > > > > 8 GPUs, 5 MPIs, 12 threads:   5h10
> > > > > > 8 GPUs, 8 MPIs,   6 threads: 13h28
> > > > > >
> > > > > >
> > > > > > *** 3D Classification (226k particles  of 280x280 pixels, 5
> > > > > > rounds, 5 classes):
> > > > > > 2 GPUs, 3 MPIs, 24 threads: 15h17
> > > > > > 4 GPUs, 3 MPIs, 24 threads:   5h53
> > > > > > 4 GPUs, 5 MPIs, 12 threads:   8h11
> > > > > > 8 GPUs, 3 MPIs, 24 threads:   2h48
> > > > > > 8 GPUs, 5 MPIs, 12 threads:   3h16
> > > > > > 8 GPUs, 8 MPIs,   6 threads:   4h37
> > > > > >
> > > > > >
> > > > > > *** 3D Refinement (116k particles of 280x280 pixels):
> > > > > > 2 GPUs, 3 MPIs, 24 threads: 12h07
> > > > > > 4 GPUs, 3 MPIs, 24 threads:   4h54
> > > > > > 4 GPUs, 5 MPIs, 12 threads:   9h12
> > > > > > 8 GPUs, 3 MPIs, 24 threads:   4h57
> > > > > > 8 GPUs, 5 MPIs, 12 threads:   4h15
> > > > > > 8 GPUs, 8 MPIs,   6 threads:   8h47
> > > > > >
> > > > > >
> > > > > > *** Auto-picking (on 2600 micrographs, generating around
> > > > > > 750k particles):
> > > > > > 0 GPU , 48 threads: 79 min
> > > > > > 2 GPUs,   2 threads: 52 min
> > > > > > 2 GPUs, 48 threads: error (code 11)
> > > > > > 4 GPUs,   4 threads: 27 min
> > > > > > 4 GPUs, 48 threads: 8 min
> > > > > > 8 GPUs,   8 threads: 19 min
> > > > > > 8 GPUs, 48 threads: 12 min
> > > > > >
> > > > > >
> > > > > >
> > > > > > Does anyone have particular suggestions / feedback?
> > > > > >
> > > > > > We are using these tests to guide the expansion of our
> > > > > > centralized GPU capability, and any comments are greatly
> > > > > > welcome.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Best,
> > > > > >
> > > > > > Nicolas Coudray
> > > > > > New York University
> > > > > >
