Hi Craig,
Your suggestion worked perfectly. With 'mpirun --bind-to none',
relion_refine_mpi now runs the number of threads I request. From
reading the mpirun documentation, processes are bound to a single core
by default. My guess is that, because my Intel processors have
hyperthreading, each bound process only had the two hardware threads of
its core to run on, which is why it appeared stuck at two threads.
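For anyone who hits the same thing, the working command is essentially
my earlier invocation (quoted further down in this thread) with the
binding flag added; the paths and refinement arguments are of course
specific to my own run:

mpirun --bind-to none -n 5 ~/Downloads/relion-1.4/bin/relion_refine_mpi --o Class3D/run1_ct5 --continue Class3D/run1_it005_optimiser.star --iter 25 --tau2_fudge 4 --solvent_mask proteasome_mask_150.mrc --oversampling 1 --healpix_order 3 --offset_range 5 --offset_step 2 --j 4 &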
I am curious how many people are aware of OpenMPI's NUMA locking
support, as I suspect many people are underutilizing their hardware.
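For completeness, Craig's other suggestion of telling MPI up front how
many threads each process will use should also work. I have not tried
it myself, but from the mpirun man page something along these lines
ought to reserve 4 processing elements per MPI process (the exact
--map-by syntax can differ between Open MPI versions):

mpirun --map-by slot:PE=4 -n 5 relion_refine_mpi ... --j 4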
Cheers,
BR
On Thu, 2016-04-28 at 19:19 +0000, Craig Yoshioka wrote:
> If using OpenMPI compiled with NUMA locking support (likely the
> default), MPI processes get bound to their assigned cores
> automatically. This is to prevent context switches and cache misses.
> If you are invoking via `mpirun`, I believe the correct flag is
> `mpirun --bind-to none`... or you can let MPI know that each process
> is going to use more than a single thread using --map-by or some
> other flag.
>
>
>
> >
> > On Apr 28, 2016, at 12:06 PM, Sjors Scheres <[log in to unmask].UK> wrote:
> >
> > One more thing: you could give the program A LOT to calculate for
> > each particle, just as a test of whether it's access to the images
> > that is somehow a bottleneck. You could do this by increasing the
> > angular sampling 4-fold.
> > HTH,
> > S
> > >
> > > Hi Sjors,
> > >
> > > I agree I am being hampered by something. My workstation has dual
> > > 10-core Xeon processors, which with hyperthreading should give me
> > > a total of 40 threads. Additionally, it has 128GB of RAM and an
> > > 8-hard-drive RAID 5 array giving over 1GB/s throughput, which
> > > according to 'top' isn't being taxed at all.
> > >
> > > An interesting observation is that when I use relion_refine, I
> > > have no problem getting it to run more than 2 threads (4, 8, 16,
> > > etc., verified using 'top'). However, when I use
> > > relion_refine_mpi, I am stuck at 2 threads per MPI process.
> > >
> > > Some additional info:
> > > 1) I have seen this issue on two different systems running
> > > CentOS 7 and Fedora 23, with 20 and 4 cores respectively.
> > > 2) I have seen the issue with both my own compiled version of
> > > relion 1.4 and SBGrid.org's compiled version.
> > > 3) I suspect this might be a reason the program is not scaling as
> > > well as we would like on our cluster, a Cray XE6 system.
> > >
> > > Cheers,
> > > Bharat Reddy
> > > Post Doc
> > > University of Chicago
> > >
> > >
> > > On Thu, 2016-04-28 at 10:10 +0100, Sjors Scheres wrote:
> > > >
> > > > Hi again,
> > > > The program is actually using 4 threads (as reported in the
> > > > stdout). The fact that top shows ~200% means that your threads
> > > > are being hampered by something else. This could, for example,
> > > > be the reading of particles from the hard disk, which can become
> > > > a bottleneck. Also: how many cores does
> > > > nsit-dhcp-148-090.bsd.uchicago.edu have? You're running 4 MPI
> > > > slaves, each with 4 threads on it. The master also takes 1 core.
> > > > Therefore, your machine should have 17 cores to do everything
> > > > you ask for. If it has fewer cores, then they'll just be in each
> > > > other's way.
> > > > HTH,
> > > > Sjors
> > > >
> > > > On 04/27/2016 08:18 PM, Baru Reddy wrote:
> > > > >
> > > > >
> > > > > Hi Sjors,
> > > > > 'top' says each MPI process is running at ~200%; that ~200% is
> > > > > why I say it is only using 2 threads. The command I use and
> > > > > the initial output I get are shown below.
> > > > >
> > > > > mpirun -n 5 ~/Downloads/relion-1.4/bin/relion_refine_mpi --o Class3D/run1_ct5 --continue Class3D/run1_it005_optimiser.star --iter 25 --tau2_fudge 4 --solvent_mask proteasome_mask_150.mrc --oversampling 1 --healpix_order 3 --offset_range 5 --offset_step 2 --j 4 &
> > > > >
> > > > > [reddybg@nsit-dhcp-148-090 gauto]$ === RELION MPI setup ===
> > > > >  + Number of MPI processes = 5
> > > > >  + Number of threads per MPI process = 4
> > > > >  + Total number of threads therefore = 20
> > > > >  + Master (0) runs on host = nsit-dhcp-148-090.bsd.uchicago.edu
> > > > >  + Slave 1 runs on host = nsit-dhcp-148-090.bsd.uchicago.edu
> > > > >  + Slave 2 runs on host = nsit-dhcp-148-090.bsd.uchicago.edu
> > > > >  + Slave 3 runs on host = nsit-dhcp-148-090.bsd.uchicago.edu
> > > > >  + Slave 4 runs on host = nsit-dhcp-148-090.bsd.uchicago.edu
> > > > >
> > > > > Cheers,
> > > > > Bharat Reddy
> > > > > Post Doc
> > > > > University of Chicago
> > > > >
> > > > > From: Sjors Scheres <[log in to unmask]>
> > > > > To: Baru Reddy <[log in to unmask]>
> > > > > Cc: [log in to unmask]
> > > > > Sent: Wednesday, April 27, 2016 2:08 PM
> > > > > Subject: Re: [ccpem] Stuck at 2 Threads per MPI Process
> > > > >
> > > > > Hi Bharat,
> > > > > --j N should always launch N threads. You'll only see them as
> > > > > 1 process in 'top', but it may run up to ~N00%. Why do you say
> > > > > relion launches only 2 threads? How do you see this? Does it
> > > > > say so in the stdout?
> > > > > S
> > > > >
> > > > > >
> > > > > >
> > > > > > Hi Everyone,
> > > > > > Currently we are trying to mobilize the power of threads, as
> > > > > > our refinements have become more memory intensive and we
> > > > > > have hit a limit on the number of MPI processes we can
> > > > > > deploy. The problem is that however many threads I tell
> > > > > > relion_refine_mpi to use (--j X where X is 4, 8, 16, etc.),
> > > > > > it only uses two threads. Is there a setting I am missing, a
> > > > > > variable I am failing to define, or is this a limit of
> > > > > > relion_refine_mpi?
> > > > > > Cheers,
> > > > > > Bharat Reddy
> > > > > > Post Doc
> > > > > > University of Chicago
> >
> > --
> > Sjors Scheres
> > MRC Laboratory of Molecular Biology
> > Francis Crick Avenue, Cambridge Biomedical Campus
> > Cambridge CB2 0QH, U.K.
> > tel: +44 (0)1223 267061
> > http://www2.mrc-lmb.cam.ac.uk/groups/scheres