Hi Leo,
The maximization step is FFT-limited, the expectation step is not. A maximization taking either 3 minutes or 4 hours sounds like a very big difference though.
HTH,
S

> Dear Sjors,
>
> Thank you - we tried various combinations before and the one with 2 MPIs
> (15 threads each) per node gives the fastest "normal" iterations.
> This setup also seems to be the best for the last iteration, although
> it is difficult to be sure as it still takes 2-4 days to run (and this
> is with up to 15 nodes in total per job).
>
> But do you think Robert MacLeod's suggestion about a big prime number in
> the decomposition of 496 might be correct?
> This would actually be consistent with the fact that in the 3D
> classification run consecutive maximization iterations can follow a
> pattern like this: 3 mins, 4 hours, 3 mins, 3 mins, 4 hours, etc.
> And those taking long to run do have a big prime number in the
> decomposition of CurrentImageSize for that iteration.
> Although there seems to be no strict dependence, as some iterations with
> a big prime number in the decomposition of CurrentImageSize do run fast.
>
> If that is right we will try a 512 box size.
> What do you think?
> Leo
>
>
> Prof. Leonid Sazanov
> IST Austria
> Am Campus 1
> A-3400 Klosterneuburg
> Austria
>
> Phone: +43 2243 9000 3026
> E-mail: [log in to unmask]
> Web: https://ist.ac.at/research/life-sciences/sazanov-group/
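On the big-prime point: 496 = 2 x 2 x 2 x 2 x 31, whereas 512 = 2^9, and the FFT library RELION uses (FFTW) is fastest for sizes that factor into small primes (2, 3, 5, 7). A quick way to check a candidate box size is sketched below; this is a minimal illustration in plain Python, not part of RELION, and the list of sizes and the "fast prime" set are only examples.

# Minimal sketch: factorize candidate box / CurrentImageSize values and flag
# factors that FFTW handles less efficiently (here: anything other than 2, 3, 5, 7).
# The box sizes below are just the ones discussed in this thread.

def prime_factors(n):
    """Return the prime factorization of n as a list of primes."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

for size in (496, 500, 512):
    factors = prime_factors(size)
    slow = [f for f in factors if f not in (2, 3, 5, 7)]
    note = "OK for FFTW" if not slow else f"large prime factor(s) {slow} -> slow FFTs"
    print(f"{size} = {' x '.join(map(str, factors))}  ({note})")

For 496 this flags the factor 31, so padding or rescaling to 512 (or any size built from 2s, 3s and 5s) would be a reasonable test. And since CurrentImageSize changes from iteration to iteration, only the iterations that land on an unlucky size would be slow, which fits the 3 min / 4 h pattern described above.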
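On the MPI/threads layout discussed below: the arithmetic is simply that memory per rank caps the number of MPI processes per node, and the hyper-threaded cores are then divided among those ranks as threads (RELION's --j option). A rough sketch of that reasoning, again in plain Python and using only the numbers from this thread (128 GB nodes, 32 hyper-threaded cores, roughly 30 GB per MPI process), is below.

# Minimal sketch of the rank/thread layout arithmetic discussed in this thread:
# RAM per node caps how many MPI ranks fit, and the (hyper-threaded) cores are
# then split evenly among those ranks as threads (RELION's --j value).

def layout(node_ram_gb, ram_per_rank_gb, cores_per_node, n_nodes):
    ranks_per_node = max(1, node_ram_gb // ram_per_rank_gb)  # memory-limited
    ranks_per_node = min(ranks_per_node, cores_per_node)     # never more ranks than cores
    threads_per_rank = cores_per_node // ranks_per_node
    total_ranks = ranks_per_node * n_nodes
    return total_ranks, ranks_per_node, threads_per_rank

# Numbers from this thread: 128 GB nodes, 32 hyper-threaded cores, ~30 GB per rank.
total, per_node, threads = layout(128, 30, 32, 10)
print(f"{total} MPI ranks ({per_node} per node), {threads} threads each "
      f"(e.g. mpirun -n {total} relion_refine_mpi ... --j {threads})")

With these numbers it suggests 4 MPI processes of 8 threads per node (40 ranks over 10 nodes); whether 2 x 16 or 4 x 8 is actually faster is best checked empirically on a single iteration.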
>
> On 17/01/2016 16:59, Sjors Scheres wrote:
>> Dear Leo,
>> If each MPI process takes 30 GB, you could run multiple MPI processes per
>> node. Having 32 hyper-threaded cores, you could for example run 2 MPIs
>> per node, each launching 16 threads. Perhaps 4 MPIs, each running 8
>> threads, may run a bit faster. Then you could scale up by using as many
>> nodes as you have in your cluster. If you have, say, 10 of those nodes,
>> it shouldn't take 3 days for a single iteration.
>> HTH,
>> Sjors
>>
>>
>>> Dear all,
>>>
>>> We are still struggling with this - it is very frustrating that with a
>>> 496-pixel box the last maximization iteration in autorefine takes 2-4
>>> days (and apparently nothing happens during this time: no progress
>>> output, though the CPUs are in use).
>>> We have plenty of CPUs (usually we use ~17 MPIs with 15 threads = 255
>>> threads per job) and memory (128 GB per node with 32 hyper-threaded
>>> cores), so there is no swapping to disk. Memory requested by Relion in
>>> the last iteration is about 30 GB.
>>>
>>> I wonder if people could share examples of how long this iteration
>>> takes on their set-up, especially with a large box of about 500 pixels?
>>> And has anybody resolved a similar problem?
>>>
>>> Many thanks!
>>>
>>>
>>> Hi Leo,
>>> It also puts all pixels up to Nyquist back into the 3D transform, so it
>>> will cost more CPU than the other iterations.
>>> HTH
>>> Sjors
>>>
>>>
>>>> Hi, still an important question for us -
>>>> It does not look like overall cluster I/O load is a big issue, and
>>>> memory is not an issue either.
>>>> What else can be done to speed up the last iteration in 3D autorefine
>>>> (496 box, 128 GB memory per node)?
>>>> Now it takes up to several days, so we really want to do something
>>>> about it.
>>>> Apart from using more memory per image, what else is different about
>>>> the last 3D autorefine operation that makes it so slow?
>>>>
>>>> Many thanks!
>>>>
>>>>
>>>> On our cluster we started to get exceedingly long times for the last
>>>> iteration in 3D autorefine (with a large box). There is definitely
>>>> enough RAM, so there is no swapping. Previously the same jobs ran about
>>>> 10x faster on our cluster, so I wonder if the problem is general I/O
>>>> bottlenecks in the cluster.
>>>> Is there a lot of particle-image reading in the final maximisation step
>>>> (it takes up to a day now)?
>>>> Thanks!

--
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres