Dear Leo,
If each MPI process takes 30 GB, you could run multiple MPI processes per
node. With 32 hyper-threaded cores, you could for example run 2 MPI
processes per node, each launching 16 threads. Perhaps 4 MPI processes,
each running 8 threads, would run a bit faster. You could then scale up by
using as many nodes as you have in your cluster. If you have, say, 10 of
those nodes, a single iteration shouldn't take 3 days.
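For example, a minimal launch sketch along those lines (the numbers are
illustrative, rank-placement flags differ between MPI flavours, and note
that auto-refine prefers an odd total number of MPI ranks, i.e. one master
plus an even number of workers split over the two half-sets):

    # 10 nodes x 2 ranks/node + 1 master rank, 16 threads per rank
    mpirun -n 21 relion_refine_mpi --j 16 \
        --o Refine3D/job001/run [...other refinement options...]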
HTH,
Sjors
> Dear all,
>
> We are still struggling with this - it is very frustrating that with a
> 496-pixel box the last maximization iteration in auto-refine takes 2-4
> days (and apparently nothing happens during this time - no progress
> output - though the CPUs are busy).
> We have plenty of CPUs (we usually use ~17 MPI processes with 15 threads
> each = 255 threads per job) and memory (128 GB per node with 32
> hyper-threaded cores), so there is no swapping to disk. The memory
> requested by RELION in the last iteration is about 30 GB.
>
> I wonder if people could share examples of how long this iteration takes
> on their set-ups, especially with a large box of about 500 pixels?
> And has anybody resolved a similar problem?
>
> Many thanks!
>
>
> Hi Leo,
> It also puts all pixels up to Nyquist back into the 3D transform, so it
> will cost more CPU than the other iterations.
> HTH
> Sjors
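> As a rough back-of-envelope for a 496-pixel box (assuming the default 2x
> padding and double-precision accumulators; exact figures vary by RELION
> version):
>
>     padded box = 2 * 496 = 992
>     992^3 ≈ 9.8e8 voxels
>     complex data (16 B) + weights (8 B) ≈ 24 B/voxel ≈ 23 GB
>
> so the full-Nyquist reconstruction touches on the order of 1e9 voxels,
> which is why both the memory footprint and the CPU time jump in this
> iteration.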
>
>
>> Hi, this is still an important question for us -
>> It does not look like overall cluster I/O load is a big issue, and
>> memory is not an issue either.
>> What else can be done to speed up the last iteration in 3D auto-refine
>> (496-pixel box, 128 GB memory per node)?
>> It now takes up to several days, so we really want to do something about
>> it.
>> Apart from using more memory per image, what else is different about the
>> last 3D auto-refine iteration that makes it so slow?
>>
>> Many thanks!
>>
>>
>>
>> On our cluster we have started to get exceedingly long times for the
>> last iteration in 3D auto-refine (with a large box). There is definitely
>> enough RAM, so there is no swapping. Previously the same jobs ran about
>> 10x faster on our cluster, so I wonder whether the problem is a general
>> I/O bottleneck in the cluster.
>> Is there a lot of particle-image reading in the final maximisation step
>> (it takes up to a day now)?
>> Thanks!
>>
>
--
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres