Hi,
Because your box size is large, you cannot run more than one MPI process
per GPU. Since you have 28 cores in total, I would use 5 MPI processes
(i.e. four doing the actual work, one per GPU) with 7 threads each.
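The core arithmetic behind this suggestion can be sketched in Python. This is a minimal sketch, assuming, as above, that only four of the 5 MPI processes do the heavy computation (one per GPU) while the remaining process only coordinates, and that the 28 cores are shared evenly among the workers. The function name suggest_mpi_threads is hypothetical and not part of RELION.

```python
from typing import Tuple


def suggest_mpi_threads(total_cores: int, n_gpus: int,
                        ranks_per_gpu: int = 1) -> Tuple[int, int]:
    """Return (mpi_processes, threads_per_process) for a GPU refinement run.

    Assumption: one extra MPI rank only coordinates, so just
    n_gpus * ranks_per_gpu worker ranks share the CPU cores.
    """
    workers = n_gpus * ranks_per_gpu
    if workers < 1:
        raise ValueError("need at least one worker rank")
    # Divide the physical cores evenly among the worker ranks.
    threads = max(1, total_cores // workers)
    # Add one rank for the coordinating (non-computing) process.
    return workers + 1, threads


# 2x 14-core Xeon E5-2680v4 = 28 cores; 4 GPUs; large box -> 1 rank per GPU
mpi, threads = suggest_mpi_threads(total_cores=2 * 14, n_gpus=4)
print(f"MPI processes: {mpi}, threads each: {threads}")
# prints "MPI processes: 5, threads each: 7"
```

By the same count, the current setting of 5 MPI processes with 2 threads each keeps only 4 x 2 = 8 of the 28 cores busy. The sketch counts physical cores only and does not account for CPU memory, which was the suspected cause of the earlier crashes.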
Best regards,
Takanori Nakane
> Hi Takanori,
>
> Thanks for the advice. I am now trying with 'skip padding: yes' and it's
> zipping along nicely on 4 GPUs. Do you have any suggestions as to the
> optimal number of threads for our system specs? I'm currently running 5
> MPI processes and 2 threads.
>
> Many thanks
>
> Dave
>
> -------------------------------
>
> Dr. David M. Lawson
> Department of Biological Chemistry,
> John Innes Centre,
> Norwich,
> NR4 7UH, UK.
> Tel: +44-(0)1603-450725
> Fax: +44-(0)1603-450018
> Email: [log in to unmask]
>
> -----Original Message-----
> From: Collaborative Computational Project in Electron cryo-Microscopy
> <[log in to unmask]> On Behalf Of Takanori Nakane
> Sent: 08 March 2019 16:38
> To: [log in to unmask]
> Subject: Re: [ccpem] Refine3D jobs crashing and occasionally causing
> server to become unresponsive
>
> Hi,
>
> Aren't you running out of CPU memory? Try fewer MPI processes.
>
>> it is possible to use the full image resolution and a 512 x
>> 512 pixel box and run on the GPU for all but the last cycle
>
> With 'skip padding: yes', you can process boxes of up to 1000 px on a
> 1080 Ti (1 MPI process per GPU).
>
> Best regards,
>
> Takanori Nakane
>
>> Hi All,
>> We have several datasets collected using a magnified pixel size of
>> 1.065 Ang/pixel from icosahedral virus particles that have a diameter
>> of ~500 Ang. By down-sampling to 1.5 Ang/pixel we can use box sizes
>> under 512 x 512 pixels and therefore run Refine3D jobs to completion
>> on GPUs. Initial processing elsewhere has yielded reconstructions
>> approaching 3 Ang resolution after postprocessing using the full image
>> resolution and a 600 x 600 pixel box in Refine3D running on CPU only.
>> My feeling is that this ought to yield better results than
>> down-sampling. However, when we try to reproduce this on our server,
>> jobs crash on the last iteration and sometimes cause the server to
>> hang and require a reboot.
>> Our server specs are as follows:
>> 2x Intel Xeon CPU Broadwell E5-2680v4 (14-core)
>> 256GB ECC DDR4-2400 RAM
>> 8x NVIDIA GTX 1080Ti
>> 480GB SSD for boot
>> 5TB SSD
>> 8x 10TB Enterprise SAS HDD
>> I am currently trying to optimise parameters using a small subset of
>> the dataset comprising only 1814 particles (one class from Class3D).
>> I have tried different MPI/thread combinations, pooling different
>> numbers of particles and copying particles either to RAM or SSD,
>> amongst other things.
>> Incidentally, it is possible to use the full image resolution and a
>> 512 x 512 pixel box and run on the GPU for all but the last cycle,
>> giving 8.3 Ang resolution after the Refine3D job and 7.6 Ang
>> resolution after postprocessing for this small dataset. However, I'm
>> concerned that this is cropping the particles too closely and I'm
>> losing information.
>> Any suggestions as to how I might complete these jobs with a 600 x 600
>> pixel box would be most welcome.
>> I attach run.out, run.err and note.txt files for a job that was the
>> only one running on the server.
>> Many thanks in advance,
>> Dave Lawson
>> ######################################################################
>> ## To unsubscribe from the CCPEM list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1