Thanks, Jose-Miguel!
Yes, you are correct: with the same --random_seed (and when using "Mask
individual particles with zero") the program should give the same results.
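For example (just a sketch, and the actual value is arbitrary as long as it is
the same for both runs), you could append something like
    --random_seed 1
to the relion_refine_mpi command line quoted below.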
Best,
Sjors
On 06/14/2016 09:19 AM, Jose Miguel de la Rosa Trevin wrote:
> Hi Sjors,
>
> I guess that for checking performance, it would be better to use the same
> value of '--random_seed', just to make sure the optimization will follow
> the same path. Am I right? I'm not sure whether the differences in
> execution time that Ian has seen are due to this or just to hardware
> performance.
>
> Best,
> Jose Miguel.
>
>
>
> On Tue, Jun 14, 2016 at 10:12 AM, Sjors Scheres <[log in to unmask]>
> wrote:
>
>> Hi Ian,
>> I have no experience with EC2, but you can always compare your results
>> with the precalculated ones that come with the tutorial. Sometimes it helps
>> to execute exactly the same job as the precalculated one to find problems.
>> HTH,
>> Sjors
>>
>>
>> On 06/10/2016 04:39 PM, Ian Tickle wrote:
>>
>>> Hello, I have been running some benchmarks on the Amazon Cloud using RELION
>>> and the beta-gal tutorial data. I am using the 'Cryo-EM in the Cloud' AMI
>>> from Michael Cianfrocco & Andres Leschziner. The only problem with this is
>>> that it's Ubuntu 13.04, which of course is not a long-term support (LTS)
>>> release, so I should upgrade to 14.04 (at least!). Jose Miguel de la Rosa
>>> Trevin has very kindly made a 14.04 AMI for us to use, but I haven't tried
>>> it yet since it doesn't have StarCluster installed.
>>>
>>> So basically I'm running exactly the same script on AWS EC2 clusters of
>>> 'm4.10xlarge' instances (each 20 cores = 40 vCPUs & 160 GB RAM), varying
>>> only the cluster size, the number of MPI processes and the number of
>>> threads. The script I'm using is based on one from Martyn Winn, and
>>> typically looks like:
>>>
>>> mpirun -n 15 -x LD_LIBRARY_PATH -x PATH \
>>>   --prefix /home/EM_Packages/openmpi -hostfile ~/hosts --map-by node \
>>>   --bind-to none time `which relion_refine_mpi` --o Refine3D/Run22-m4b \
>>>   --auto_refine --split_random_halves \
>>>   --i particles_autopick_sort_class2d_class3d.star \
>>>   --particle_diameter 200 --angpix 3.54 --ref 3i3e_lp50A.mrc \
>>>   --firstiter_cc --ini_high 50 --ctf --ctf_corrected_ref \
>>>   --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 \
>>>   --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 \
>>>   --sym D2 --low_resol_join_halves 40 --norm --scale --j 12 \
>>>   --memory_per_thread 2 > Refine3D/Run18-m4a.out 2>&1
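>>>
>>> (As a rough sketch of the intended resource layout for this particular
>>> run, using only the numbers above: 5 x m4.10xlarge = 5 x 20 cores = 100
>>> cores (200 vCPUs), and 15 MPI processes x 12 threads (--j 12) = 180
>>> threads in total, i.e. roughly one thread per vCPU rather than one per
>>> physical core.)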
>>>
>>> The above script was run on a cluster of 5 instances (100 cores = 200
>>> vCPUs); others were run on clusters of up to 16 instances with #MPI up to
>>> 59 and #threads = 6 or 12. The problem is that, even running the identical
>>> script on an identically set-up cluster, sometimes after a while it goes
>>> into a mode where it uses only 1 thread per process and then runs for ~3
>>> hours, whereas other times, when it seems to work properly, it runs for
>>> typically 5 to 20 minutes. I always see loads of these warnings:
>>>
>>> WARNING: norm_correction= 13.2069 for particle 5801 in group 14; Are your
>>> groups large enough?
>>>
>>> I see the above in all the log files, but the number of these warnings
>>> varies enormously even from identical input data. For example, the log file
>>> from the first run (which otherwise seems to have worked) contained ~6500
>>> of the above lines, the second run using the identical script contained
>>> ~12800, and a third run again with identical input contained ~13000. Other
>>> runs have up to 65000 of these warnings! The second run seems to have
>>> failed with this error:
>>>
>>> DIRECT_A1D_ELEM(sigma2, i)= 5.53033e-37
>>> BackProjector::reconstruct: ERROR: unexpectedly small, yet non-zero sigma2
>>> value, this should not happen...a
>>> File: src/backprojector.cpp line: 867
>>> DIRECT_A1D_ELEM(sigma2, i)= 5.53033e-37
>>> BackProjector::reconstruct: ERROR: unexpectedly small, yet non-zero sigma2
>>> value, this should not happen...a
>>> File: src/backprojector.cpp line: 867
>>>
>>> It's the "this should not happen" that worries me! The second run doesn't
>>> actually crash but after a while goes into a mode where it's using only 1
>>> thread per process (100% CPU) and then will run for ~ 3 hours (unless I
>>> kill it first!).
>>>
>>> Maybe just upgrading the OS will fix this?
>>>
>>> Cheers
>>>
>>> -- Ian
>>>
>>>
>> --
>> Sjors Scheres
>> MRC Laboratory of Molecular Biology
>> Francis Crick Avenue, Cambridge Biomedical Campus
>> Cambridge CB2 0QH, U.K.
>> tel: +44 (0)1223 267061
>> http://www2.mrc-lmb.cam.ac.uk/groups/scheres
>>
--
Sjors Scheres
MRC Laboratory of Molecular Biology
Francis Crick Avenue, Cambridge Biomedical Campus
Cambridge CB2 0QH, U.K.
tel: +44 (0)1223 267061
http://www2.mrc-lmb.cam.ac.uk/groups/scheres