Hi Rafael and the rest,

previously I had been using cmake v2, and using cmake3 the way you 
specified actually helped. I can now compile a version of Relion3 that 
should be optimized for AVX512. This version is so far also the fastest 
one I have tested.

Previous timings (so you don't have to scroll all the way down):

gcc-compiled, pool: 100, mpi: 17, j: 1            1.47h for 1st iteration

icc-compiled, pool: 100, mpi: 17, j: 1            1.43h for 1st iteration

icc-compiled, pool: 100, mpi: 17, j: 1, --cpu     3.55h for 1st iteration

icc-compiled, pool: 1, mpi: 17, j: 1, --cpu       3.57h for 1st iteration

(with 4x1080ti on the same machine: 12 min for the first iteration)

with cmake3:

icc AVX512, pool: 100, mpi: 17, j: 1, --cpu       1.28h for 1st iteration

icc AVX512, pool: 100, mpi: 17, j: 1              1.09h for 1st iteration

For some reason --cpu still makes things slower, so something is still 
off, and I don't really know what to do here. If anybody has been able to 
build Relion3 with AVX512 optimization on CentOS 7, it would be amazing 
if you could provide some feedback on how you did it and what speed 
gains you see.
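In case it helps anyone trying to reproduce this, here is a minimal sketch (my own addition, not part of the original build notes) for checking whether a host actually advertises AVX-512 before suspecting the compiler flags. On a real CentOS 7 machine the flags string would come from `grep -m1 '^flags' /proc/cpuinfo`; the strings below are illustrative examples.

```shell
# has_avx512: report whether a CPU flags string contains the avx512f
# feature bit (the foundation AVX-512 subset). The function only does
# string matching, so it can be tested without AVX-512 hardware.
has_avx512() {
  case " $1 " in
    *" avx512f "*) echo yes ;;  # foundation AVX-512 present
    *)             echo no  ;;
  esac
}

# Example flag strings (hypothetical; feed real /proc/cpuinfo flags in practice):
has_avx512 "fpu sse2 avx avx2 avx512f avx512dq"  # prints: yes
has_avx512 "fpu sse2 avx avx2"                   # prints: no
```

If this prints "no" for the real flags, -xCORE-AVX512 binaries will not run on that host at all, which is worth ruling out before comparing timings.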

Cheers,

Lukas

On 3/12/19 3:50 PM, Frava wrote:
> Hi Lukas and Bjoern (and all),
>
> I also have the vector speed issue: the legacy CPU version of 
> relion_refine_mpi is more than 2.5x faster than the vectorized version 
> with the "--cpu" flag (compiled with "-O3 -ip -xCORE-AVX2 -restrict" on 
> PSXE 2018 & 2019)... I'm not sure what is causing it...
> I'll try re-adding the debug flags later.
>
> @Lukas :
>
> With CMake 3 you need to specify your MPI compiler (and maybe its 
> flags as well, just to make sure):
>
> YOUR_OFLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 
> -qopt-zmm-usage=high -restrict "
>
> CC="$(which icc)" \
> CXX="$(which icpc)" \
> cmake ..........bla...blah.................. \
>        -DCMAKE_C_FLAGS="$YOUR_OFLAGS" \
>        -DCMAKE_CXX_FLAGS="$YOUR_OFLAGS" \
>        -DMPI_C_COMPILER=$(which mpiicc) \
>        -DMPI_C_COMPILE_OPTIONS="$YOUR_OFLAGS" \
>        -DMPI_CXX_COMPILER=$(which mpiicpc) \
>        -DMPI_CXX_COMPILE_OPTIONS="$YOUR_OFLAGS" \
> ..
>
> Cheers,
> Rafael.
>
>
> On Mon, Mar 11, 2019 at 12:47, Bjoern Forsberg 
> <[log in to unmask]> wrote:
>
>     Hi,
>
>     As you say, g++ is an unwelcome addition here. icc should be used
>     rather than g++. Try checking the initial cmake configuration
>     output; it should say there which compiler gets picked up.
>
>     You might be tempted to find the corresponding flags for gcc
>     instead, but I would really advise that you try to get icc
>     working: in our hands it is clearly better than gcc at generating
>     fast vector instructions for modern Intel hardware.
>
>     /Björn
>
>     On 3/11/19 12:35 PM, Lukas Kater wrote:
>>
>>     Hi Björn,
>>
>>     sorry, that was a typo. I do not see errors in cmake (see
>>     attachment) but in make. When I use these options:
>>
>>     -DCMAKE_C_FLAGS="-O3 -ip -g -debug inline-debug-info
>>     -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
>>     -DCMAKE_CXX_FLAGS="-O3 -ip -g -debug inline-debug-info
>>     -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
>>
>>     Make gives me:
>>
>>     Scanning dependencies of target copy_scripts
>>     ...
>>     make[2]: ***
>>     [src/apps/CMakeFiles/relion_lib.dir/__/autopicker.cpp.o] Error 1
>>     make[1]: *** [src/apps/CMakeFiles/relion_lib.dir/all] Error 2
>>     make: *** [all] Error 2
>>
>>     g++ is the GNU compiler, correct? Should it not be using the
>>     Intel compiler? Could that be the problem, or is this expected
>>     behavior?
>>
>>     I tried only using -xCORE-AVX512 like this (not sure if that's
>>     the correct way to do it):
>>
>>     CC=mpiicc CXX=mpiicpc cmake -DALTCPU=ON -DFORCE_OWN_TBB=ON
>>     -DFORCE_OWN_FLTK=ON -DCMAKE_C_FLAGS="-xCORE-AVX512"
>>     -DCMAKE_CXX_FLAGS="-xCORE-AVX512"
>>     -DCMAKE_INSTALL_PREFIX=/opt/relion/relion_3.0_stable_icc_cpu_avx512
>>     ..
>>
>>     make results in the following:
>>
>>     Scanning dependencies of target copy_scripts
>>     Scanning dependencies of target relion_lib
>>     [  0%] Built target copy_scripts
>>     ...
>>     make[1]: *** [src/apps/CMakeFiles/relion_lib.dir/all] Error 2
>>     make: *** [all] Error 2
>>
>>     Thanks for helping!
>>
>>     Lukas
>>
>>
>>     On 3/11/19 9:22 AM, Bjoern Forsberg wrote:
>>>
>>>     Hi,
>>>
>>>     There is nothing that sticks out like a sore thumb at first
>>>     glance. To get full use of the vectorization you should
>>>     absolutely use flags like -xCORE-AVX512 on hardware that
>>>     supports AVX512. What kind of errors do you see from cmake when
>>>     using these flags?
>>>
>>>     /Björn
>>>
>>>
>>>     On 3/8/19 6:09 PM, Lukas Kater wrote:
>>>>
>>>>     Dear all,
>>>>
>>>>     sometimes we run into the problem of not having enough GPU
>>>>     memory for particles with larger box sizes and for multi-body
>>>>     refinements (where even "skip padding" has been insufficient).
>>>>     For these cases I wanted to compile a CPU-accelerated version
>>>>     of Relion3, but in the few tests I have done so far I do not
>>>>     see any speed gains when using --cpu (as opposed to a run
>>>>     without --gpu or --cpu); rather, things get slower.
>>>>
>>>>     The build was done like this:
>>>>
>>>>     source
>>>>     /opt/intel/compilers_and_libraries_2018.3.222/linux/bin/compilervars.sh
>>>>     intel64
>>>>     source
>>>>     /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/bin/mklvars.sh
>>>>     intel64
>>>>     source /opt/intel/impi/2018.3.222/intel64/bin/mpivars.sh intel64
>>>>     export CPATH="${CPATH:+$CPATH:}${MKLROOT}/include/fftw"
>>>>
>>>>     CC=mpiicc CXX=mpiicpc cmake -DALTCPU=ON -DFORCE_OWN_TBB=ON
>>>>     -DFORCE_OWN_FLTK=ON  ..
>>>>
>>>>     I tried some hardware specific optimizations as mentioned in
>>>>     the original build instructions sent by Sjors when the Relion 3
>>>>     public beta was announced on this list:
>>>>
>>>>     e.g.
>>>>
>>>>     -DCMAKE_C_FLAGS="-O3 -ip -g -debug inline-debug-info
>>>>     -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
>>>>     -DCMAKE_CXX_FLAGS="-O3 -ip -g -debug inline-debug-info
>>>>     -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
>>>>
>>>>     but this only leads to errors with cmake. My understanding is
>>>>     also that this is only a further improvement and should not be
>>>>     strictly necessary, correct?
>>>>
>>>>     1) Is there any obvious problem with my build?
>>>>
>>>>     2) I have a machine with 2x Intel(R) Xeon(R) Gold 6134 (not too
>>>>     many cores but supports AVX512), what kind of improvements
>>>>     should I expect when running without acceleration vs. with
>>>>     --cpu? I was expecting significant differences (i.e. >10% less
>>>>     run time).
>>>>
>>>>     Print command gives me this:
>>>>
>>>>     `which relion_refine_mpi` --o Refine3D/job195/run --auto_refine
>>>>     --split_random_halves --i Extract/job133/particles.star --ref
>>>>     reference.mrc --ini_high 30 --dont_combine_weights_via_disc
>>>>     --scratch_dir /scratch/ --pool 1 --pad 2  --ctf
>>>>     --ctf_corrected_ref --particle_diameter 250 --flatten_solvent
>>>>     --zero_mask  --oversampling 1 --healpix_order 2
>>>>     --auto_local_healpix_order 4 --offset_range 5 --offset_step 2
>>>>     --low_resol_join_halves 40 --norm --scale  --j 1 --maxsig 2000
>>>>     --random_seed 0 --reuse_scratch --cpu
>>>>
>>>>     I did some timings:
>>>>
>>>>     gcc-compiled, pool: 100, mpi: 16, j: 1          1.47h for 1st iteration
>>>>
>>>>     icc-compiled, pool: 100, mpi: 16, j: 1          1.43h for 1st iteration
>>>>
>>>>     icc-compiled, pool: 100, mpi: 16, j: 1, --cpu   3.55h for 1st iteration
>>>>
>>>>     icc-compiled, pool: 1, mpi: 16, j: 1, --cpu     3.57h for 1st iteration
>>>>
>>>>     (with 4x1080ti on the same machine: 12 min for the first iteration)
>>>>
>>>>     Many thanks
>>>>
>>>>     Lukas
>>>>
>>>>     -- 
>>>>     -----------------------------------
>>>>     Lukas Kater, PhD Student
>>>>     QBM Student Speaker
>>>>
>>>>     AG Beckmann
>>>>     GeneCenter / LMU Munich
>>>>     Feodor-Lynen-Str. 25
>>>>     81377 Munich - Germany
>>>>
>>>
>
>

########################################################################

To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1