Hi Rafael and the rest,

Previously I had been using CMake v2, and using cmake3 the way you specified actually helped. I can now compile a version of Relion 3 that should be optimized for AVX512. So far this version is also the fastest one I have tested.

Previous timings (so you don't have to scroll all the way down):

gcc-compiled, pool: 100, mpi: 17, j: 1                1.47 h for the 1st iteration

icc-compiled, pool: 100, mpi: 17, j: 1                1.43 h for the 1st iteration

icc-compiled, pool: 100, mpi: 17, j: 1, --cpu         3.55 h for the 1st iteration

icc-compiled, pool: 1, mpi: 17, j: 1, --cpu           3.57 h for the 1st iteration

(with 4x 1080 Ti on the same machine:                 12 min for the 1st iteration)

With cmake3:

icc AVX512, pool: 100, mpi: 17, j: 1, --cpu           1.28 h for the 1st iteration

icc AVX512, pool: 100, mpi: 17, j: 1                  1.09 h for the 1st iteration

For some reason --cpu still makes things slower, so something is still off and I don't really know what to do here. If anybody has been able to build Relion 3 with AVX512 optimization on CentOS 7, it would be amazing if you could provide some feedback on how you did it and what speed gains you see.
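One thing I can at least check (just a sketch; the binary path below is only an example from my install prefix) is whether the compiled binary actually contains AVX-512 instructions, by looking for zmm registers in the disassembly:

# a nonzero zmm count suggests the compiler really emitted AVX-512 code
objdump -d /opt/relion/relion_3.0_stable_icc_cpu_avx512/bin/relion_refine_mpi | grep -c '%zmm'
# for comparison, ymm registers correspond to AVX/AVX2 code
objdump -d /opt/relion/relion_3.0_stable_icc_cpu_avx512/bin/relion_refine_mpi | grep -c '%ymm'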

Cheers,

Lukas

On 3/12/19 3:50 PM, Frava wrote:
Hi Lukas and Bjoern (and all),

I also have the vector speed issue: the legacy CPU version of relion_refine_mpi is more than 2.5x faster than the vectorized version with the "--cpu" flag (compiled with "-O3 -ip -xCORE-AVX2 -restrict" on PSXE 2018 & 2019)... I'm not sure what is causing it...
I'll try re-adding the debug flags later.

@Lukas:

With CMake 3 you need to specify your MPI compiler (and maybe also its flags, just to make sure):

YOUR_OFLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "

CC="$(which icc)" \
CXX="$(which icpc)" \
cmake ..........bla...blah.................. \
       -DCMAKE_C_FLAGS="$YOUR_OFLAGS" \
       -DCMAKE_CXX_FLAGS="$YOUR_OFLAGS" \
       -DMPI_C_COMPILER=$(which mpiicc) \
       -DMPI_C_COMPILE_OPTIONS="$YOUR_OFLAGS" \
       -DMPI_CXX_COMPILER=$(which mpiicpc) \
       -DMPI_CXX_COMPILE_OPTIONS="$YOUR_OFLAGS" \
..
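For reference, a fully spelled-out configure line along these lines (just a sketch, reusing the ALTCPU options and install prefix from your earlier command; adjust flags and paths to your setup) could look like:

YOUR_OFLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict"

CC="$(which icc)" \
CXX="$(which icpc)" \
cmake -DALTCPU=ON -DFORCE_OWN_TBB=ON -DFORCE_OWN_FLTK=ON \
      -DCMAKE_C_FLAGS="$YOUR_OFLAGS" \
      -DCMAKE_CXX_FLAGS="$YOUR_OFLAGS" \
      -DMPI_C_COMPILER=$(which mpiicc) \
      -DMPI_C_COMPILE_OPTIONS="$YOUR_OFLAGS" \
      -DMPI_CXX_COMPILER=$(which mpiicpc) \
      -DMPI_CXX_COMPILE_OPTIONS="$YOUR_OFLAGS" \
      -DCMAKE_INSTALL_PREFIX=/opt/relion/relion_3.0_stable_icc_cpu_avx512 \
      ..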

Cheers,
Rafael.


On Mon, 11 Mar 2019 at 12:47, Bjoern Forsberg <[log in to unmask]> wrote:

Hi,

As you say, g++ is an unwelcome addition here. You should see icc being used rather than g++. Try checking the initial cmake configuration output; it should say there which compiler gets picked up.
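For example (assuming you run this from the build directory), the cache should tell you what was actually picked up:

grep 'CMAKE_C_COMPILER:' CMakeCache.txt
grep 'CMAKE_CXX_COMPILER:' CMakeCache.txt
grep 'MPI_CXX_COMPILER:' CMakeCache.txt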

You might be tempted to find the corresponding flags for gcc instead, but I would really advise that you try to get icc working: in our hands it is clearly better than gcc at generating fast vector instructions for modern Intel hardware.

/Björn

On 3/11/19 12:35 PM, Lukas Kater wrote:

Hi Björn,

Sorry, that was a typo. I do not see errors in cmake (see attachment) but in make. When I use these options:

-DCMAKE_C_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
-DCMAKE_CXX_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "

Make gives me:

Scanning dependencies of target copy_scripts
...
make[2]: *** [src/apps/CMakeFiles/relion_lib.dir/__/autopicker.cpp.o] Error 1
make[1]: *** [src/apps/CMakeFiles/relion_lib.dir/all] Error 2
make: *** [all] Error 2

g++ is the GNU compiler, correct? Should it not be using the Intel compiler? Could that be the problem, or is this expected behavior?

I tried only using -xCORE-AVX512 like this (not sure if that's the correct way to do it):

CC=mpiicc CXX=mpiicpc cmake -DALTCPU=ON -DFORCE_OWN_TBB=ON -DFORCE_OWN_FLTK=ON -DCMAKE_C_FLAGS="-xCORE-AVX512" -DCMAKE_CXX_FLAGS="-xCORE-AVX512" -DCMAKE_INSTALL_PREFIX=/opt/relion/relion_3.0_stable_icc_cpu_avx512 ..

make results in the following:

Scanning dependencies of target copy_scripts
Scanning dependencies of target relion_lib
[  0%] Built target copy_scripts
...
make[1]: *** [src/apps/CMakeFiles/relion_lib.dir/all] Error 2
make: *** [all] Error 2
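To see the full compiler command and error message behind those truncated lines, I can rerun the failing step with verbose output (just a sketch; "build.log" is only an example file name):

make VERBOSE=1 2>&1 | tee build.log
# then look for the first real compiler error
grep -n -i 'error' build.log | head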

Thanks for helping!

Lukas


On 3/11/19 9:22 AM, Bjoern Forsberg wrote:

Hi,

There is nothing that sticks out like a sore thumb at first glance. To get full use of the vectorization you should absolutely use flags like -xCORE-AVX512 on hardware that supports AVX512. What kind of errors do you see from cmake when using these flags?

/Björn


On 3/8/19 6:09 PM, Lukas Kater wrote:

Dear all,

Sometimes we run into the problem of not having enough GPU memory for particles with larger box sizes and for multi-body refinements (here even "skip padding" has been insufficient). For these cases I wanted to compile a CPU-accelerated version of Relion 3, but in the few tests I have done so far I do not see any speed gains when using --cpu (as opposed to a run without --gpu or --cpu); rather, things get slower.

The build was done like this:

source /opt/intel/compilers_and_libraries_2018.3.222/linux/bin/compilervars.sh intel64
source /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/bin/mklvars.sh intel64
source /opt/intel/impi/2018.3.222/intel64/bin/mpivars.sh intel64
export CPATH="${CPATH:+$CPATH:}${MKLROOT}/include/fftw"

CC=mpiicc CXX=mpiicpc cmake -DALTCPU=ON -DFORCE_OWN_TBB=ON -DFORCE_OWN_FLTK=ON  ..
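As a quick sanity check that the Intel environment is really the one being picked up before running cmake, something like this can be used (just a sketch):

which icc icpc mpiicc mpiicpc
icc --version
echo "$MKLROOT"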

I tried some hardware-specific optimizations as mentioned in the original build instructions sent by Sjors when the Relion 3 public beta was announced on this list:

e.g.

-DCMAKE_C_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
-DCMAKE_CXX_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "

but this only leads to errors with cmake. My understanding is also that these flags are only a further optimization and should not be strictly necessary, correct?

1) Is there any obvious problem with my build?

2) I have a machine with 2x Intel(R) Xeon(R) Gold 6134 (not too many cores, but they support AVX512). What kind of improvement should I expect when running without acceleration vs. with --cpu? I was expecting a significant difference (i.e. >10% less run time).

The "Print command" option gives me this:

`which relion_refine_mpi` --o Refine3D/job195/run --auto_refine --split_random_halves --i Extract/job133/particles.star --ref reference.mrc --ini_high 30 --dont_combine_weights_via_disc --scratch_dir /scratch/ --pool 1 --pad 2  --ctf --ctf_corrected_ref --particle_diameter 250 --flatten_solvent --zero_mask  --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2  --low_resol_join_halves 40 --norm --scale  --j 1 --maxsig 2000 --random_seed 0 --reuse_scratch --cpu

I did some timings:

gcc-compiled, pool: 100, mpi: 16, j: 1                1.47 h for the 1st iteration

icc-compiled, pool: 100, mpi: 16, j: 1                1.43 h for the 1st iteration

icc-compiled, pool: 100, mpi: 16, j: 1, --cpu         3.55 h for the 1st iteration

icc-compiled, pool: 1, mpi: 16, j: 1, --cpu           3.57 h for the 1st iteration

(with 4x 1080 Ti on the same machine:                 12 min for the 1st iteration)

Many thanks

Lukas

--
-----------------------------------
Lukas Kater, PhD Student
QBM Student Speaker

AG Beckmann
GeneCenter / LMU Munich
Feodor-Lynen-Str. 25
81377 Munich - Germany


To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1