Le lun. 11 mars 2019 à 12:47, Bjoern Forsberg <[log in to unmask]> a écrit :

Hi,

As you say, g++ is an unwelcome addition here. You should find icc to be used, rather than g++. Try checking the initial cmake config output, it should say there which compiler gets picked up.

You might get tempted to find the corresponding flags for gcc instead, but i would really advise that you try and get icc working, as it is clearly better than gcc at generating fast vector instructions for use on modern intel hardware, in our hands.

/Björn

On 3/11/19 12:35 PM, Lukas Kater wrote:
Hi Björn,

sorry, that was a typo. I do not see errors in cmake (see attachment) but in make. When i use these options:

-DCMAKE_C_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
-DCMAKE_CXX_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "

Make gives me:

Scanning dependencies of target copy_scripts
...
make[2]: *** [src/apps/CMakeFiles/relion_lib.dir/__/autopicker.cpp.o] Error 1
make[1]: *** [src/apps/CMakeFiles/relion_lib.dir/all] Error 2
make: *** [all] Error 2

g++ is the gnu compiler, correct? Should it not be using the intel compiler? Could that be the problem or is this expected behavior?

I tried only using -xCORE-AVX512 like this (not sure if that's the correct way to do it):

CC=mpiicc CXX=mpiicpc cmake -DALTCPU=ON -DFORCE_OWN_TBB=ON -DFORCE_OWN_FLTK=ON -DCMAKE_C_FLAGS="-xCORE-AVX512" -DCMAKE_CXX_FLAGS="-xCORE-AVX512" -DCMAKE_INSTALL_PREFIX=/opt/relion/relion_3.0_stable_icc_cpu_avx512 ..

make results in the following:
Scanning dependencies of target copy_scripts
Scanning dependencies of target relion_lib
[ 0%] Built target copy_scripts
...
make[1]: *** [src/apps/CMakeFiles/relion_lib.dir/all] Error 2
make: *** [all] Error 2

Thanks for helping!

Lukas

On 3/11/19 9:22 AM, Bjoern Forsberg wrote:
Hi,

There is nothing that sticks out like a sore thumb at first glance. To get full use of the vectorization you should absolutely use flags like -xCORE-AVX512 on hardware that supports AVX512. What kind of errors do you see from cmake when using these flags?

/Björn

On 3/8/19 6:09 PM, Lukas Kater wrote:
Dear all,

sometime we run into the problem of not having enough GPU memory for particles with larger box sizes and for multi body refinements (here even "skip padding" has been insufficient). For these cases I wanted to compile a cpu accelerated version of Relion3, but in the few tests I have done so far I do not see any speed gains when using --cpu (as opposed to a run without --gpu or --cpu), rather things get slower.

The build was done like this:

source /opt/intel/compilers_and_libraries_2018.3.222/linux/bin/compilervars.sh intel64
source /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/bin/mklvars.sh intel64
source /opt/intel/impi/2018.3.222/intel64/bin/mpivars.sh intel64
export CPATH="${CPATH:+$CPATH:}${MKLROOT}/include/fftw"

CC=mpiicc CXX=mpiicpc cmake -DALTCPU=ON -DFORCE_OWN_TBB=ON -DFORCE_OWN_FLTK=ON ..

I tried some hardware specific optimizations as mentioned in the original build instructions sent by Sjors when the Relion 3 public beta was announced on this list:

e.g.

-DCMAKE_C_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "
-DCMAKE_CXX_FLAGS="-O3 -ip -g -debug inline-debug-info -xCORE-AVX512 -qopt-zmm-usage=high -restrict "

but this only leads to errors with cmake. My understanding is also that this is only a further improvement and should not be strictly necessary, correct?

1) Is there any obvious problem with my build?

2) I have a machine with 2x Intel(R) Xeon(R) Gold 6134 (not too many cores but supports AVX512), what kind of improvements should I expect when running without acceleration vs. with --cpu? I was expecting significant differences (i.e. >10% less run time)

Print command gives me this:

`which relion_refine_mpi` --o Refine3D/job195/run --auto_refine --split_random_halves --i Extract/job133/particles.star --ref reference.mrc --ini_high 30 --dont_combine_weights_via_disc --scratch_dir /scratch/ --pool 1 --pad 2 --ctf --ctf_corrected_ref --particle_diameter 250 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --low_resol_join_halves 40 --norm --scale --j 1 --maxsig 2000 --random_seed 0 --reuse_scratch --cpu

I did some timings:

gcc-compiled, pool: 100, mpi: 16, j:1               1.47h for 1. iteration

icc-compiled, pool: 100, mpi: 16, j: 1               1.43h for 1. iteration

icc-compiled, pool: 100, mpi: 16, j: 1, --cpu     3.55h for 1. iteration

icc-compiled, pool: 1, mpi: 16, j: 1, --cpu           3.57h for 1. iteration

(with 4x1080ti on the same machine:                 12 min for the first iteration)

Many thanks

Lukas

--
-----------------------------------
Lukas Kater, PhD Student
QBM Student Speaker

AG Beckmann
GeneCenter / LMU Munich
Feodor-Lynen-Str. 25
81377 Munich - Germany
To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1
To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1
To unsubscribe from the CCPEM list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1