Hi,

SOLVED: the SLI switch was set. Unsetting it solved the issue!

Thanks for the help,
Dieter

------------------------------------------------------------------------
Dieter Blaas,
Max F. Perutz Laboratories
Medical University of Vienna,
Inst. Med. Biochem., Vienna Biocenter (VBC),
Dr. Bohr Gasse 9/3,
A-1030 Vienna, Austria,
Tel: 0043 1 4277 61630,
Fax: 0043 1 4277 9616,
e-mail: [log in to unmask]
------------------------------------------------------------------------

On 22.02.2017 at 15:37, Bjoern Forsberg wrote:
> Hi,
>
> Apologies, I notice now that you only have two PIDs, indicating
> exactly what you initially described. Please let me know if you work
> out why this happens; I can't reproduce it here and have in fact
> never seen this happen before.
>
> /Björn
>
>
> On 02/22/2017 03:20 PM, Dieter Blaas wrote:
>> Hi Björn,
>>
>> thank you very much for the explanation!
>>
>> But why, when I explicitly enter "0" under "Which GPU to use":
>>
>> ###############################################
>> uniqueHost N616-DB-LSRV2 has 2 ranks.
>> Using explicit indexing on slave 1 to assign devices 0
>> Thread 0 on slave 1 mapped to device 0
>> Using explicit indexing on slave 2 to assign devices 0
>> Thread 0 on slave 2 mapped to device 0
>> Device 0 on N616-DB-LSRV2 is split between 2 slaves
>> Estimating accuracies in the orientational assignment ...
>> ##############################################
>>
>> is GPU #1 used as well, and also divided into two?
>>
>> ##############################################
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID  Type  Process name                               Usage      |
>> |=============================================================================|
>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
>> +-----------------------------------------------------------------------------+
>> ################################################
>>
>> I am afraid that this is a problem of the hardware setup...
>>
>> Dieter
>>
>> On 22.02.2017 at 15:09, Bjoern Forsberg wrote:
>>> Hi Dieter,
>>>
>>> There will be initial output during the run which states exactly how
>>> relion distributes MPI ranks and threads. If you are running 4 ranks,
>>> there is simply no way to avoid using at least 2 ranks on at least
>>> one GPU, because MPI is implemented with non-shared memory in mind.
>>> This means that two MPI ranks simply *cannot* share the same memory,
>>> even if they are using allocations on the same physical piece of
>>> memory.
>>> The only way to share objects residing in memory between
>>> ranks is by sending and receiving them, which is both inefficient in
>>> itself and entirely infeasible for objects like class references
>>> that get re-used so often inside relion. If you want to use more
>>> CPUs per GPU, using more threads helps. It IS less efficient to
>>> compensate for fewer MPI ranks by increasing the number of threads,
>>> but in your case it is the only alternative, since you are limited
>>> by memory.
>>>
>>> Cheers,
>>>
>>> /Björn
>>>
>>>
>>> On 02/22/2017 02:53 PM, Dieter Blaas wrote:
>>>> Hi all,
>>>>
>>>> I have 2 GPUs, but whatever I enter under 'Which GPU to use'
>>>> (nothing, '0', '0,0', etc.) and/or 'Number of MPI procs' (3 or 4,
>>>> with 1 or 2 threads), the RAM of each GPU becomes divided into
>>>> two, so that I run out of memory. What might be the reason? This
>>>> does not occur on a second PC configured similarly.
>>>>
>>>> Thanks for hints, Dieter
>>>>
>>>> +-----------------------------------------------------------------------------+
>>>> | Processes:                                                       GPU Memory |
>>>> |  GPU       PID  Type  Process name                               Usage      |
>>>> |=============================================================================|
>>>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>>> +-----------------------------------------------------------------------------+
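For anyone finding this thread later: the rank-to-GPU behaviour Björn describes can be sketched in a few lines of Python. This is a toy model for illustration only, not relion's actual scheduler; the function name and the round-robin assignment policy are assumptions. The point it demonstrates is that every MPI slave rank mapped to a device makes its own separate allocation (ranks cannot share memory), so N slave ranks on G GPUs always puts at least ceil(N/G) allocations on some GPU, and requesting only device 0 with two slave ranks still yields two allocations on device 0.

```python
def map_ranks_to_gpus(n_slave_ranks, gpu_ids):
    """Round-robin slave ranks (1..N; rank 0 is the master and uses no GPU)
    over the requested GPU ids. Each mapped rank implies its own,
    non-shareable GPU memory allocation."""
    return {rank: gpu_ids[(rank - 1) % len(gpu_ids)]
            for rank in range(1, n_slave_ranks + 1)}

# 4 MPI procs = 1 master + 3 slaves over GPUs 0 and 1:
# two slaves necessarily land on the same device.
print(map_ranks_to_gpus(3, [0, 1]))   # {1: 0, 2: 1, 3: 0}

# Asking for GPU "0" only, with 3 MPI procs (2 slaves): both slaves,
# and hence two separate allocations, still end up on device 0.
print(map_ranks_to_gpus(2, [0]))      # {1: 0, 2: 0}
```

Threads, by contrast, live inside one rank's address space, which is why adding threads per rank (rather than ranks) is the way to use more CPUs per GPU without duplicating GPU allocations.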