Hi Björn,
thank you very much for the explanation!
But why, when I explicitly enter "0" under "Which GPU to use", do I get the following?
###############################################
uniqueHost N616-DB-LSRV2 has 2 ranks.
Using explicit indexing on slave 1 to assign devices 0
Thread 0 on slave 1 mapped to device 0
Using explicit indexing on slave 2 to assign devices 0
Thread 0 on slave 2 mapped to device 0
Device 0 on N616-DB-LSRV2 is split between 2 slaves
Estimating accuracies in the orientational assignment ...
##############################################
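(If I understand the GUI correctly, the "Which GPU to use" field is passed
on as the --gpu argument of relion_refine_mpi, so this run should correspond
to something like the following command line; all other arguments omitted:

    mpirun -n 3 relion_refine_mpi ... --j 1 --gpu 0

i.e. one master plus two slave ranks, with every slave told to use device 0.)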
Yet GPU #1 is used as well, and its memory is likewise split between the two slaves:
##############################################
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
|    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
|    0     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
|    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
|    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
|    1     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
+-----------------------------------------------------------------------------+
################################################
I am afraid that this is a problem with the hardware setup...
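(By the way, if I understand Björn's suggestion correctly, the way to keep
one rank per card while still using more CPUs would be fewer ranks with more
threads, e.g. something like:

    mpirun -n 3 relion_refine_mpi ... --j 4 --gpu 0:1

i.e. two slave ranks, one pinned to each GPU via the colon-separated --gpu
string, and 4 threads per rank; the exact syntax is my reading of the relion
documentation, so please correct me if it is wrong.)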
Dieter
------------------------------------------------------------------------
Dieter Blaas,
Max F. Perutz Laboratories
Medical University of Vienna,
Inst. Med. Biochem., Vienna Biocenter (VBC),
Dr. Bohr Gasse 9/3,
A-1030 Vienna, Austria,
Tel: 0043 1 4277 61630,
Fax: 0043 1 4277 9616,
e-mail: [log in to unmask]
------------------------------------------------------------------------
On 22.02.2017 at 15:09, Bjoern Forsberg wrote:
> Hi Dieter,
>
> There will be initial output during the run which states exactly how
> relion distributes MPI-ranks and threads. If you are running 4 ranks,
> there is simply no way to avoid placing at least 2 ranks on at least
> one GPU, because MPI is implemented with non-shared memory in mind.
> This means that two MPI-ranks simply *cannot* share the same memory,
> even if they are using allocations on the same physical piece of
> memory. The only way to share objects residing in memory between ranks
> is by sending and receiving them, which is both inefficient in itself
> and entirely infeasible for objects like class references, which get
> re-used so often inside relion. If you want to use more CPUs per GPU,
> using more threads helps. It IS less efficient to compensate for fewer
> MPI-ranks by increasing the number of threads, but in your case it is
> the only alternative, since you are limited by memory.
>
> Cheers,
>
> /Björn
>
>
> On 02/22/2017 02:53 PM, Dieter Blaas wrote:
>> Hi all,
>>
>> I have 2 GPUs, but whatever I enter under 'Which GPU to use'
>> (nothing, '0', '0,0', etc.) and/or 'Number of MPI procs' (3 or 4,
>> with 1 or 2 threads), the RAM of each GPU is split in two, so that I
>> run out of memory. What might be the reason? This does not occur on a
>> second, similarly configured PC.
>>
>> Thanks for hints, Dieter
>>
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID  Type  Process name                               Usage      |
>> |=============================================================================|
>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>> +-----------------------------------------------------------------------------+
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>> Dieter Blaas,
>> Max F. Perutz Laboratories
>> Medical University of Vienna,
>> Inst. Med. Biochem., Vienna Biocenter (VBC),
>> Dr. Bohr Gasse 9/3,
>> A-1030 Vienna, Austria,
>> Tel: 0043 1 4277 61630,
>> Fax: 0043 1 4277 9616,
>> e-mail: [log in to unmask]
>> ------------------------------------------------------------------------
>