Hi Dieter,
That's because you are only specifying a single value, 0, which is used by the
first MPI slave. The other slaves get no information, so they divide up the
available resources among themselves. If you wanted to put them all on GPU 0,
you would need to specify 0:0 for 2 MPI slaves (3 MPI ranks in total, including
one master), 0:0:0 for 3 slaves (4 MPI ranks), and so on.
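For example (just a sketch, with the actual refinement options omitted, and
assuming that the GUI field "Which GPUs to use" is passed on to
relion_refine_mpi as the --gpu argument), something like

    mpirun -n 3 relion_refine_mpi --gpu 0:0 <other options>

should pin both slaves to device 0, and

    mpirun -n 4 relion_refine_mpi --gpu 0:0:0 <other options>

would do the same for three slaves. The colon-separated entries are consumed
one per slave, so any slave without an entry falls back to dividing up all
visible GPUs.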
Best wishes
James
> Hi Björn,
>
> thank you very much for the explanation!
>
> But why, when I explicitly enter "0" under "which GPU to use":
>
> ###############################################
>
> uniqueHost N616-DB-LSRV2 has 2 ranks.
> Using explicit indexing on slave 1 to assign devices 0
> Thread 0 on slave 1 mapped to device 0
> Using explicit indexing on slave 2 to assign devices 0
> Thread 0 on slave 2 mapped to device 0
> Device 0 on N616-DB-LSRV2 is split between 2 slaves
> Estimating accuracies in the orientational assignment ...
>
> ##############################################
>
> GPU #1 is used as well, with its memory split between the two processes:
>
> ##############################################
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
> +-----------------------------------------------------------------------------+
>
> ################################################
>
> I am afraid that this is a problem with the hardware setup...
>
> Dieter
>
>
>
> ------------------------------------------------------------------------
> Dieter Blaas,
> Max F. Perutz Laboratories
> Medical University of Vienna,
> Inst. Med. Biochem., Vienna Biocenter (VBC),
> Dr. Bohr Gasse 9/3,
> A-1030 Vienna, Austria,
> Tel: 0043 1 4277 61630,
> Fax: 0043 1 4277 9616,
> e-mail: [log in to unmask]
> ------------------------------------------------------------------------
>
> On 22.02.2017 at 15:09, Bjoern Forsberg wrote:
>> Hi Dieter,
>>
>> There will be initial output during the run which states exactly how
>> relion distributes MPI-ranks and threads. If you are running 4 ranks,
>> there is simply no way to avoid putting at least 2 ranks on at least one
>> GPU, because MPI is implemented with non-shared memory in mind. This
>> means that two MPI-ranks simply *cannot* share the same memory, even
>> if their allocations sit on the same physical piece of memory.
>> The only way to share objects residing in memory between ranks is by
>> sending and receiving them, which is both inefficient in itself and
>> entirely infeasible for objects like class references, which get
>> re-used so often inside relion. If you want to use more CPUs per GPU,
>> using more threads helps. It IS less efficient to compensate for fewer
>> MPI-ranks by increasing the number of threads, but in your case it is
>> the only alternative, since you are limited by memory.
>>
>> Cheers,
>>
>> /Björn
>>
>>
>> On 02/22/2017 02:53 PM, Dieter Blaas wrote:
>>> Hi all,
>>>
>>> I have 2 GPUs, but whatever I enter under 'Which GPUs to use'
>>> (nothing, '0', '0,0', etc.) and/or 'Number of MPI procs' (3 or 4, with
>>> 1 or 2 threads), the RAM of each GPU gets divided between two processes,
>>> so that I run out of memory. What might be the reason? This does not
>>> occur on a second PC that is configured similarly.
>>>
>>> Thanks for hints, Dieter
>>>
>>> +-----------------------------------------------------------------------------+
>>> | Processes:                                                       GPU Memory |
>>> |  GPU       PID  Type  Process name                               Usage      |
>>> |=============================================================================|
>>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>> +-----------------------------------------------------------------------------+
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> Dieter Blaas,
>>> Max F. Perutz Laboratories
>>> Medical University of Vienna,
>>> Inst. Med. Biochem., Vienna Biocenter (VBC),
>>> Dr. Bohr Gasse 9/3,
>>> A-1030 Vienna, Austria,
>>> Tel: 0043 1 4277 61630,
>>> Fax: 0043 1 4277 9616,
>>> e-mail: [log in to unmask]
>>> ------------------------------------------------------------------------
>>
>