Hi,
SOLVED: The SLI switch was set. Unsetting it solved the issue!
Thanks for the help,
Dieter
------------------------------------------------------------------------
Dieter Blaas,
Max F. Perutz Laboratories
Medical University of Vienna,
Inst. Med. Biochem., Vienna Biocenter (VBC),
Dr. Bohr Gasse 9/3,
A-1030 Vienna, Austria,
Tel: 0043 1 4277 61630,
Fax: 0043 1 4277 9616,
e-mail: [log in to unmask]
------------------------------------------------------------------------
On 22.02.2017 at 15:37, Bjoern Forsberg wrote:
> Hi,
>
> Apologies, I notice now that you only have two PIDs, indicating
> exactly what you initially described. Please let me know if you work
> out why this happens; I can't reproduce it here and have in fact
> never seen this happen before.
>
> /Björn
>
>
> On 02/22/2017 03:20 PM, Dieter Blaas wrote:
>> Hi Björn,
>>
>> thank you very much for the explanation!
>>
>> But why, then, when I explicitly enter "0" under 'Which GPU to use',
>> do I get the following:
>>
>> ###############################################
>>
>> uniqueHost N616-DB-LSRV2 has 2 ranks.
>> Using explicit indexing on slave 1 to assign devices 0
>> Thread 0 on slave 1 mapped to device 0
>> Using explicit indexing on slave 2 to assign devices 0
>> Thread 0 on slave 2 mapped to device 0
>> Device 0 on N616-DB-LSRV2 is split between 2 slaves
>> Estimating accuracies in the orientational assignment ...
>>
>> ##############################################
>>
>> GPU #1 is used as well, and its memory is likewise divided in two:
>>
>> ##############################################
>>
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID  Type  Process name                               Usage      |
>> |=============================================================================|
>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
>> +-----------------------------------------------------------------------------+
>>
>>
>> ################################################
>>
>> I am afraid this is a problem with the hardware setup...
>>
>> Dieter
>>
>>
>>
>>
>> On 22.02.2017 at 15:09, Bjoern Forsberg wrote:
>>> Hi Dieter,
>>>
>>> There will be initial output during the run which states exactly how
>>> relion distributes MPI ranks and threads. If you are running 4 ranks,
>>> there is simply no way to avoid placing at least 2 ranks on at least
>>> one GPU, because MPI is implemented with non-shared memory in mind.
>>> This means that two MPI ranks simply *cannot* share the same memory,
>>> even if their allocations reside on the same physical piece of
>>> memory. The only way to share objects residing in memory between
>>> ranks is by sending and receiving them, which is both inefficient in
>>> itself and entirely infeasible for objects like class references,
>>> which are re-used so often inside relion. If you want to use more
>>> CPUs per GPU, using more threads helps. It IS less efficient to
>>> compensate for fewer MPI ranks by increasing the number of threads,
>>> but in your case it is the only alternative, since you are limited
>>> by memory.
>>>
>>> Cheers,
>>>
>>> /Björn
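
[Editor's note: the rank-to-device behaviour seen in the log above can be
sketched as follows. This is a hypothetical illustration, not RELION's
actual source code; the function name and signature are invented. It shows
why entering a single '0' pins every slave rank to device 0 ("Device 0 ...
is split between 2 slaves"), whereas leaving the field empty would spread
the ranks round-robin over the available GPUs.]

```python
# Hypothetical sketch of explicit vs. default GPU indexing for MPI
# slave ranks (not RELION source; names are illustrative only).

def map_slaves_to_devices(n_slaves, n_gpus, explicit=None):
    """Return {slave_rank: device_id} for slave ranks 1..n_slaves.

    explicit: optional list of user-supplied device ids (e.g. ["0"]
    when '0' is entered in the GUI); it is cycled over the slaves,
    so a single '0' pins every slave to GPU 0.
    """
    mapping = {}
    for i, slave in enumerate(range(1, n_slaves + 1)):
        if explicit:
            # Explicit indexing: cycle the user's list over the slaves.
            mapping[slave] = int(explicit[i % len(explicit)])
        else:
            # Default: round-robin over the available devices.
            mapping[slave] = i % n_gpus
    return mapping

# Two slaves, '0' entered: both land on device 0, so the log reports
# device 0 "split between 2 slaves".
print(map_slaves_to_devices(2, 2, explicit=["0"]))  # {1: 0, 2: 0}
# Field left empty: the default spreads the slaves out.
print(map_slaves_to_devices(2, 2))  # {1: 0, 2: 1}
```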
>>>
>>>
>>> On 02/22/2017 02:53 PM, Dieter Blaas wrote:
>>>> Hi all,
>>>>
>>>> I have 2 GPUs, but whatever I enter under 'Which GPU to use'
>>>> (nothing, '0', '0,0', etc.) and/or 'Number of MPI procs' (3 or 4,
>>>> with 1 or 2 threads), the RAM of each GPU becomes divided in two,
>>>> so that I run out of memory. What might be the reason? This does
>>>> not occur on a second, similarly configured PC.
>>>>
>>>> Thanks for hints, Dieter
>>>>
>>>> +-----------------------------------------------------------------------------+
>>>> | Processes:                                                       GPU Memory |
>>>> |  GPU       PID  Type  Process name                               Usage      |
>>>> |=============================================================================|
>>>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>>> +-----------------------------------------------------------------------------+
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>