Hi,

SOLVED: the SLI switch was set. Unsetting it solved the issue!

Thanks for the help,
Dieter

------------------------------------------------------------------------
Dieter Blaas,
Max F. Perutz Laboratories
Medical University of Vienna,
Inst. Med. Biochem., Vienna Biocenter (VBC),
Dr. Bohr Gasse 9/3,
A-1030 Vienna, Austria,
Tel: 0043 1 4277 61630,
Fax: 0043 1 4277 9616,
e-mail: [log in to unmask]
------------------------------------------------------------------------

On 22.02.2017 at 15:37, Bjoern Forsberg wrote:
> Hi,
>
> Apologies, I notice now that you only have two PIDs, indicating
> exactly what you initially described. Please let me know if you work
> out why this happens; I can't reproduce it here and have in fact
> never seen this happen before.
>
> /Björn
>
>
> On 02/22/2017 03:20 PM, Dieter Blaas wrote:
>> Hi Björn,
>>
>> thank you very much for the explanation!
>>
>> But why, when I explicitly enter "0" under "Which GPU to use":
>>
>> ###############################################
>> uniqueHost N616-DB-LSRV2 has 2 ranks.
>> Using explicit indexing on slave 1 to assign devices 0
>> Thread 0 on slave 1 mapped to device 0
>> Using explicit indexing on slave 2 to assign devices 0
>> Thread 0 on slave 2 mapped to device 0
>> Device 0 on N616-DB-LSRV2 is split between 2 slaves
>> Estimating accuracies in the orientational assignment ...
>> ##############################################
>>
>> is GPU #1 used as well, and also divided into two?
>>
>> ##############################################
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID  Type  Process name                               Usage      |
>> |=============================================================================|
>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3949MiB |
>> +-----------------------------------------------------------------------------+
>> ################################################
>>
>> I am afraid that this is a problem of the hardware setup...
>>
>> Dieter
>>
>> On 22.02.2017 at 15:09, Bjoern Forsberg wrote:
>>> Hi Dieter,
>>>
>>> There will be initial output during the run which states exactly how
>>> relion distributes MPI ranks and threads. If you are running 4 ranks,
>>> there is simply no way to avoid using at least 2 ranks on at least
>>> one GPU, because MPI is implemented with non-shared memory in mind.
>>> This means that two MPI ranks simply *cannot* share the same memory,
>>> even if they are using allocations on the same physical piece of
>>> memory.
>>> The only way to share objects residing in memory between
>>> ranks is by sending and receiving them, which is both inefficient in
>>> itself and entirely infeasible for objects like class references
>>> that get re-used so often inside relion. If you want to use more
>>> CPUs per GPU, using more threads helps. It IS less efficient to
>>> compensate for fewer MPI ranks by increasing the number of threads,
>>> but in your case it is the only alternative, since you are limited
>>> by memory.
>>>
>>> Cheers,
>>>
>>> /Björn
>>>
>>>
>>> On 02/22/2017 02:53 PM, Dieter Blaas wrote:
>>>> Hi all,
>>>>
>>>> I have 2 GPUs, but whatever I enter under 'Which GPU to use'
>>>> (nothing, '0', '0,0', etc.) and/or 'Number of MPI procs' (3 or 4,
>>>> with 1 or 2 threads), the RAM of each GPU becomes divided into
>>>> two, so that I run out of memory. What might be the reason? This
>>>> does not occur on a second PC configured similarly.
>>>>
>>>> Thanks for hints, Dieter
>>>>
>>>> +-----------------------------------------------------------------------------+
>>>> | Processes:                                                       GPU Memory |
>>>> |  GPU       PID  Type  Process name                               Usage      |
>>>> |=============================================================================|
>>>> |    0      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>>> |    0     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>>> |    0     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>>> |    1      2344    G   /usr/lib/xorg/Xorg                              13MiB |
>>>> |    1     15865    C   /usr/local/bin/relion_refine_mpi              3939MiB |
>>>> |    1     15866    C   /usr/local/bin/relion_refine_mpi              3953MiB |
>>>> +-----------------------------------------------------------------------------+
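For anyone finding this thread later: the rank-to-GPU behaviour Björn describes can be sketched in a few lines of Python. This is a toy model for illustration only, not relion's actual scheduler; the function name and the round-robin assignment policy are assumptions. The point it demonstrates is that every MPI slave rank mapped to a device makes its own separate allocation (ranks cannot share memory), so N slave ranks on G GPUs always puts at least ceil(N/G) allocations on some GPU, and requesting only device 0 with two slave ranks still yields two allocations on device 0.

```python
def map_ranks_to_gpus(n_slave_ranks, gpu_ids):
    """Round-robin slave ranks (1..N; rank 0 is the master and uses no GPU)
    over the requested GPU ids. Each mapped rank implies its own,
    non-shareable GPU memory allocation."""
    return {rank: gpu_ids[(rank - 1) % len(gpu_ids)]
            for rank in range(1, n_slave_ranks + 1)}

# 4 MPI procs = 1 master + 3 slaves over GPUs 0 and 1:
# two slaves necessarily land on the same device.
print(map_ranks_to_gpus(3, [0, 1]))   # {1: 0, 2: 1, 3: 0}

# Asking for GPU "0" only, with 3 MPI procs (2 slaves): both slaves,
# and hence two separate allocations, still end up on device 0.
print(map_ranks_to_gpus(2, [0]))      # {1: 0, 2: 0}
```

Threads, by contrast, live inside one rank's address space, which is why adding threads per rank (rather than ranks) is the way to use more CPUs per GPU without duplicating GPU allocations.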