I have had success running bedpostx_gpu on CentOS 6.5 with CUDA driver
v6.0, on DTI data with 64 directions. The card I have is a Quadro 4000
with driver version 331.49.
If you have SGE, make sure to switch to runlevel 2 (or whichever
runlevel stops X), then set SGE_ROOT="" before running bedpostx_gpu.
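
A minimal sketch of the SGE_ROOT step, assuming bedpostx_gpu is on the PATH and is called with the subject directory as its argument. The names local_run_env and run_bedpostx_gpu are made up for illustration and are not part of FSL. The sketch only empties SGE_ROOT for the launched process; the runlevel switch still has to be done by hand.

```python
import os
import subprocess


def local_run_env(base_env=None):
    """Return a copy of the environment with SGE_ROOT set to an empty string."""
    env = dict(os.environ if base_env is None else base_env)
    env["SGE_ROOT"] = ""  # empty value, as suggested in the message above
    return env


def run_bedpostx_gpu(subject_dir):
    """Launch bedpostx_gpu on subject_dir (hypothetical path) with SGE_ROOT emptied."""
    return subprocess.call(["bedpostx_gpu", subject_dir], env=local_run_env())
```

For example, run_bedpostx_gpu("/data/subj01") would start the job and return its exit code; the path here is invented.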


On 04/10/2014 10:47 AM, Satrajit Ghosh wrote:
> here are some hardware/driver info from one of our nodes:
>
> ==============NVSMI LOG==============
> Timestamp                          : Thu Apr 10 09:51:10 2014
> Driver Version                      : 319.82
>
> Attached GPUs                       : 3
> GPU 0000:08:00.0
>     Product Name                    : Tesla K20m
>     Display Mode                    : Disabled
>     Display Active                  : Disabled
>     Persistence Mode                : Enabled
>     Accounting Mode                 : Disabled
>     Accounting Mode Buffer Size     : 128
>     Driver Model
>         Current                     : N/A
>         Pending                     : N/A
>     VBIOS Version                   : 80.10.39.00.04
>     Inforom Version
>         Image Version               : 2081.0208.01.09
>         OEM Object                  : 1.1
>         ECC Object                  : 3.0
>         Power Management Object     : N/A
>     GPU Operation Mode
>         Current                     : Compute
>         Pending                     : Compute
> ----
>
>
> cheers,
>
> satra
>
> On Thu, Apr 10, 2014 at 9:19 AM, gong jinnan <[log in to unmask] 
> <mailto:[log in to unmask]>> wrote:
>
>     Hi Jonathan,
>     I have run bedpostx_gpu successfully only once, on DWI data with
>     only 21 gradient directions. Unfortunately, it has never finished
>     without error on DWI data with 60 or more gradient directions.
>     Interestingly, I found that it is very likely to crash if I
>     interact with the computer while bedpostx_gpu is running. Has
>     that happened in your situation? Moises helped me check the
>     logs and found that the GPU had enough RAM for my job, so I am
>     wondering whether it is because of CentOS or the CPU as well.
>     Jinnan.
>     On April 10, 2014, at 20:57, Jonathan Berrebi <[log in to unmask]
>     <mailto:[log in to unmask]>> wrote:
>
>>     Hi,
>>     I have the exact same error when running bedpostx_gpu on an NVIDIA
>>     Tesla card with CUDA 5.5, though I installed it on a CentOS 6.5
>>     computer. Previously I had tested bedpostx_gpu successfully on
>>     Debian wheezy after downloading it from NeuroDebian. It took a
>>     while and a simple modification to make it work, but it worked on
>>     a laptop with Debian. I am starting to wonder whether it has to do
>>     with CentOS. Unfortunately I have to use CentOS because of some
>>     other hardware.
>>
>>     I ran the CUDA samples successfully on the CentOS machine. For
>>     instance, "deviceQuery" works. A Tesla card has no graphics output,
>>     so I would have expected it not to conflict with the graphics
>>     driver. But as far as I understand, the NVIDIA graphics driver
>>     must be installed in order to use CUDA. I therefore assume (but I
>>     am not sure, since I am no expert in graphics devices) that we
>>     should not declare the nvidia driver in xorg.conf. Is that
>>     correct? (Or should I run nvidia-xconfig?)
>>
>>     Anyway, the way to install CUDA has changed slightly, since you
>>     no longer need to download the driver beforehand. The CUDA toolkit
>>     will ask you whether you want to install it. Maybe something goes
>>     wrong there.
>>
>>     I am sorry if I bring more confusion than solutions but I had the
>>     exact same error this week when I got the Tesla card and I have
>>     had many thoughts since then about what can have gone wrong.
>>
>>     Thank you,
>>
>>     Jonathan
>>
>>     ________________________________________
>>     From: FSL - FMRIB's Software Library [[log in to unmask]
>>     <mailto:[log in to unmask]>] on behalf of Moises Hernandez
>>     Fernandez [[log in to unmask] <mailto:[log in to unmask]>]
>>     Sent: Wednesday, April 9, 2014 15:09
>>     To: [log in to unmask] <mailto:[log in to unmask]>
>>     Subject: Re: [FSL] Bedpostx_gpu couldn't run.
>>
>>     It should not be a temperature problem unless your GPU has a
>>     hardware problem. Any video game increases the temperature of the
>>     GPU more than bedpostX.
>>
>>     Have you tried running some of the CUDA samples from the toolkit?
>>
>>     You can check the temperature and the memory in use every
>>     second with:
>>     nvidia-smi -l 1
>>     (you can do this in a new terminal while bedpostx is running).
>>
>>     Then you can see whether the process is close to the memory limit.
>>
>>     Could you share the output directory?
>>
>>     Moises.
>
>
>
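
The nvidia-smi check quoted above can also be scripted, for example to log memory use while bedpostx_gpu runs. This is a rough sketch and not something from the thread: it assumes the installed nvidia-smi supports the --query-gpu and --format=csv,noheader,nounits options (older drivers may not), and the function names are made up for illustration. Fields that the card does not report come back as None.

```python
import subprocess
import time

# Fields requested from nvidia-smi; memory values are reported in MiB.
FIELDS = "memory.used,memory.total,temperature.gpu"


def _to_int(field):
    """Convert a CSV field to int, or None if it is not a plain number."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_line(line):
    """Parse one line such as '1024, 4096, 61' (MiB used, MiB total, deg C)."""
    used, total, temp = (_to_int(f) for f in line.split(","))
    frac = used / float(total) if used is not None and total else None
    return {"used_mib": used, "total_mib": total, "temp_c": temp, "used_frac": frac}


def query_gpus():
    """Call nvidia-smi once and return one dict per attached GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + FIELDS, "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


def watch(interval=1.0):
    """Print the readings every `interval` seconds until interrupted (Ctrl-C)."""
    while True:
        for index, gpu in enumerate(query_gpus()):
            print(index, gpu)
        time.sleep(interval)
```

Calling watch() in a second terminal gives roughly the same view as nvidia-smi -l 1, with used_frac showing how close each GPU is to its memory limit.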