I have had success running bedpostx_gpu on CentOS 6.5 with CUDA Driver v6.0, on DTI data with 64 directions. The card I have is a Quadro 4000 with driver version 331.49.
If you have SGE, make sure to switch to runlevel 2 (or whichever runlevel stops X), then set SGE_ROOT="".
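
In case it helps, here is a minimal sketch of the SGE_ROOT part, wrapped in Python. It assumes that an empty SGE_ROOT is what keeps the job off the queue (that is my reading of the advice above, not something I have checked in the script), and the helper names and the subject directory path are only placeholders for illustration.

```python
import os
import subprocess


def env_without_sge(base_env=None):
    """Return a copy of the environment with SGE_ROOT set to an empty string.

    Assumption: an empty SGE_ROOT makes the FSL scripts run locally instead
    of submitting to the grid engine.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SGE_ROOT"] = ""
    return env


def run_bedpostx_gpu(subject_dir):
    """Launch bedpostx_gpu on one subject directory with SGE_ROOT cleared.

    subject_dir is a hypothetical path to a prepared bedpostx input directory.
    """
    return subprocess.run(
        ["bedpostx_gpu", subject_dir],
        env=env_without_sge(),
        check=True,  # raise CalledProcessError if bedpostx_gpu exits non-zero
    )
```

Calling run_bedpostx_gpu("/path/to/subject") would then behave like typing the command in a shell where SGE_ROOT="" has been exported; the rest of the environment is passed through unchanged.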


On 04/10/2014 10:47 AM, Satrajit Ghosh wrote:
here are some hardware/driver info from one of our nodes:

==============NVSMI LOG==============
Timestamp                          : Thu Apr 10 09:51:10 2014
Driver Version                      : 319.82

Attached GPUs                       : 3
GPU 0000:08:00.0
    Product Name                    : Tesla K20m
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    VBIOS Version                   : 80.10.39.00.04
    Inforom Version
        Image Version               : 2081.0208.01.09
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : Compute
        Pending                     : Compute
----


cheers,

satra

On Thu, Apr 10, 2014 at 9:19 AM, gong jinnan <[log in to unmask]> wrote:
Hi Jonathan,
I have run bedpostx_gpu successfully only once, on DWI data with only 21 gradient directions. Unfortunately, it has never finished without error on DWI data with 60 or more gradient directions.
Interestingly, I found that it is very likely to crash if I interact with the computer while bedpostx_gpu is running. Has that happened in your situation? Moises helped me check the logs and found that the GPU had enough RAM for my job, so I am wondering whether it is because of CentOS or the CPU too.
Jinnan.
On April 10, 2014, at 20:57, Jonathan Berrebi <[log in to unmask]> wrote:

Hi,
I get the exact same error when running bedpostx_gpu on an NVIDIA Tesla card with CUDA 5.5, although I installed it on a CentOS 6.5 computer. Previously I had tested bedpostx_gpu successfully on Debian wheezy after downloading it from NeuroDebian. It took a while and a simple modification to make it work, but it worked on a laptop with Debian. I am starting to wonder whether it has to do with CentOS. Unfortunately I have to use CentOS because of some other hardware.

I can run the CUDA samples successfully on the CentOS machine. For instance, "deviceQuery" works. A Tesla card has no graphics output, so I would have expected it to have no conflict with the graphics driver. But as far as I understand, the NVIDIA graphics driver needs to be installed in order to use CUDA. So I assume (but I am not sure, since I am no expert in graphics devices) that we should not declare the NVIDIA driver in xorg.conf. Is that correct? (Or should I run nvidia-xconfig?)

Anyway, the way to install CUDA has changed slightly: you no longer need to download the driver beforehand, because the CUDA toolkit installer asks whether you want to install it. Maybe something goes wrong there.

I am sorry if I bring more confusion than solutions but I had the exact same error this week when I got the Tesla card and I have had many thoughts since then about what can have gone wrong.

Thank you,

Jonathan

________________________________________
From: FSL - FMRIB's Software Library [[log in to unmask]] on behalf of Moises Hernandez Fernandez [[log in to unmask]]
Sent: Wednesday, April 9, 2014 15:09
To: [log in to unmask]
Subject: Re: [FSL] Bedpostx_gpu couldn't run.

It should not be a temperature problem unless your GPU has a hardware problem. Any video game increases the temperature of the GPU more than bedpostX.

Have you tried running some of the CUDA samples from the toolkit?

You can check the temperature and the memory being used every second by doing:
nvidia-smi -l 1
(you can do this in a new terminal while bedpostx is running).

Then, you can see if the process is close to the memory limit.
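
If you would rather log this than watch it by eye, something along these lines might work. It is only a sketch: it assumes your nvidia-smi supports the --query-gpu and --format=csv options and reports plain numbers for all three fields (some cards print "N/A" or "[Not Supported]" instead, which this does not handle). The function names and the 0.9 threshold are my own illustrative choices.

```python
import subprocess
import time

# Fields requested from nvidia-smi; memory in MiB, temperature in degrees C.
QUERY = "memory.used,memory.total,temperature.gpu"


def parse_gpu_line(line):
    """Parse one CSV row such as '1024, 4799, 61' into a dict."""
    used, total, temp = (int(field.strip()) for field in line.split(","))
    return {
        "used_mib": used,
        "total_mib": total,
        "temp_c": temp,
        "used_fraction": used / total,
    }


def near_limit(sample, threshold=0.9):
    """True if the GPU memory in use is at or above the given fraction."""
    return sample["used_fraction"] >= threshold


def query_gpus():
    """Run nvidia-smi once and return one parsed dict per GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        text=True,
    )
    return [parse_gpu_line(row) for row in out.splitlines() if row.strip()]


def monitor(interval=1.0):
    """Print one line per GPU every `interval` seconds until interrupted."""
    while True:
        for index, s in enumerate(query_gpus()):
            flag = "  <-- close to memory limit" if near_limit(s) else ""
            print(f"GPU {index}: {s['used_mib']}/{s['total_mib']} MiB, "
                  f"{s['temp_c']} C{flag}")
        time.sleep(interval)
```

Running monitor() in a second terminal while bedpostx is working gives roughly the same information as nvidia-smi -l 1, with a marker on any GPU that is close to its memory limit.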

Could you share the output directory?

Moises.