
Here is some hardware/driver info from one of our nodes:

==============NVSMI LOG==============
Timestamp                          : Thu Apr 10 09:51:10 2014
Driver Version                      : 319.82

Attached GPUs                       : 3
GPU 0000:08:00.0
    Product Name                    : Tesla K20m
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    VBIOS Version                   : 80.10.39.00.04
    Inforom Version
        Image Version               : 2081.0208.01.09
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : Compute
        Pending                     : Compute
----


cheers,

satra
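A minimal Python sketch for comparing nodes: it turns an `nvidia-smi -q` report like the one above into dictionaries, so driver versions and GPU settings can be diffed between machines (for example, the CentOS and Debian setups discussed further down). The parsing rules (the " : " separator, 4-space indentation for per-GPU fields, 8-space indentation for sub-section fields) are assumptions drawn only from this particular log; other driver versions may lay the report out differently. `parse_nvsmi_report` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re
import sys

# A GPU block starts with an unindented line such as "GPU 0000:08:00.0".
GPU_HEADER = re.compile(r"^GPU ([0-9A-Fa-f]+:[0-9A-Fa-f]+:[0-9A-Fa-f]+\.[0-9A-Fa-f]+)$")


def parse_nvsmi_report(text):
    """Return (summary, gpus) parsed from `nvidia-smi -q` text output.

    summary holds the unindented "Key : Value" lines (driver version, etc.);
    gpus maps each bus id to its fields. Nested sub-section fields are
    flattened into keys like "GPU Operation Mode/Current".
    """
    summary, gpus = {}, {}
    gpu = None       # bus id of the GPU block currently being read
    section = None   # name of the current nested sub-section, if any
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        # Skip blank lines, the "=====" banner and "----" separators.
        if not stripped or stripped.startswith("=") or stripped.startswith("-"):
            continue
        indent = len(line) - len(line.lstrip())
        header = GPU_HEADER.match(stripped) if indent == 0 else None
        if header:
            gpu = header.group(1)
            gpus[gpu] = {}
            section = None
            continue
        key, sep, value = stripped.partition(" : ")
        if not sep:
            # A line without " : " opens a sub-section such as "Driver Model".
            section = stripped
            continue
        key, value = key.strip(), value.strip()
        if indent == 0 or gpu is None:
            summary[key] = value
        elif indent > 4 and section:
            gpus[gpu][f"{section}/{key}"] = value
        else:
            section = None
            gpus[gpu][key] = value
    return summary, gpus


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): nvidia-smi -q > node.txt
    #                                  python parse_nvsmi.py node.txt
    with open(sys.argv[1]) as fh:
        summary, gpus = parse_nvsmi_report(fh.read())
    print("Driver:", summary.get("Driver Version", "?"))
    for bus_id, fields in gpus.items():
        print(bus_id, fields.get("Product Name", "?"),
              "persistence:", fields.get("Persistence Mode", "?"))
```

Running it on the saved report of each node and comparing the printed lines is a quick way to spot differing driver versions or GPU modes.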

On Thu, Apr 10, 2014 at 9:19 AM, gong jinnan <[log in to unmask]> wrote:

> Hi Jonathan,
> I have run bedpostx_gpu successfully only once, on DWI data with only 21
> gradient directions. Unfortunately, it has never completed without error on
> DWI data with 60 or more gradient directions.
> Interestingly, I found that it was very likely to crash if I interacted
> with the computer while bedpostx_gpu was running. Has that happened in your
> case? Moises helped me check the logs and found that the GPU memory was
> sufficient for my job, so I am wondering whether CentOS, or the CPU, could
> also be the cause.
> Jinnan.
On 10 April 2014, at 20:57, Jonathan Berrebi <[log in to unmask]> wrote:
>
> Hi,
> I get exactly the same error when running bedpostx_gpu on an NVIDIA Tesla
> card with CUDA 5.5. This time, though, I installed it on a CentOS 6.5
> computer. I had previously tested bedpostx_gpu successfully on Debian Wheezy
> after downloading it from NeuroDebian. It took a while and a simple
> modification to make it work, but it did work on a laptop running Debian. I
> am starting to wonder whether the problem has to do with CentOS.
> Unfortunately, I have to use CentOS because of some other hardware.
>
> I ran the CUDA samples successfully on the CentOS machine; for instance,
> "deviceQuery" works. A Tesla card has no graphics output, so I would have
> expected it not to conflict with the graphics driver. But as far as I
> understand, the NVIDIA graphics driver must be installed in order to use
> CUDA. So I assume (though I am not sure, since I am no expert in graphics
> devices) that we should not declare the NVIDIA driver in xorg.conf. Is that
> correct? (Or should I run nvidia-xconfig?)
>
> Anyway, the CUDA installation procedure has changed slightly: you no longer
> need to download the driver beforehand, because the CUDA toolkit installer
> asks whether you want to install it. Maybe something goes wrong there.
>
> I am sorry if I bring more confusion than solutions, but I got exactly the
> same error this week when I received the Tesla card, and I have been
> thinking a lot since then about what could have gone wrong.
>
> Thank you,
>
> Jonathan
>
> ________________________________________
> From: FSL - FMRIB's Software Library [[log in to unmask]] on behalf of
> Moises Hernandez Fernandez [[log in to unmask]]
> Sent: Wednesday, 9 April 2014 15:09
> To: [log in to unmask]
> Subject: Re: [FSL] Bedpostx_gpu couldn't run.
>
> It should not be a temperature problem unless your GPU has a hardware
> problem. Any video game heats the GPU more than bedpostX does.
>
> Have you tried running some of the CUDA samples from the toolkit?
>
> You can check the temperature and memory usage every second by
> running:
> nvidia-smi -l 1
> (you can run it in a new terminal while bedpostx is running).
>
> Then, you can see if the process is close to the memory limit.
>
> Could you share the output directory?
>
> Moises.
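To automate the `nvidia-smi -l 1` check Moises describes, a minimal Python sketch could poll nvidia-smi in CSV mode and flag any GPU whose used memory approaches its total. It assumes the installed nvidia-smi supports `--query-gpu` and `--format=csv,noheader,nounits` (plain `nvidia-smi -l 1` works either way); the threshold, the helper names and the script name are illustrative choices, not part of FSL or NVIDIA tooling.

```python
import subprocess
import sys

# Fields requested from nvidia-smi; with "nounits", memory values are in MiB.
QUERY = "index,temperature.gpu,memory.used,memory.total"


def parse_csv_line(line):
    """Parse one 'index, temperature, used, total' row into integers."""
    index, temp, used, total = (field.strip() for field in line.split(","))
    return int(index), int(temp), int(used), int(total)


def near_memory_limit(used_mib, total_mib, threshold=0.9):
    """Return True when used memory is at or above `threshold` of the total."""
    return total_mib > 0 and used_mib / total_mib >= threshold


def monitor(threshold=0.9, interval_s=1):
    """Print one line per GPU every `interval_s` seconds until interrupted."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}",
           "--format=csv,noheader,nounits", "-l", str(interval_s)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if not line.strip():
                continue
            try:
                index, temp, used, total = parse_csv_line(line)
            except ValueError:
                # e.g. a field reported as "[Not Supported]"; show it raw
                print(line.rstrip(), flush=True)
                continue
            warn = ("  <-- close to memory limit"
                    if near_memory_limit(used, total, threshold) else "")
            print(f"GPU {index}: {temp} C, {used}/{total} MiB{warn}", flush=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python watch_gpu.py 0.9
    monitor(threshold=float(sys.argv[1]))
```

Run it in a second terminal while bedpostx_gpu is working and stop it with Ctrl-C; a warning just before a crash would point to memory, while a crash with plenty of free memory would support the conclusion that memory is not the cause.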