Here is some hardware/driver info from one of our nodes:

==============NVSMI LOG==============

Timestamp                       : Thu Apr 10 09:51:10 2014
Driver Version                  : 319.82

Attached GPUs                   : 3
GPU 0000:08:00.0
    Product Name                : Tesla K20m
    Display Mode                : Disabled
    Display Active              : Disabled
    Persistence Mode            : Enabled
    Accounting Mode             : Disabled
    Accounting Mode Buffer Size : 128
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    VBIOS Version               : 80.10.39.00.04
    Inforom Version
        Image Version           : 2081.0208.01.09
        OEM Object              : 1.1
        ECC Object              : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current                 : Compute
        Pending                 : Compute
----

cheers,
satra

On Thu, Apr 10, 2014 at 9:19 AM, gong jinnan <[log in to unmask]> wrote:

> Hi Jonathan,
> I have run bedpostx_gpu successfully only once, on DWI data with only 21
> gradient directions. Unfortunately, it has never finished without error
> on DWI data with 60 or more gradient directions.
> Interestingly, I found that it is very likely to crash if I interact with
> the computer while bedpostx_gpu is running. Did that happen in your case
> too? Moises helped me check the logs and found that the GPU had enough
> RAM for my job, so I also wonder whether CentOS or the CPU is the cause.
> Jinnan.
>
> On 10 April 2014, at 20:57, Jonathan Berrebi <[log in to unmask]> wrote:
>
> Hi,
> I get the exact same error when running bedpostx_gpu on an NVIDIA Tesla
> card with CUDA 5.5. I installed it on a CentOS 6.5 computer, though.
> Previously I had tested bedpostx_gpu successfully on Debian wheezy after
> downloading it from NeuroDebian. It took a while and a simple
> modification to make it work, but it worked on a laptop with Debian. I am
> starting to wonder whether it has to do with CentOS. Unfortunately I have
> to use CentOS because of some other hardware.
>
> I can run the CUDA samples successfully on the CentOS machine. For
> instance, "deviceQuery" works.
> A Tesla card has no graphics output, so I would have expected it not to
> conflict with the graphics driver. But as far as I understand, the NVIDIA
> graphics driver must be installed in order to use CUDA. So I assume (but
> I am not sure, since I am no expert in graphics devices) that we should
> not declare the nvidia driver in xorg.conf. Is that correct? (Or should I
> run nvidia-xconfig?)
>
> Anyway, the way to install CUDA has changed slightly: you no longer need
> to download the driver beforehand. The CUDA toolkit installer will ask
> whether you want to install it. Maybe something goes wrong there.
>
> I am sorry if I bring more confusion than solutions, but I had the exact
> same error this week when I got the Tesla card, and I have had many
> thoughts since then about what could have gone wrong.
>
> Thank you,
>
> Jonathan
>
> ________________________________________
> From: FSL - FMRIB's Software Library [[log in to unmask]] on behalf of
> Moises Hernandez Fernandez [[log in to unmask]]
> Sent: Wednesday, 9 April 2014 15:09
> To: [log in to unmask]
> Subject: Re: [FSL] Bedpostx_gpu couldn't run.
>
> It should not be a temperature problem unless your GPU has a hardware
> fault. Any video game raises the temperature of the GPU more than
> bedpostX does.
>
> Have you tried running some of the CUDA samples from the toolkit?
>
> You can check the temperature and the memory in use every second with:
>
>     nvidia-smi -l 1
>
> (you can do this in a new terminal while bedpostX is running).
>
> Then you can see whether the process is getting close to the memory
> limit.
>
> Could you share the output directory?
>
> Moises.
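
For anyone who wants to log the memory check Moises describes rather than watch it by eye, here is a minimal Python sketch. It assumes the installed driver's nvidia-smi supports the `--query-gpu` and `--format=csv,noheader,nounits` options (worth confirming with `nvidia-smi --help-query-gpu`). The function names and the 90% threshold are made up for illustration and are not part of FSL or of nvidia-smi.

```python
import subprocess
import time

# Fields requested from nvidia-smi; assumes the driver supports --query-gpu.
QUERY = "index,memory.used,memory.total,temperature.gpu"


def to_int(field):
    """Return the field as an int, or None for values such as 'N/A'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text):
    """Parse CSV output (no header, no units) into one dict per GPU."""
    readings = []
    for line in text.strip().splitlines():
        index, used, total, temp = [to_int(f) for f in line.split(",")]
        readings.append({"index": index, "used_mib": used,
                         "total_mib": total, "temp_c": temp})
    return readings


def near_limit(reading, threshold=0.9):
    """True if used memory is at least `threshold` of the total (hypothetical cutoff)."""
    used, total = reading["used_mib"], reading["total_mib"]
    if used is None or total is None:
        return False
    return used >= threshold * total


def poll(interval_s=1.0):
    """Print one line per GPU every `interval_s` seconds until interrupted."""
    cmd = ["nvidia-smi", "--query-gpu=" + QUERY,
           "--format=csv,noheader,nounits"]
    while True:
        out = subprocess.check_output(cmd, universal_newlines=True)
        for r in parse_gpu_csv(out):
            flag = "  <-- near memory limit" if near_limit(r) else ""
            print("GPU {index}: {used_mib}/{total_mib} MiB, {temp_c} C".format(**r) + flag)
        time.sleep(interval_s)
```

Calling `poll()` in a second terminal while bedpostx_gpu runs (and stopping it with Ctrl-C) gives roughly the same information as `nvidia-smi -l 1`, one line per GPU, which is easier to redirect to a file and attach to a report.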