Hi Jesper,

Thanks for your quick reply.

I re-ran without the --topup input, but got the same problem:

$ eddy_cuda8.0 --imain=eddy_in.nii --mask=eddy_mask.nii --acqp=eddy_config.txt --index=eddy_indices.txt --bvecs=bvecs --bvals=bvals --slm=linear --repol --mporder=16 --s2v_niter=10 --s2v_interp=trilinear --s2v_lambda=1 --slspec=slspec.txt --out=dwi_post_eddy_without_topup;

...................Allocated GPU # 0...................
EDDY:::  EddyKernels::CudaSync: CUDA error after call to EddyKernels::cubic_spline_deconvolution, Error message: an illegal memory access was encountered
EDDY:::  cuda/EddyKernels.cu:::  void EddyKernels::CudaSync(std::string):  Exception thrown
EDDY:::  cuda/CudaVolume.cu:::  void EDDY::CudaVolume::calculate_spline_coefs(const std::vector<unsigned int>&, const thrust::device_vector<float>&, thrust::device_vector<float>&) const:  Exception thrown
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  cudaFree in free: an illegal memory access was encountered
Aborted (core dumped)

I am now trying to build eddy from sources. My FSLMACHTYPE turns out to be gnu_64-gcc5.5, but $FSLDIR/config only contains gnu_64-gcc4.4, so I chose that as the closest match. (I am running Ubuntu 18.04.) I then edited the CUDA installation entry in systemvars.mk (which I found at config/linux_64-gcc.4.4/systemvars.mk) to point to my cuda-8.0. No success so far, however. I will try to get some expert IT help to build from sources.

Cheers
Finn

On Tue, 14 May 2019 at 14:11, Jesper Andersson <[log in to unmask]> wrote:
Dear Finn,

that looks like a tricky problem to debug. The fact that it occurs in different parts of the code each time makes it less likely that it is a simple programming bug. Have you tried building eddy from the sources on your own system?

I also wonder what would happen if you ran it without the --topup flag (just to see if it still crashes)?

Jesper

On 14 May 2019, at 08:34, Finn Lennartsson <[log in to unmask]> wrote:

Hi FSL experts,

I have an older workstation (now running Ubuntu 18.04) equipped with an NVIDIA Tesla C2050 card (3 GB of memory) and CUDA 8 installed. I have installed FSL 6.0.1.

I have some neonatal dMRI data (collected with multi-band) on which I need to run eddy with slice-to-volume (s2v) correction. I use the command:
$ eddy_cuda8.0 --imain=eddy_in.nii --mask=eddy_mask.nii --acqp=eddy_config.txt --index=eddy_indices.txt --bvecs=bvecs --bvals=bvals --topup=field --slm=linear --repol --mporder=16 --s2v_niter=10 --s2v_interp=trilinear --s2v_lambda=1 --slspec=slspec.txt --out=dwi_post_eddy

My problem is intermittent terminations of eddy_cuda8.0 when it runs. Some examples:

- Early termination after "minutes" of processing:

...................Allocated GPU # 0...................
EDDY:::  EddyKernels::CudaSync: CUDA error after call to EddyKernels::affine_transform_coordinates, Error message: an illegal memory access was encountered
EDDY:::  cuda/EddyKernels.cu:::  void EddyKernels::CudaSync(std::string):  Exception thrown
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  cudaFree in free: an illegal memory access was encountered
Aborted (core dumped)

- Another early termination after "minutes" of processing:

...................Allocated GPU # 0...................
EDDY:::  EddyKernels::CudaSync: CUDA error after call to EddyKernels::invert_displacement_field y, Error message: an illegal memory access was encountered
EDDY:::  cuda/EddyKernels.cu:::  void EddyKernels::CudaSync(std::string):  Exception thrown
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  cudaFree in free: an illegal memory access was encountered
Aborted (core dumped)

- Late termination after "4 h" of processing:

...................Allocated GPU # 0...................
EDDY:::  EddyKernels::CudaSync: CUDA error after call to QR_Kernels::QR, Error message: an illegal memory access was encountered
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  cudaFree in free: an illegal memory access was encountered
Aborted (core dumped)
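
Since the failing kernel differs from run to run, it may help to tabulate which kernel each crash reports. The following is only a minimal sketch, assuming the terminal output of each run was saved to a text file (for example via `2>&1 | tee run1.log`); the function names and log file names are hypothetical and not part of FSL.

```shell
#!/bin/sh
# Summarise which CUDA kernel eddy reported as failing in saved run logs.
# Assumption: each log holds the terminal output of one eddy_cuda8.0 run.

# Print the kernel name that follows "CUDA error after call to" on each
# matching line (everything up to the next comma).
failed_kernels() {
    sed -n 's/.*CUDA error after call to \([^,]*\),.*/\1/p' "$@"
}

# Count how often each kernel name appears across all given logs,
# most frequent first.
kernel_summary() {
    failed_kernels "$@" | sort | uniq -c | sort -rn
}

# Usage (hypothetical log names):
#   kernel_summary run1.log run2.log run3.log
```

If the counts are spread over unrelated kernels, as in the examples above, that fits the picture of a problem that is not tied to one specific piece of code.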

eddy_cuda8.0 has finished once (it took around 3.5 h), and it runs longer if the computer has been re-booted. I cannot seem to figure out the problem myself (I am a novice with GPU setups) by searching online or on the FSL mailing list. From the behavior and output, could it be that the GPU memory is not cleared/flushed during the process?
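
One way to look into the memory question is to log GPU memory use while eddy runs. This is only a sketch, assuming `nvidia-smi` is on the PATH and that the installed driver supports the `--query-gpu` fields used here; the function name and the log file name are hypothetical.

```shell
#!/bin/sh
# Append a GPU memory sample to a CSV file every few seconds.
# Assumptions: nvidia-smi is installed and supports --query-gpu for this
# card/driver; gpu_mem.csv is a made-up default file name.
log_gpu_mem() {
    interval="${1:-5}"            # seconds between samples
    logfile="${2:-gpu_mem.csv}"   # file the samples are appended to

    # Bail out early if the tool is not available.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi

    # -l repeats the query every $interval seconds until interrupted.
    nvidia-smi --query-gpu=timestamp,memory.used,memory.total \
               --format=csv -l "$interval" >> "$logfile"
}

# Usage: start the logger in the background before eddy, stop it afterwards.
#   log_gpu_mem 5 gpu_mem.csv &
#   logger_pid=$!
#   eddy_cuda8.0 ...            # the eddy command as given above
#   kill "$logger_pid"
```

A steadily climbing memory.used column would be consistent with memory not being released, whereas a flat profile right up to the crash would point elsewhere.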

The computer used to (2 years ago) run an older version of eddy_cuda that had s2v but not multi-band support. I ran it under CUDA 7.5 on CentOS 7, and it worked fine then.

Help/Directions much appreciated!

Cheers
Finn


To unsubscribe from the FSL list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=FSL&A=1



