Dear Finn,
that looks like a tricky problem to debug. The fact that it occurs in different parts of the code each time makes it less likely that it is a simple programming bug. Have you tried building eddy from the sources on your own system?
I also wonder what would happen if you ran it without the --topup flag (just to see if it still crashes)?
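A minimal sketch of that test, assuming the same input files and options as in your command below and a POSIX shell. It only strips the --topup argument and leaves everything else unchanged. If eddy_cuda8.0 is not found on the PATH, it just prints the command it would have run.

```shell
#!/bin/sh
# Sketch only: all file names and options are copied from the original command.
set -- \
  --imain=eddy_in.nii --mask=eddy_mask.nii \
  --acqp=eddy_config.txt --index=eddy_indices.txt \
  --bvecs=bvecs --bvals=bvals \
  --topup=field \
  --slm=linear --repol --mporder=16 \
  --s2v_niter=10 --s2v_interp=trilinear --s2v_lambda=1 \
  --slspec=slspec.txt --out=dwi_post_eddy

# Rebuild the argument list without --topup: each kept argument is
# appended to the end, and the argument just examined is shifted off the front.
for a in "$@"; do
  case $a in
    --topup=*) ;;               # drop the topup field
    *) set -- "$@" "$a" ;;      # keep everything else
  esac
  shift
done

if command -v eddy_cuda8.0 >/dev/null 2>&1; then
  eddy_cuda8.0 "$@"
else
  echo "eddy_cuda8.0 not found; the command would be: eddy_cuda8.0 $*"
fi
```

Note that this writes to the same --out prefix as before, so copy any earlier results you want to keep.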
Jesper
On 14 May 2019, at 08:34, Finn Lennartsson <[log in to unmask]> wrote:
Hi FSL experts,
I have an older workstation (now running Ubuntu 18.04) equipped with an NVIDIA Tesla C2050 card (3 GB of memory) and CUDA 8 installed. I have installed FSL 6.0.1.
Have some neonatal dMRI data (collected with multi-band) that I need to run eddy with slice-to-volume (s2v) correction on. I use the command:
$ eddy_cuda8.0 --imain=eddy_in.nii --mask=eddy_mask.nii --acqp=eddy_config.txt --index=eddy_indices.txt --bvecs=bvecs --bvals=bvals --topup=field --slm=linear --repol --mporder=16 --s2v_niter=10 --s2v_interp=trilinear --s2v_lambda=1 --slspec=slspec.txt --out=dwi_post_eddy
My problem is intermittent terminations of eddy_cuda8.0 when it runs. Some examples:
- Early termination after "minutes" of processing:
...................Allocated GPU # 0...................
EDDY::: EddyKernels::CudaSync: CUDA error after call to EddyKernels::affine_transform_coordinates, Error message: an illegal memory access was encountered
EDDY::: cuda/EddyKernels.cu::: void EddyKernels::CudaSync(std::string): Exception thrown
terminate called after throwing an instance of 'thrust::system::system_error'
what(): cudaFree in free: an illegal memory access was encountered
Aborted (core dumped)
- Another early termination after "minutes" of processing:
...................Allocated GPU # 0...................
EDDY::: EddyKernels::CudaSync: CUDA error after call to EddyKernels::invert_displacement_field y, Error message: an illegal memory access was encountered
EDDY::: cuda/EddyKernels.cu::: void EddyKernels::CudaSync(std::string): Exception thrown
terminate called after throwing an instance of 'thrust::system::system_error'
what(): cudaFree in free: an illegal memory access was encountered
Aborted (core dumped)
- Late termination after "4 h" of processing:
...................Allocated GPU # 0...................
EDDY::: EddyKernels::CudaSync: CUDA error after call to QR_Kernels::QR, Error message: an illegal memory access was encountered
terminate called after throwing an instance of 'thrust::system::system_error'
what(): cudaFree in free: an illegal memory access was encountered
Aborted (core dumped)
eddy_cuda8.0 has finished once (it took around 3.5 h), and it runs longer if the computer has just been rebooted. I cannot seem to work out the problem myself (I am a novice on GPU setups) from searching online or on the FSL mailing list. From the behavior and output, could it be that the GPU memory is not cleared/flushed during the process?
The computer used to (2 years ago) run an older version of eddy_cuda that had s2v but not multi-band support. I ran it under CUDA 7.5 on CentOS 7, and it worked fine then.
Help/Directions much appreciated!
Cheers,
Finn
To unsubscribe from the FSL list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=FSL&A=1