Dear Anna,

This is caused by a CUDA bug that only appears for some GPU models. In principle, you should be able to build a “fat binary” that contains code for all (not completely obsolete) “compute capabilities”. That means the binary contains alternative code for different GPUs. If you have a very modern GPU with some advanced capabilities, there will be code that uses those capabilities, but if you have an old, scrappy GPU there will be code for that too. In other words, you can have a “one size fits all” binary.
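To make the idea concrete, here is a minimal Python model of how the CUDA driver is meant to pick code from a fat binary. It is a simplification of CUDA's documented compatibility rules: a cubin (SASS) built for capability X.y runs only on devices X.z with z ≥ y, and PTX can be JIT-compiled for any device of equal or newer capability. The preference order and the function names are assumptions made for illustration, not eddy or NVIDIA code. A result of None is the situation that typically surfaces as an “invalid device function” error, because nothing in the binary fits the device.

```python
# Simplified model of code selection from a CUDA "fat binary".
# Assumption: only the general compatibility rules are modelled; special
# cases (architecture-specific targets, Tegra devices, driver quirks such
# as the bug discussed in this thread) are ignored.

def sass_runs_on(sass, device):
    """A cubin for capability X.y runs only on devices X.z with z >= y."""
    return sass[0] == device[0] and device[1] >= sass[1]


def ptx_runs_on(ptx, device):
    """PTX for capability c can be JIT-compiled for any device >= c."""
    return device >= ptx  # tuples compare as (major, minor)


def select_code(sass_targets, ptx_targets, device):
    """Return the code variant a device would use, or None.

    Assumed preference: the newest compatible cubin first, otherwise
    JIT compilation of the newest compatible PTX.
    """
    cubins = [s for s in sass_targets if sass_runs_on(s, device)]
    if cubins:
        return ("sass", max(cubins))
    ptxs = [p for p in ptx_targets if ptx_runs_on(p, device)]
    if ptxs:
        return ("ptx", max(ptxs))
    return None  # no usable code: "invalid device function"


fat_sass = [(3, 5), (5, 0), (6, 0), (6, 1), (7, 0)]  # hypothetical targets
fat_ptx = [(7, 0)]
print(select_code(fat_sass, fat_ptx, (6, 1)))  # a 6.1 GPU gets its own cubin
print(select_code(fat_sass, fat_ptx, (5, 2)))  # a 5.2 GPU falls back to 5.0
```

The point of the fat binary is that every listed device finds *some* match. The bug described here is the real selection going wrong for certain GPUs, which a simple model like this cannot capture.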

The bug you are seeing is caused by this process going wrong for some GPUs, so that the wrong code inside the executable is used. If you have one of these GPUs, you have to build a custom executable that contains only the code for the compute capability that matches that particular GPU. Unfortunately, we cannot build executables for every possible combination. We will release the source code of eddy with release 6.0.1, which will allow people to build their own executables.
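As a sketch of what “only the compute capability code that matches that particular GPU” means at build time, the Python helpers below assemble nvcc `-gencode` options. The `-gencode arch=compute_XY,code=sm_XY` syntax is standard nvcc. The capability list, the file names and the helper names are hypothetical, and this is not eddy's actual build configuration.

```python
# Hypothetical helpers contrasting a fat-binary build with a
# single-architecture build. Only the nvcc -gencode syntax is real;
# everything else here is illustrative.

def gencode_flags(capabilities):
    """Return nvcc -gencode options, one SASS target per (major, minor)."""
    flags = []
    for major, minor in capabilities:
        cc = f"{major}{minor}"
        flags += ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]
    return flags


def nvcc_command(source, output, capabilities):
    """Assemble an nvcc argument list (e.g. for subprocess.run)."""
    return ["nvcc", *gencode_flags(capabilities), "-o", output, source]


# Hypothetical "one size fits all" list of capabilities.
FAT_BINARY_CAPS = [(3, 0), (3, 5), (5, 0), (5, 2), (6, 0), (6, 1), (7, 0)]

print(nvcc_command("my_kernel.cu", "my_kernel", FAT_BINARY_CAPS))
print(nvcc_command("my_kernel.cu", "my_kernel", [(6, 1)]))
```

For a GPU with compute capability 6.1, the second call yields a single `-gencode` pair, so the driver has only one candidate to choose from. Passing the whole list produces one pair per capability, which is the fat binary.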

Jesper

 
> On 10 Sep 2018, at 20:15, Anna Chen <[log in to unmask]> wrote:
> 
> Hi, 
> I'm having some trouble running eddy_cuda (as part of the HCP dMRI pipeline, using the HCP 3.4.0 release, FSL 5.0.10, NVIDIA card). The error I'm getting when using the installed CUDA 7.5 libraries is:
> 
> Entering EddyGpuUtils::LoadPredictionMaker
> 
> ...................Allocated GPU # 0...................
> thrust::system_error thrown in CudaVolume::common_assignment_from_newimage_vol after resize() with message: function_attributes(): after cudaFuncGetAttributes: invalid device function
> terminate called after throwing an instance of 'thrust::system::system_error'
>  what():  function_attributes(): after cudaFuncGetAttributes: invalid device function
> /data/EDresearch/Temperament_fMRI/derivatives/proc-HCP/scripts-HCP/Pipelines-3.4.0/DiffusionPreprocessing/scripts/run_eddy_new.sh: line 381: 82851 Aborted                 ${eddy_command}
> Wed Aug 29 16:32:20 PDT 2018 - run_eddy_new.sh - Completed with return value: 134
> 
> However, we also have cuda 8.0 libraries installed, and that attempt resulted in this error:
> 
> /usr/local/fsl/bin/eddy_cuda: error while loading shared libraries: libcudart.so.7.5: cannot open shared object file: No such file or directory
> Thu Aug 30 14:43:13 PDT 2018 - run_eddy_new.sh - Completed with return value: 127
> 
> I noticed patches are available for FSL 5.0.11 (https://fsl.fmrib.ox.ac.uk/fsldownloads/patches/eddy-patch-fsl-5.0.11/) -- by any chance are there any for FSL 5.0.10 and CentOS 7?
> 
> Thank you,
> Anna
> 
> ########################################################################
> 
> To unsubscribe from the FSL list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=FSL&A=1

