Thanks for the response, Jesper. 

Could you tell me what files/folders should be in a ‘cuda directory’ for eddy_cuda to run? Some earlier problems I had involved locating/loading shared libraries (e.g., “eddy_cuda8.0: error while loading shared libraries: libcudart.so.8.0: cannot open shared object file: No such file or directory”), since the libraries/files were kinda scattered around the instance’s computer. 

I got around that error by making a ‘cuda directory’ in the same directory as the data I’m running eddy on; these are its contents:

CudaVolume.cu   
eddy_cuda8.0     
EddyGpuUtils.h           
EddyKernels.h                   
GpuPredictorChunk.cu  
PostEddyCF.cu
CudaVolume.h    
eddy_cuda9.1     
EddyInternalGpuUtils.cu  
EddyMatrixKernels.cu            
GpuPredictorChunk.h   
StackResampler.cu
DiffusionGP.cu  
EddyFunctors.h   
EddyInternalGpuUtils.h   
EddyMatrixKernels.h 
StackResampler.h
eddy.cpp        
EddyGpuUtils.cu 
EddyKernels.cu           
eddy_matrix_kernels_internal.h 
LSResampler.cu

I also have a ‘lib’ folder within the cuda folder that houses:

libcudart.so.7.0  
libcudart.so.7.0.28 
libcudart.so.8.0  
libcudart.so.8.0.44 
libcudart.so.9.1 
libcudart.so.9.1.85
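
As far as I understand it, the "cannot open shared object file" error depends only on whether the dynamic loader can find libcudart.so.8.0 in one of its search directories (for example those listed in LD_LIBRARY_PATH), not on where the .cu/.h sources sit. Below is a minimal sketch of that lookup, written in C++ to match the eddy sources. The helper names split_search_path and find_library_dir are made up for illustration, and the real loader also consults RPATH/RUNPATH and the ldconfig cache, which this ignores.

```cpp
#include <string>
#include <vector>
#include <unistd.h>   // access(), R_OK

// Split a colon-separated list such as the value of LD_LIBRARY_PATH.
// Simplification: empty entries are skipped here, whereas the real loader
// treats them as the current directory.
std::vector<std::string> split_search_path(const std::string& path) {
  std::vector<std::string> dirs;
  std::string::size_type start = 0;
  while (start <= path.size()) {
    std::string::size_type end = path.find(':', start);
    if (end == std::string::npos) end = path.size();
    if (end > start) dirs.push_back(path.substr(start, end - start));
    start = end + 1;
  }
  return dirs;
}

// Return the first directory in `search_path` that contains a readable entry
// named `soname` (e.g. "libcudart.so.8.0"), or an empty string if none does.
// Hypothetical helper: it only mimics the LD_LIBRARY_PATH part of the lookup.
std::string find_library_dir(const std::string& search_path,
                             const std::string& soname) {
  for (const std::string& dir : split_search_path(search_path)) {
    const std::string candidate = dir + "/" + soname;
    if (access(candidate.c_str(), R_OK) == 0) return dir;
  }
  return std::string();
}
```

With LD_LIBRARY_PATH pointing at the 'lib' folder above, I would expect find_library_dir(std::getenv("LD_LIBRARY_PATH"), "libcudart.so.8.0") to return that folder; std::getenv returns a null pointer when the variable is unset, so that case needs a guard before building the std::string.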

Is there anything I’m missing? Any advice is greatly appreciated.

paul

On 6/13/19, 6:41 AM, "FSL - FMRIB's Software Library on behalf of Jesper Andersson" <[log in to unmask] on behalf of [log in to unmask]> wrote:

    Dear Paul,
    
    > I've been trying to run eddy on a GPU so that I may perform slice-to-volume correction. I'm working on NITRC's computing environment using an Amazon Web Services (AWS) g3.4xlarge instance (supports CUDA 8.0) and running the following from the command line:
    > 
    > eddy_cuda8.0 --imain=dwi_raw_RL.nii --mask=corrected_b0_brain_mask --acqp=b0_parameters.txt --index=index.txt --bvecs=bvecs --bvals=bvals --topup=topup_results --mporder=6 --slspec=slice_order.txt --out=eddy --verbose
    > 
    > 
    > However, I keep getting a persistent error message during the 'Register' step:
    > 
    > 
    > Reading images
    > Performing volume-to-volume registration
    > Running Register
    > EDDY:::  EddyGpuUtils::InitGpu: cudeGetDevice returned an unknown error code
    > EDDY:::  cuda/EddyGpuUtils.cu:::  static void EDDY::EddyGpuUtils::InitGpu(bool):  Exception thrown
    > 
    > EDDY:::  eddy.cpp:::  EDDY::ReplacementManager* EDDY::Register(const EDDY::EddyCommandLineOptions&, EDDY::ScanType, unsigned int, const std::vector<float, std::allocator<float> >&, EDDY::SecondLevelECModel, bool, EDDY::ECScanManager&, EDDY::ReplacementManager*, NEWMAT::Matrix&, NEWMAT::Matrix&):  Exception thrown
    > EDDY::: Eddy failed with message EDDY:::  eddy.cpp:::  EDDY::ReplacementManager* EDDY::DoVolumeToVolumeRegistration(const EDDY::EddyCommandLineOptions&, EDDY::ECScanManager&):  Exception thrown
    > 
    > 
    > So the CUDA program appears to be executed up to a point, but the error message has me stumped as to what is causing the problem. Any insights/advice/thoughts would be greatly appreciated!
    
    I am afraid I am equally stumped by that message. The message comes from this very simple bit of code:
    
    void EddyGpuUtils::InitGpu(bool verbose) EddyTry
    {
      static bool initialized=false;
      if (!initialized) {
        initialized=true;
        int device;
        cudaError_t ce;
        if ((ce = cudaGetDevice(&device)) != cudaSuccess) {
          if (ce == cudaErrorInvalidValue) throw EddyException("EddyGpuUtils::InitGpu: cudeGetDevice returned an error code cudaErrorInvalidValue");
          else throw EddyException("EddyGpuUtils::InitGpu: cudeGetDevice returned an unknown error code");
        }    
        if (verbose) printf("\n...................Allocated GPU # %d...................\n", device); 
        int *q;
        if ((ce = cudaMalloc((void **)&q, sizeof(int))) != cudaSuccess) {
          throw EddyException("EddyGpuUtils::InitGpu: cudeMalloc returned an error when trying to allocate device memory");
        }
        cudaFree(q);
        EddyKernels::CudaSync("EddyGpuUtils::InitGpu");
      }
    } EddyCatch
    
    This routine is called by eddy before attempting to use a GPU. It just ensures that there is a GPU available for it, and also attempts to allocate a tiny snippet of device memory to check that allocation works. It is particularly vexing that it returns an error code that is not cudaErrorInvalidValue, since my understanding is that that is the only possible error code from cudaGetDevice.
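
One way to see which code actually comes back, rather than having it collapsed into "unknown", would be to ask the CUDA runtime for its own description via cudaGetErrorString. The sketch below is not eddy code. It loads the runtime with dlopen so that it builds without the CUDA headers, and it assumes that cudaError_t can be passed as a plain int and that cudaSuccess is 0. ProbeResult, ProbeStatus and probe_cuda_runtime are hypothetical names.

```cpp
#include <dlfcn.h>    // dlopen, dlsym, dlerror
#include <string>

// Build (assumption): g++ -std=c++11 probe.cpp -ldl

enum class ProbeStatus { LibraryNotLoadable, SymbolMissing, CudaError, Ok };

struct ProbeResult {
  ProbeStatus status;
  int code;             // raw cudaError_t value, or -1 if no CUDA call was made
  std::string message;  // loader error, CUDA error text, or the device number
};

// Load the CUDA runtime named `soname` (e.g. "libcudart.so.8.0"), call
// cudaGetDevice as InitGpu does, and report the runtime's own error text.
ProbeResult probe_cuda_runtime(const char* soname) {
  void* handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* why = dlerror();
    return ProbeResult{ProbeStatus::LibraryNotLoadable, -1,
                       why ? why : "dlopen failed"};
  }
  // Assumption: cudaError_t is an int-sized enum, so int stands in for it.
  using GetDeviceFn = int (*)(int*);
  using ErrorStringFn = const char* (*)(int);
  GetDeviceFn get_device =
      reinterpret_cast<GetDeviceFn>(dlsym(handle, "cudaGetDevice"));
  ErrorStringFn error_string =
      reinterpret_cast<ErrorStringFn>(dlsym(handle, "cudaGetErrorString"));
  if (get_device == nullptr || error_string == nullptr) {
    return ProbeResult{ProbeStatus::SymbolMissing, -1,
                       "cudaGetDevice or cudaGetErrorString not found"};
  }
  int device = -1;
  const int ce = get_device(&device);
  const char* text = error_string(ce);
  // The handle is intentionally left open for the lifetime of the process.
  if (ce != 0) {
    return ProbeResult{ProbeStatus::CudaError, ce,
                       text ? text : "(no description)"};
  }
  return ProbeResult{ProbeStatus::Ok, 0, "device " + std::to_string(device)};
}
```

Calling probe_cuda_runtime("libcudart.so.8.0") on the affected instance and printing the code and message fields should show either the loader's complaint or the runtime's description of whatever cudaGetDevice returned, which may narrow down whether the problem lies with the driver, the device, or the library setup.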
    
    I have seen this error a few times myself, and I have always been able to resolve it by re-booting the GPU. But that is clearly not an option if you are using Amazon services.
    
    Maybe someone who reads this has more experience of CUDA programming and can suggest something?
    
    Jesper
    
    
    > 
    > Best,
    > 
    > Paul Sands
    > 
    > ########################################################################
    > 
    > To unsubscribe from the FSL list, click the following link:
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.jiscmail.ac.uk_cgi-2Dbin_webadmin-3FSUBED1-3DFSL-26A-3D1&d=DwIFAg&c=yzGiX0CSJAqkDTmENO9LmP6KfPQitNABR9M66gsTb5w&r=f7Ws0BuIClE4RXnuoSlqec9b9UU5SAmT_h-w7ptMULI&m=c9Hl6s_ma-lPfOVoJ1Y-OtS4l-nKeJobZUF-Kzig4R4&s=D6gHkCnmgeuBNbRQ844dg6uT5chT2elgluLZXzbOh0k&e= 
    