Dear Paul,

> I've been trying to run eddy on a GPU so that I may perform slice-to-volume correction. I'm working on NITRC's computing environment using an Amazon Web Services (AWS) g3.4xlarge instance (supports CUDA 8.0) and running the following from the command line:
> 
> eddy_cuda8.0 --imain=dwi_raw_RL.nii --mask=corrected_b0_brain_mask --acqp=b0_parameters.txt --index=index.txt --bvecs=bvecs --bvals=bvals --topup=topup_results --mporder=6 --slspec=slice_order.txt --out=eddy --verbose
> 
> 
> However, I keep getting a persistent error message during the 'Register' step:
> 
> 
> Reading images
> Performing volume-to-volume registration
> Running Register
> EDDY:::  EddyGpuUtils::InitGpu: cudeGetDevice returned an unknown error code
> EDDY:::  cuda/EddyGpuUtils.cu:::  static void EDDY::EddyGpuUtils::InitGpu(bool):  Exception thrown
> 
> EDDY:::  eddy.cpp:::  EDDY::ReplacementManager* EDDY::Register(const EDDY::EddyCommandLineOptions&, EDDY::ScanType, unsigned int, const std::vector<float, std::allocator<float> >&, EDDY::SecondLevelECModel, bool, EDDY::ECScanManager&, EDDY::ReplacementManager*, NEWMAT::Matrix&, NEWMAT::Matrix&):  Exception thrown
> EDDY::: Eddy failed with message EDDY:::  eddy.cpp:::  EDDY::ReplacementManager* EDDY::DoVolumeToVolumeRegistration(const EDDY::EddyCommandLineOptions&, EDDY::ECScanManager&):  Exception thrown
> 
> 
> So the CUDA program appears to be executed up to a point, but the error message has me stumped as to what is causing the problem. Any insights/advice/thoughts would be greatly appreciated!

I am afraid I am equally stumped by that message. It comes from this very simple bit of code:

void EddyGpuUtils::InitGpu(bool verbose) EddyTry
{
  static bool initialized=false;
  if (!initialized) {
    initialized=true;
    int device;
    cudaError_t ce;
    if ((ce = cudaGetDevice(&device)) != cudaSuccess) {
      if (ce == cudaErrorInvalidValue) throw EddyException("EddyGpuUtils::InitGpu: cudeGetDevice returned an error code cudaErrorInvalidValue");
      else throw EddyException("EddyGpuUtils::InitGpu: cudeGetDevice returned an unknown error code");
    }    
    if (verbose) printf("\n...................Allocated GPU # %d...................\n", device); 
    int *q;
    if ((ce = cudaMalloc((void **)&q, sizeof(int))) != cudaSuccess) {
      throw EddyException("EddyGpuUtils::InitGpu: cudeMalloc returned an error when trying to allocate device memory");
    }
    cudaFree(q);
    EddyKernels::CudaSync("EddyGpuUtils::InitGpu");
  }
} EddyCatch

This routine is called by eddy before it attempts to use a GPU. It simply checks that a GPU is available, and also tries to allocate a tiny piece of device memory to confirm that allocation works. It is particularly vexing that it returns an error code other than cudaErrorInvalidValue, since my understanding is that this is the only error code cudaGetDevice can return.

I have seen this error a few times myself, and I have always been able to resolve it by re-booting the GPU. But that is clearly not an option if you are using Amazon services.
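
One way to narrow this down might be to have the final else branch report the code it actually received instead of calling it unknown. The following is only a minimal sketch, not what eddy currently does: DescribeGpuError is a hypothetical helper name, and the two lookup functions are passed in as parameters so that the formatting can be compiled and checked on a machine without CUDA.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of eddy): builds an exception message that
// keeps the numeric error code together with the name and description
// supplied by the two lookup callables. name_of and describe must accept
// the error code and return something streamable, e.g. const char*.
template <typename Err, typename NameFn, typename DescFn>
std::string DescribeGpuError(const std::string& where, const std::string& call,
                             Err code, NameFn name_of, DescFn describe)
{
  std::ostringstream msg;
  msg << where << ": " << call << " returned error "
      << static_cast<int>(code)           // raw numeric value of the code
      << " (" << name_of(code) << "): "   // symbolic name from the lookup
      << describe(code);                  // human-readable description
  return msg.str();
}
```

Inside InitGpu the lookups would be the CUDA runtime functions cudaGetErrorName and cudaGetErrorString, i.e. DescribeGpuError("EddyGpuUtils::InitGpu", "cudaGetDevice", ce, cudaGetErrorName, cudaGetErrorString), with the result passed to EddyException (assuming its constructor accepts a std::string). The message would then show which error the runtime is really reporting.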

Maybe someone who reads this has more experience of CUDA programming and can suggest something?

Jesper


> 
> Best,
> 
> Paul Sands
> 
> ########################################################################
> 
> To unsubscribe from the FSL list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=FSL&A=1
