The segfaults from eddy_cuda appear to have been caused by various
combinations of problems with the input .nii file, the file used for
--slspec, or both. Clearly one should be doubly certain that the number
of slices reported for the NIfTI file and the number of lines in the
slspec file agree (for single-band data, where slspec has one line per
slice). In case it is of any use to anyone else, we now have this check:
#!/bin/bash
# bash snippet, not for csh
# get the dim3 value that fslinfo reports for the input file; that is
# the number of slices
slices=$(fslinfo "$nifti_file" | awk '$1 == "dim3" { print $2 }')
# count the lines in the slspec file; reading from stdin makes wc print
# just the number, without the file name
slspec_lines=$(wc -l < "$slspec_file")
echo "fslinfo says the image file has $slices slices"
echo "the slspec file has $slspec_lines lines in it"
# check that the two numbers agree; if they do, move on,
# if they don't, stop right there and exit with error status 9
if [ "$slices" -eq "$slspec_lines" ] ; then
    echo "Those agree. Continuing to process data..."
else
    echo "Disagreement about the number of slices. I quit." >&2
    exit 9
fi
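
As I understand the eddy documentation, slspec has one row per excitation
(multiband group), so for multiband data the number of lines is the number
of slices divided by the MB factor, and the line-count test above would
reject a valid file. A more general check, sketched here under the
assumption that slice indices are zero-based, is that every index from 0 to
nslices-1 appears exactly once somewhere in the file. The function name
check_slspec is hypothetical; it is not part of FSL or MRtrix3.

```shell
#!/bin/bash
# check_slspec NSLICES SLSPEC_FILE   (hypothetical helper, a sketch)
# Assumes zero-based slice indices. Passes (returns 0) only if every
# index 0..NSLICES-1 appears exactly once, however the indices are
# split across rows (one row per MB group). Problems go to stderr.
check_slspec() {
    local nslices=$1 slspec_file=$2
    # refuse an empty or non-numeric slice count (e.g. fslinfo failed)
    case $nslices in
        ''|*[!0-9]*) echo "slice count '$nslices' is not a number" >&2; return 1 ;;
    esac
    awk -v n="$nslices" '
        {
            for (i = 1; i <= NF; i++) {
                idx = $i
                # every field must be a non-negative integer below n
                if (idx !~ /^[0-9]+$/ || idx + 0 >= n) {
                    printf "bad slice index %s on line %d\n", idx, NR > "/dev/stderr"
                    bad = 1
                    continue
                }
                # the same slice must not be listed twice
                if (seen[idx + 0]++) {
                    printf "slice %s listed more than once\n", idx > "/dev/stderr"
                    bad = 1
                }
                count++
            }
        }
        END {
            # total number of valid entries must equal the slice count
            if (count != n) {
                printf "found %d valid slice entries, expected %d\n", count, n > "/dev/stderr"
                bad = 1
            }
            exit bad
        }
    ' "$slspec_file"
}
```

It can replace the line-count test, for example
`check_slspec "$slices" "$slspec_file" || exit 9`, with `$slices` taken from
fslinfo as in the snippet above. Counting entries instead of lines also
avoids `wc -l` undercounting by one when the last line has no trailing
newline.
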
We are running this on a compute cluster, and eddy_cuda is called from
MRtrix3, so we run this check before dwidenoise, which takes a long time;
that way a mismatch shows up right away instead of after a long wait.
Sorry for the noise on the list. I hope someone finds the snippet above useful.
On Wed, Nov 14, 2018 at 10:30 AM Bennet Fauber wrote:
>
> I am replying to this thread, as it seems like my problem might be
> related. If not, I can resubmit as a separate issue.
>
> I have a researcher trying to replicate an analysis that was done in
> June. They are running the same commands; one person handed over the
> cluster setup script, and I am pretty sure that only the subject ID has
> changed.
>
> Here is the command we are running with the version I got for FSL 5.0.10:
>
> [bennet@flux-build bin]$ ls -l eddy_cuda-5.0.10
> -rwxr-xr-x 1 bennet swinstaller 40525310 Apr 24 2017 eddy_cuda-5.0.10
>
> [bennet@flux-build bin]$ md5sum eddy_cuda
> afa454d92c75542924ca36313b70d36c eddy_cuda
>
> [hajenna@nyx7500 dwipreproc-tmp-FDLY57]$ eddy_cuda \
> --imain=eddy_in.nii --mask=eddy_mask.nii \
> --acqp=eddy_config.txt --index=eddy_indices.txt \
> --bvecs=bvecs --bvals=bvals --niter=8 --fwhm=10,6,4,2,0,0,0,0 \
> --repol --ol_type=both --mporder=8 --s2v_niter=8 --dont_peas \
> --ol_type=both --mporder=8 --s2v_niter=8 --slspec=slspec.txt \
> --out=dwi_post_eddy
> Entering EddyGpuUtils::LoadPredictionMaker
>
> ...................Allocated GPU # 0...................
> Entering EddyGpuUtils::LoadPredictionMaker
> Entering EddyGpuUtils::LoadPredictionMaker
> Entering EddyGpuUtils::LoadPredictionMaker
> Entering EddyGpuUtils::LoadPredictionMaker
> Entering EddyGpuUtils::LoadPredictionMaker
> Entering EddyGpuUtils::LoadPredictionMaker
> Entering EddyGpuUtils::LoadPredictionMaker
> Entering EddyGpuUtils::LoadPredictionMaker
> Segmentation violation, Address not mapped, Offending address = (nil)
> eddy_cuda ) [0x47d73a] [
> eddy_cuda ) [0x4ab1b1] [
> eddy_cuda ) [0x49c517] [
> eddy_cuda ) [0x4096e3] [
> eddy_cuda ) [0x40ca08] [
> eddy_cuda ) [0x40db2c] [
> /lib64/libc.so.6 __libc_start_main [0x2b3504d84445]
> eddy_cuda ) [0x405f69] [
>
> I tried the version that I got for FSL 5.0.11 with this result.
>
> [bennet@flux-build bin]$ ls -l eddy-5.0.11_cuda7.5
> -rwxrwxr-x 1 bennet swinstaller 33505739 Sep 21 2017 eddy-5.0.11_cuda7.5
>
> [bennet@flux-build bin]$ md5sum eddy-5.0.11_cuda7.5
> 5e7edd5288d3b0b7834e9ca244bf7dee eddy-5.0.11_cuda7.5
>
> [hajenna@nyx7500 dwipreproc-tmp-FDLY57]$ eddy-5.0.11_cuda7.5 \
> --imain=eddy_in.nii --mask=eddy_mask.nii --acqp=eddy_config.txt \
> --index=eddy_indices.txt --bvecs=bvecs --bvals=bvals --niter=8 \
> --fwhm=10,6,4,2,0,0,0,0 --repol --ol_type=both --mporder=8 \
> --s2v_niter=8 --dont_peas --ol_type=both --mporder=8 --s2v_niter=8 \
> --slspec=slspec.txt --out=dwi_post_eddy
>
> ...................Allocated GPU # 0...................
> Segmentation violation, Unknown reason, Offending address = (nil)
> eddy-5.0.11_cuda7.5 ) [0x49ff01] [?
> eddy-5.0.11_cuda7.5 ) [0x4c62b1] [?
> eddy-5.0.11_cuda7.5 ) [0x4bb8c9] [?
> eddy-5.0.11_cuda7.5 ) [0x412c0d] [?
> eddy-5.0.11_cuda7.5 ) [0x413974] [?
> eddy-5.0.11_cuda7.5 ) [0x40798e] [?
> /lib64/libc.so.6 __libc_start_main [0x2b2d00589445]
> eddy-5.0.11_cuda7.5 ) [0x40ef31] [?
>
> Here is the `nvidia-smi` output for the card in the machine (there are
> more GPUs, but I thought the rest would be redundant).
>
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla K20Xm         On   | 00000000:09:00.0 Off |                    0 |
> | N/A   27C    P8    16W / 235W |      0MiB /  5700MiB |      0%   E. Process |
> +-------------------------------+----------------------+----------------------+
>
> We use modules; with the cuda/7.5 module loaded, we get
>
> [bennet@nyx7500 ~]$ nvcc --version
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2015 NVIDIA Corporation
> Built on Tue_Aug_11_14:27:32_CDT_2015
> Cuda compilation tools, release 7.5, V7.5.17
>
> [bennet@nyx7500 ~]$ uname -r
> 3.10.0-693.11.6.el7.x86_64
>
> If I swap in the cuda/6.5 module instead, it says it can't find libcudart.so.7.5:
>
> [hajenna@nyx7500 dwipreproc-tmp-FDLY57]$ eddy_cuda --imain=eddy_in.nii \
> --mask=eddy_mask.nii --acqp=eddy_config.txt --index=eddy_indices.txt \
> --bvecs=bvecs --bvals=bvals --niter=8 --fwhm=10,6,4,2,0,0,0,0 --repol \
> --ol_type=both --mporder=8 --s2v_niter=8 --dont_peas --ol_type=both \
> --mporder=8 --s2v_niter=8 --slspec=slspec.txt --out=dwi_post_eddy
> eddy_cuda: error while loading shared libraries: libcudart.so.7.5:
> cannot open shared object file: No such file or directory
>
> So I am pretty sure we are matched with the right CUDA libraries.
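
Another way to confirm which CUDA runtime the binary actually picks up is to
ask the dynamic loader with ldd, which honours the LD_LIBRARY_PATH set by the
loaded module. A minimal sketch; the function name report_cudart is
hypothetical:

```shell
#!/bin/bash
# report_cudart BINARY   (hypothetical helper, a sketch)
# Prints the libcudart line from ldd, showing which file the loader
# resolves it to in the current environment. Returns 1 if the binary
# has no libcudart dependency or if the library is "not found".
report_cudart() {
    local bin=$1
    local line
    # grep's exit status decides the pipeline result here
    line=$(ldd "$bin" 2>/dev/null | grep 'libcudart') || {
        echo "$bin: no libcudart dependency found" >&2
        return 1
    }
    echo "$line"
    # ldd reports unresolved libraries as "=> not found"
    if printf '%s\n' "$line" | grep -q 'not found'; then
        return 1
    fi
}
```

For example, run `report_cudart "$(command -v eddy_cuda)"` after each
`module load` to see whether the resolved path changes with the module.
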
########################################################################
To unsubscribe from the FSL list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=FSL&A=1