Hi,
What does "ls -l /usr/lib/libcuda*" say?
Could there be something conflicting in LD_LIBRARY_PATH?
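To check both at once, here is a rough sketch. The helper name find_libcuda and the example directories are my assumptions, not anything RELION or CUDA provides. It lists every file matching libcuda* in each directory of a colon-separated search path, so a stale driver library (libcuda.so) or an unexpected runtime (libcudart.so) would show up:

```shell
#!/bin/sh
# find_libcuda (hypothetical helper): list every libcuda* file found in the
# directories of a colon-separated search path. The glob matches both the
# driver library (libcuda.so*) and the runtime (libcudart.so*).
# With no argument it falls back to $LD_LIBRARY_PATH (empty if unset).
find_libcuda() {
    search_path=${1-${LD_LIBRARY_PATH-}}
    old_ifs=$IFS
    IFS=:
    # Word splitting on ':' happens once, when the loop list is expanded.
    for dir in $search_path; do
        IFS=$old_ifs
        # Skip empty entries such as those produced by a leading or double ':'.
        [ -n "$dir" ] || continue
        for lib in "$dir"/libcuda*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$lib" ]; then
                ls -l "$lib"
            fi
        done
    done
    IFS=$old_ifs
}

# Libraries reachable through the current LD_LIBRARY_PATH.
find_libcuda

# Common system locations (assumed; adjust for your distribution).
find_libcuda /usr/lib:/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
```

If the first call prints a libcuda.so that differs from the one installed with the 430.26 driver, that entry is the likely conflict.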
Best regards,
Takanori Nakane
> Hi all,
> I'm trying to test our three new RTX 6000 GPUs using RELION auto-refine,
> but the following error arises every time, even though I have tried
> different CUDA versions (8, 9.0, 9.1 and 10.1) and reinstalled the latest
> version of RELION each time. I have the latest driver version, 430.26.
> Any clue about this? Thanks
>
> nvidia-smi:
>
> Mon Jul 29 12:00:26 2019
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: N/A      |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Quadro RTX 6000     Off  | 00000000:02:00.0 Off |                  Off |
> | 33%   51C    P0    53W / 260W |      0MiB / 24219MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Quadro RTX 6000     Off  | 00000000:03:00.0 Off |                  Off |
> | 35%   60C    P0   106W / 260W |      0MiB / 24220MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Quadro RTX 6000     Off  | 00000000:81:00.0 Off |                  Off |
> | 38%   61C    P0     1W / 260W |      0MiB / 24220MiB |      3%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID   Type   Process name                             Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+
>
> note.txt:
> ++++ Executing new job on Mon Jul 29 11:57:13 2019
> ++++ with the following command(s):
> `which relion_refine_mpi` --o Refine3D/job002/run --auto_refine
> --split_random_halves --i
> Runs/000223_ProtRelionRefine3D/input_particles.star --ref
> Runs/000054_ProtImportVolumes/extra/import_consensus_half1_class001.mrc
> --ini_high 60 --dont_combine_weights_via_disc --no_parallel_disc_io
> --preread_images --pool 50 --pad 2 --ctf --particle_diameter 480
> --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2
> --auto_local_healpix_order 4 --offset_range 5 --offset_step 2 --sym C1
> --low_resol_join_halves 40 --norm --scale --j 1 --gpu ""
> ++++
>
> run.err:
> ERROR: CUDA driver version is insufficient for CUDA runtime version in
> /home/local/scipion/software/em/relion/src/ml_optimiser_mpi.cpp at line
> 128
> (error-code 35)
> ERROR: CUDA driver version is insufficient for CUDA runtime version in
> /home/local/scipion/software/em/relion/src/ml_optimiser_mpi.cpp at line
> 128
> (error-code 35)
> in: /home/local/scipion/software/em/relion/src/acc/cuda/cuda_settings.h,
> line 67
> in: /home/local/scipion/software/em/relion/src/acc/cuda/cuda_settings.h,
> line 67
> === Backtrace ===
> === Backtrace ===
> /usr/local/bin/relion_refine_mpi(_ZN11RelionErrorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x6d)
> [0x44258d]
> /usr/local/bin/relion_refine_mpi() [0x44984a]
> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi10initialiseEv+0x2686)
> [0x455f86]
> /usr/local/bin/relion_refine_mpi(main+0x2087) [0x436c47]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fd05e393830]
> /usr/local/bin/relion_refine_mpi(_start+0x29) [0x4391e9]
> ==================
> ERROR:
>
> A GPU-function failed to execute.
>
> If this occured at the start of a run, you might have GPUs which
> are incompatible with either the data or your installation of relion.
> If you
>
> -> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
> and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5),
> this may happen.
>
> -> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
> at least compute 3.5. You may be trying to use a GPU older than
> this. If you have multiple generations, try specifying --gpu <X>
> with X=0. Then try X=1 in a new run, and so on. The numbering of
> GPUs may not be obvious from the driver or intuition. For a list
> of GPU compute generations, see
>
> en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
>
> -> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
> as to not require this, and may thus have unforeseen requirements
> when run in this mode. If you think it is nonetheless necessary,
> please consult the developers with this error.
>
> If this occurred at the middle or end of a run, it might be that
>
> -> YOUR DATA OR PARAMETERS WERE UNEXPECTED: execution on GPUs is
> subject to many restrictions, and relion is written to work within
> common restraints. If you have exotic data or settings, unexpected
> configurations may occur. See also above point regarding
> double precision.
> If none of the above applies, please report the error to the relion
> developers at github.com/3dem/relion/issues
>
>
> /usr/local/bin/relion_refine_mpi(_ZN11RelionErrorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x6d)
> [0x44258d]
> /usr/local/bin/relion_refine_mpi() [0x44984a]
> /usr/local/bin/relion_refine_mpi(_ZN14MlOptimiserMpi10initialiseEv+0x2686)
> [0x455f86]
> /usr/local/bin/relion_refine_mpi(main+0x2087) [0x436c47]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fe53029b830]
> /usr/local/bin/relion_refine_mpi(_start+0x29) [0x4391e9]
> ==================
> ERROR:
>
> A GPU-function failed to execute.
>
> If this occured at the start of a run, you might have GPUs which
> are incompatible with either the data or your installation of relion.
> If you
>
> -> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
> and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5),
> this may happen.
>
> -> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
> at least compute 3.5. You may be trying to use a GPU older than
> this. If you have multiple generations, try specifying --gpu <X>
> with X=0. Then try X=1 in a new run, and so on. The numbering of
> GPUs may not be obvious from the driver or intuition. For a list
> of GPU compute generations, see
>
> en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
>
> -> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
> as to not require this, and may thus have unforeseen requirements
> when run in this mode. If you think it is nonetheless necessary,
> please consult the developers with this error.
>
> If this occurred at the middle or end of a run, it might be that
>
> -> YOUR DATA OR PARAMETERS WERE UNEXPECTED: execution on GPUs is
> subject to many restrictions, and relion is written to work within
> common restraints. If you have exotic data or settings, unexpected
> configurations may occur. See also above point regarding
> double precision.
> If none of the above applies, please report the error to the relion
> developers at github.com/3dem/relion/issues
>
>
>
> run.out:
> RELION version: 3.0.7
> Precision: BASE=double, CUDA-ACC=single
>
> + Slave 1 runs on host = tesla
> + Slave 2 runs on host = tesla
> === RELION MPI setup ===
> + Number of MPI processes = 3
> + Master (0) runs on host = tesla
> =================
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
>
> ########################################################################
>
> To unsubscribe from the CCPEM list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCPEM&A=1
>