Hello,
I am running RELION 3.0.7 on a CentOS 7 cluster whose nodes have NVIDIA 1080 GPUs, with CUDA 10.1, Open MPI 3.1.0, and the Slurm scheduler. If a job crashes, one or more nodes go down in sinfo and report an NVIDIA driver/library mismatch. Reinstalling CUDA and then restarting slurmd brings them back to life, until the next crash. Please advise.
Thank you,
Yehuda