Hi Moises
When I use the GPU on the grid, it is shared between two queue slots. I'm also running on local boxes at my desk. For the benefit of the group, here are some stats. These runs all use the same tractography protocol but different subjects, so there will be a little variation due to head size and the size of the parcellated masks, but it shouldn't be much. The protocol involves 78 seed masks in network mode, with waypoints covering white matter and avoid masks in the ventricles; it outputs fdt_network_matrix but not fdt_paths.
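For reference, the probtrackx2_gpu call behind these numbers is the one quoted in full at the bottom of this thread; schematically (paths abbreviated, so treat this as a sketch rather than a paste-ready command):

  probtrackx2_gpu -s <bedpostX>/merged -m <bedpostX>/nodif_brain_mask \
    -x masksSeed.txt --network --waypoints=masksWaypoint.txt --waycond=OR \
    --onewaycondition --avoid=ventricles --dir=<outdir> --forcedir

(fdt_paths is only written when --opd is passed, which is how these runs skip it.)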
Average times for individual jobs:
SGE cuda queue (Tesla K40m with 10 GB RAM): 1 hr 57 min
SGE general queue: 11 hr 2 min
Local PC 1 (GeForce GTX 1050 with 2 GB RAM): 1 hr 22 min
Local PC 2 (GeForce GTX 750 Ti with 2 GB RAM): 1 hr 44 min
Time to complete ten jobs, divided by ten (run in parallel where possible):
SGE cuda queue: 1 hr 0 min
SGE general queue: 1 hr 38 min
Local PC 1: 1 hr 22 min (serial only)
Local PC 2: 1 hr 44 min (serial only)
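(The grid timings come from submitting the ten jobs together; an SGE array job does this neatly, something like the following, where the script name is hypothetical:

  qsub -t 1-10 -q cuda -l h_vmem=32G run_probtrackx_gpu.sh

with each task picking its subject from $SGE_TASK_ID.)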
As you can see, the beefy GPU on the grid does best overall. Perhaps it would do even better running jobs in exclusive mode rather than two at once, but it's a shared resource. The general queue uses the CPU version of probtrackx2; although it is much slower per job, it catches up in throughput because the grid has many more CPU slots than GPU slots. The local machines are running modest but decent consumer cards and perform respectably.
I'm curious about the benchmark for the grid GPU in exclusive mode, so I'll see how amenable our sysadmin is.
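(For anyone else making the same request: compute mode is set per GPU with nvidia-smi and needs root, e.g.

  nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

to restrict device 0 to a single process at a time, and -c DEFAULT to revert.)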
Best wishes
Paul
On Mon, 18 Mar 2019 22:32:34 -0700, Moises Hernandez <[log in to unmask]> wrote:
>Hi Paul,
>It is great that you made it work.
>
>On the CPU side, all the required memory is allocated from the beginning;
>how much depends on your protocol: seeds, ROIs, number of samples, etc.
>
>Then the tool decides dynamically how much GPU memory to use.
>It tries to run the maximum possible number of streamlines in parallel,
>using 80% of the available GPU memory (as a safety margin).
>So the amount of memory used changes depending on the GPU model.
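>A rough way to watch this from outside the tool is plain nvidia-smi:
>
>  nvidia-smi --query-gpu=memory.free,memory.total --format=csv
>
>The tool budgets roughly 80% of the reported free figure for streamline state.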
>
>Is that GPU set in exclusive mode?
>I.e., are other jobs running simultaneously on the same GPU? The tool
>achieves its best acceleration when it uses a GPU in exclusive mode.
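>You can check the current setting with:
>
>  nvidia-smi --query-gpu=compute_mode --format=csv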
>
>On Mon, 18 Mar 2019 at 06:55, Paul Wright <[log in to unmask]> wrote:
>
>> Dear Moises
>>
>> An update on my problem: I have got probtrackx2_gpu to run on the SGE by
>> explicitly selecting the right version of cuda and by increasing the RAM
>> allocated to the job to 32G.
>>
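>> In case it helps others, "selecting the right version" meant pointing the
>> job at the matching CUDA libraries ahead of the default ones, along these
>> lines (the 9.2 path follows Moises's suggestion below and is our local
>> layout, so adjust to your install):
>>
>>   export PATH=/usr/local/cuda-9.2/bin:$PATH
>>   export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64:$LD_LIBRARY_PATH
>>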
>> A couple of follow-up questions:
>> 1) The GPU RAM use is close to the limit, at 9456 / 11441 MiB max for our
>> card. Should allowing more system RAM for the job take pressure off the
>> GPU RAM?
>> 2) My job took 40 minutes to complete, vs 20 minutes with the non-gpu
>> version of probtrackx2. Is there something I can change to improve this,
>> since the GPU version is expected to run faster?
>>
>> Some info about the job:
>> I am running in network mode with 78 seed masks, plus waypoints covering
>> all white matter and avoid masks in the ventricles. I am not saving
>> fdt_paths images, just the fdt_network_matrix. The DWI data are in 2 mm
>> voxels, with seed masks resampled to DWI space, so no transformations are
>> applied on-the-fly.
>>
>> It may be that, in this case, my job will run faster using the CPU than
>> the GPU, since we only have a single cuda machine on the grid; but if you
>> can think of anything I can look at that might speed up the GPU job, I'll
>> try it out.
>>
>> Best wishes
>>
>> Paul
>>
>>
>>
>> On Sat, 2 Mar 2019 21:11:23 -0800, Moises Hernandez <[log in to unmask]> wrote:
>>
>> >Hi Paul,
>> >I think the jobs are using CUDA 10.0:
>> >>> Cuda compilation tools, release 10.0, V10.0.130
>> >but the latest released versions of the tool were CUDA 9.2 & CUDA 9.1 (
>> >https://users.fmrib.ox.ac.uk/~moisesf/Probtrackx_GPU/Installation.html)
>> >so what I would try is to install CUDA 9.2 on that machine and make the
>> >jobs use that version.
>> >You can have different versions of CUDA on the same machine.
>> >
>> >
>> >On Sat, 2 Mar 2019 at 10:02, Paul Wright <[log in to unmask]> wrote:
>> >
>> >> Hi Moises
>> >>
>> >> Our sysadmin installed the version of probtrackx2_gpu that was
>> >> appropriate for our cuda machine's version. I will check with him that
>> >> the versions are still in sync (i.e. no cuda update). Assuming versioning
>> >> is correct, is there anything else I can do to diagnose? It's a
>> >> mysterious error, as there seems to be plenty of memory free, and I sent
>> >> it a job with just two seed masks, which shouldn't take up much memory.
>> >>
>> >> Thanks
>> >> Paul
>> >>
>> >>
>> >> On Thu, 28 Feb 2019 12:05:48 -0500, Moises Hernandez <[log in to unmask]> wrote:
>> >>
>> >> >Hi Paul,
>> >> >It sounds to me like a problem related to CUDA binary version and the
>> >> >architecture of the GPUs.
>> >> >Are the GPUs different on the SGE machine?
>> >> >If yes, you may need a different CUDA version of probtrackx2_gpu. Maybe
>> >> >that one does not support the GPUs of the SGE machine.
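>> >> >(A quick way to compare is "nvidia-smi -L", which lists the GPU model
>> >> >on each machine.)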
>> >> >
>> >> >Moises
>> >> >
>> >> >On Thu, 28 Feb 2019 at 07:30, Paul Wright <[log in to unmask]> wrote:
>> >> >
>> >> >> Dear Moises et al.
>> >> >>
>> >> >> I'm using probtrackx2_gpu to run lots of small tracking jobs. My jobs
>> >> >> run fine on my local Ubuntu machine, with cuda etc. set up, and speed
>> >> >> up the process noticeably compared with probtrackx2. I want to
>> >> >> parallelize the batch by sending it to our Sun Grid Engine, which has
>> >> >> a cuda machine configured, but I'm getting out-of-memory errors. I
>> >> >> allocated up to 16 GB to each job, which should be plenty given that
>> >> >> my local machine runs them with 16 GB RAM, and the grid machine has
>> >> >> 125 GB total. Our admin checked the logs, and nvidia-smi reports that
>> >> >> the job barely used any RAM (copy below), so we're trying to figure
>> >> >> out what is triggering the error on the grid but not on the local
>> >> >> machine. (The same job runs OK using the regular, non-gpu version of
>> >> >> probtrackx2.)
>> >> >>
>> >> >> Please let me know if you can help diagnose the problem. I'm happy to
>> >> >> produce whatever logging you need if you tell me how.
>> >> >>
>> >> >> Best wishes
>> >> >>
>> >> >> Paul Wright
>> >> >>
>> >> >> Command:
>> >> >> /software/system/fsl/fsl-6.0.0/bin/probtrackx2_gpu \
>> >> >>   -s /data/stcog05.bedpostX/merged \
>> >> >>   -m /data/stcog05.bedpostX/nodif_brain_mask \
>> >> >>   -x /data/stcog05.probtrack/masksSeed.txt \
>> >> >>   -V 2 --dir=/data/stcog05.probtrack --forcedir --network \
>> >> >>   --waypoints=/data/stcog05.probtrack/masksWaypoint.txt \
>> >> >>   --waycond=OR --onewaycondition \
>> >> >>   --avoid=/data/stcog05.probtrack/masks/ventricles --opd -l
>> >> >>
>> >> >> stdout:
>> >> >> PROBTRACKX2 VERSION GPU
>> >> >> Log directory is: /data/stcog05.probtrackx
>> >> >> Running in network mode
>> >> >> Number of Seeds: 2640
>> >> >> Dimensions Network Matrix: 2 x 2
>> >> >>
>> >> >> Time Loading Data: 22 seconds
>> >> >>
>> >> >>
>> >> >> ...................Allocated GPU 0...................
>> >> >> Free memory at the beginning: 11911102464 ---- Total memory: 11996954624
>> >> >> Free memory after copying masks: 11465326592 ---- Total memory: 11996954624
>> >> >> Running 476136 streamlines in parallel using 2 STREAMS
>> >> >> Total number of streamlines: 13200000
>> >> >>
>> >> >> stderr:
>> >> >> CUDA Runtime Error: out of memory
>> >> >>
>> >> >> uname -a
>> >> >> Linux nanlnx16.iop.kcl.ac.uk 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018 x86_64 x86_64 x86_64 GNU/Linux
>> >> >>
>> >> >> hostnamectl
>> >> >> Static hostname: nanlnx16.iop.kcl.ac.uk
>> >> >> Icon name: computer
>> >> >> Machine ID: 183fb3179d0349ed8c4bdc57ca5297ff
>> >> >> Boot ID: 886d6ba0fd054eb9a3efd995f67fa6a3
>> >> >> Operating System: Scientific Linux 7.6 (Nitrogen)
>> >> >> CPE OS Name: cpe:/o:scientificlinux:scientificlinux:7.6:GA
>> >> >> Kernel: Linux 3.10.0-957.1.3.el7.x86_64
>> >> >> Architecture: x86-64
>> >> >>
>> >> >> modinfo nvidia
>> >> >> filename: /lib/modules/3.10.0-957.1.3.el7.x86_64/kernel/drivers/video/nvidia.ko
>> >> >> alias: char-major-195-*
>> >> >> version: 410.79
>> >> >> supported: external
>> >> >> license: NVIDIA
>> >> >> retpoline: Y
>> >> >> rhelversion: 7.6
>> >> >> srcversion: 1283EC37DF82D5A8A902589
>> >> >> alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> >> >> alias: pci:v000010DEd*sv*sd*bc03sc02i00*
>> >> >> alias: pci:v000010DEd*sv*sd*bc03sc00i00*
>> >> >> depends: ipmi_msghandler
>> >> >> vermagic: 3.10.0-957.1.3.el7.x86_64 SMP mod_unload modversions
>> >> >> parm: NvSwitchRegDwords:NvSwitch regkey (charp)
>> >> >> parm: NVreg_Mobile:int
>> >> >> parm: NVreg_ResmanDebugLevel:int
>> >> >> parm: NVreg_RmLogonRC:int
>> >> >> parm: NVreg_ModifyDeviceFiles:int
>> >> >> parm: NVreg_DeviceFileUID:int
>> >> >> parm: NVreg_DeviceFileGID:int
>> >> >> parm: NVreg_DeviceFileMode:int
>> >> >> parm: NVreg_UpdateMemoryTypes:int
>> >> >> parm: NVreg_InitializeSystemMemoryAllocations:int
>> >> >> parm: NVreg_UsePageAttributeTable:int
>> >> >> parm: NVreg_MapRegistersEarly:int
>> >> >> parm: NVreg_RegisterForACPIEvents:int
>> >> >> parm: NVreg_CheckPCIConfigSpace:int
>> >> >> parm: NVreg_EnablePCIeGen3:int
>> >> >> parm: NVreg_EnableMSI:int
>> >> >> parm: NVreg_TCEBypassMode:int
>> >> >> parm: NVreg_UseThreadedInterrupts:int
>> >> >> parm: NVreg_EnableStreamMemOPs:int
>> >> >> parm: NVreg_EnableBacklightHandler:int
>> >> >> parm: NVreg_EnableUserNUMAManagement:int
>> >> >> parm: NVreg_MemoryPoolSize:int
>> >> >> parm: NVreg_KMallocHeapMaxSize:int
>> >> >> parm: NVreg_VMallocHeapMaxSize:int
>> >> >> parm: NVreg_IgnoreMMIOCheck:int
>> >> >> parm: NVreg_RegistryDwords:charp
>> >> >> parm: NVreg_RegistryDwordsPerDevice:charp
>> >> >> parm: NVreg_RmMsg:charp
>> >> >> parm: NVreg_GpuBlacklist:charp
>> >> >> parm: NVreg_AssignGpus:charp
>> >> >>
>> >> >> nvcc --version
>> >> >> nvcc: NVIDIA (R) Cuda compiler driver
>> >> >> Copyright (c) 2005-2018 NVIDIA Corporation
>> >> >> Built on Sat_Aug_25_21:08:01_CDT_2018
>> >> >> Cuda compilation tools, release 10.0, V10.0.130
>> >> >>
>> >> >> qacct -u k1347787 -j \* -b 201902221200 -q cuda
>> >> >> ==============================================================
>> >> >> qname cuda
>> >> >> hostname nanlnx16.iop.kcl.ac.uk
>> >> >> group image
>> >> >> owner k1347787
>> >> >> project NONE
>> >> >> department defaultdepartment
>> >> >> jobname fscon3vprobtrackx_gpu.job
>> >> >> jobnumber 4422736
>> >> >> taskid 1
>> >> >> account sge
>> >> >> priority 0
>> >> >> qsub_time Fri Feb 22 13:17:10 2019
>> >> >> start_time Fri Feb 22 13:17:16 2019
>> >> >> end_time Fri Feb 22 13:17:50 2019
>> >> >> granted_pe NONE
>> >> >> slots 1
>> >> >> failed 0
>> >> >> exit_status 0
>> >> >> ru_wallclock 34s
>> >> >> ru_utime 23.006s
>> >> >> ru_stime 5.679s
>> >> >> ru_maxrss 5.473MB
>> >> >> ru_ixrss 0.000B
>> >> >> ru_ismrss 0.000B
>> >> >> ru_idrss 0.000B
>> >> >> ru_isrss 0.000B
>> >> >> ru_minflt 1541199
>> >> >> ru_majflt 103
>> >> >> ru_nswap 0
>> >> >> ru_inblock 665408
>> >> >> ru_oublock 19016
>> >> >> ru_msgsnd 0
>> >> >> ru_msgrcv 0
>> >> >> ru_nsignals 0
>> >> >> ru_nvcsw 13074
>> >> >> ru_nivcsw 1725
>> >> >> cpu 28.685s
>> >> >> mem 26.492GBs
>> >> >> io 332.312MB
>> >> >> iow 0.000s
>> >> >> maxvmem 4.477GB
>> >> >> arid undefined
>> >> >> ar_sub_time undefined
>> >> >> category -u k1347787 -q cuda -l h_vmem=16G
>> >> >>
########################################################################
To unsubscribe from the FSL list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=FSL&A=1