Hi Moises
When I use the GPU on the grid, it is shared between two queue slots. I'm also running on local boxes at my desk. For the benefit of the group, here are some stats. These runs all use the same tractography protocol but different subjects, so there will be a little variation due to head size and the size of the parcellated masks, but it shouldn't be much. The protocol involves 78 seed masks in network mode, with waypoints covering white matter and avoid masks in the ventricles; it outputs fdt_network_matrix but not fdt_paths.
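For reference, the probtrackx2_gpu call behind these numbers is the one quoted in full at the bottom of this thread; schematically (paths abbreviated, so treat this as a sketch rather than a paste-ready command):

  probtrackx2_gpu -s <bedpostX>/merged -m <bedpostX>/nodif_brain_mask \
    -x masksSeed.txt --network --waypoints=masksWaypoint.txt --waycond=OR \
    --onewaycondition --avoid=ventricles --dir=<outdir> --forcedir

(fdt_paths is only written when --opd is passed, which is how these runs skip it.)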
Average times for individual jobs:
SGE cuda queue (Tesla K40m with 10 GB RAM): 1 hr 57 min
SGE general queue: 11 hr 2 min
Local PC 1 (GeForce GTX 1050 with 2 GB RAM): 1 hr 22 min
Local PC 2 (GeForce GTX 750 Ti with 2 GB RAM): 1 hr 44 min
Time to complete ten jobs, divided by ten (run in parallel where possible):
SGE cuda queue: 1 hr 0 min
SGE general queue: 1 hr 38 min
Local PC 1: 1 hr 22 min (serial only)
Local PC 2: 1 hr 44 min (serial only)
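(The grid timings come from submitting the ten jobs together; an SGE array job does this neatly, something like the following, where the script name is hypothetical:

  qsub -t 1-10 -q cuda -l h_vmem=32G run_probtrackx_gpu.sh

with each task picking its subject from $SGE_TASK_ID.)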
As you can see, the beefy GPU on the grid does best overall. Perhaps it would do even better running jobs in exclusive mode rather than two at once, but it's a shared resource. The general queue uses the CPU version of probtrackx2; although it is much slower per job, it catches up in throughput because the grid has many more CPU slots than GPU slots. The local machines are running modest but decent consumer cards and perform respectably.
I'm curious about the benchmark for the grid GPU in exclusive mode, so I'll see how amenable our sysadmin is.
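(For anyone else making the same request: compute mode is set per GPU with nvidia-smi and needs root, e.g.

  nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

to restrict device 0 to a single process at a time, and -c DEFAULT to revert.)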
Best wishes
Paul
On Mon, 18 Mar 2019 22:32:34 -0700, Moises Hernandez <[log in to unmask]> wrote:
>Hi Paul,
>It is great that you made it work.
>
>On the CPU side, all the required memory is allocated from the beginning;
>how much depends on your protocol: seeds, ROIs, number of samples, etc.
>
>Then the tool decides dynamically how much GPU memory to use.
>It tries to run the maximum possible number of streamlines in parallel,
>using 80% of the available GPU memory (as a safety margin).
>So the amount of memory used changes depending on the GPU model.
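>A rough way to watch this from outside the tool is plain nvidia-smi:
>
>  nvidia-smi --query-gpu=memory.free,memory.total --format=csv
>
>The tool budgets roughly 80% of the reported free figure for streamline state.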
>
>Is that GPU set in exclusive mode?
>I.e., are other jobs running simultaneously on the same GPU? The tool
>achieves its best acceleration when it uses a GPU in exclusive mode.
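>You can check the current setting with:
>
>  nvidia-smi --query-gpu=compute_mode --format=csv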
>
>On Mon, 18 Mar 2019 at 06:55, Paul Wright <[log in to unmask]> wrote:
>
>> Dear Moises
>>
>> An update on my problem: I have got probtrackx2_gpu to run on the SGE by
>> explicitly selecting the right version of cuda and by increasing the RAM
>> allocated to the job to 32G.
>>
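>> In case it helps others, "selecting the right version" meant pointing the
>> job at the matching CUDA libraries ahead of the default ones, along these
>> lines (the 9.2 path follows Moises's suggestion below and is our local
>> layout, so adjust to your install):
>>
>>   export PATH=/usr/local/cuda-9.2/bin:$PATH
>>   export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64:$LD_LIBRARY_PATH
>>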
>> A couple of follow-up questions:
>> 1) The GPU RAM use is close to the limit, at 9456 / 11441 MiB max for our
>> card. Should allowing more system RAM for the job take pressure off the
>> GPU RAM?
>> 2) My job took 40 minutes to complete, vs 20 minutes with the non-gpu
>> version of probtrackx2. Is there something I can change to improve this,
>> since the GPU version is expected to run faster?
>>
>> Some info about the job:
>> I am running in network mode with 78 seed masks, plus waypoints covering
>> all white matter and avoid masks in the ventricles. I am not saving
>> fdt_paths images, just the fdt_network_matrix. The DWI data are in 2 mm
>> voxels, with seed masks resampled to DWI space, so no transformations are
>> applied on-the-fly.
>>
>> It may be that, in this case, my job will run faster using the CPU than
>> the GPU, since we only have a single cuda machine on the grid; but if you
>> can think of anything I can look at that might speed up the GPU job, I'll
>> try it out.
>>
>> Best wishes
>>
>> Paul
>>
>>
>>
>> On Sat, 2 Mar 2019 21:11:23 -0800, Moises Hernandez <[log in to unmask]> wrote:
>>
>> >Hi Paul,
>> >I think the jobs are using CUDA 10.0:
>> >>> Cuda compilation tools, release 10.0, V10.0.130
>> >but the latest released versions of the tool were CUDA 9.2 & CUDA 9.1 (
>> >https://users.fmrib.ox.ac.uk/~moisesf/Probtrackx_GPU/Installation.html)
>> >so what I would try is to install CUDA 9.2 on that machine and make the
>> >jobs use that version.
>> >You can have different versions of CUDA on the same machine.
>> >
>> >
>> >On Sat, 2 Mar 2019 at 10:02, Paul Wright <[log in to unmask]> wrote:
>> >
>> >> Hi Moises
>> >>
>> >> Our sysadmin installed the version of probtrackx2_gpu that was
>> >> appropriate for our cuda machine's version. I will check with him that
>> >> the versions are still in sync (i.e. no cuda update). Assuming versioning
>> >> is correct, is there anything else I can do to diagnose? It's a
>> >> mysterious error, as there seems to be plenty of memory free, and I sent
>> >> it a job with just two seed masks, which shouldn't take up much memory.
>> >>
>> >> Thanks
>> >> Paul
>> >>
>> >>
>> >> On Thu, 28 Feb 2019 12:05:48 -0500, Moises Hernandez <[log in to unmask]> wrote:
>> >>
>> >> >Hi Paul,
>> >> >It sounds to me like a problem related to CUDA binary version and the
>> >> >architecture of the GPUs.
>> >> >Are the GPUs different on the SGE machine?
>> >> >If yes, you may need a different CUDA version of probtrackx2_gpu. Maybe
>> >> >that one does not support the GPUs of the SGE machine.
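>> >> >(A quick way to compare is "nvidia-smi -L", which lists the GPU model
>> >> >on each machine.)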
>> >> >
>> >> >Moises
>> >> >
>> >> >On Thu, 28 Feb 2019 at 07:30, Paul Wright <[log in to unmask]> wrote:
>> >> >
>> >> >> Dear Moises et al.
>> >> >>
>> >> >> I'm using probtrackx2_gpu to run lots of small tracking jobs. My jobs
>> >> >> run fine on my local Ubuntu machine, with cuda etc. set up, and speed
>> >> >> up the process noticeably compared with probtrackx2. I want to
>> >> >> parallelize the batch by sending it to our Sun Grid Engine, which has
>> >> >> a cuda machine configured, but I'm getting out-of-memory errors. I
>> >> >> allocated up to 16 GB to each job, which should be plenty given that
>> >> >> my local machine runs them with 16 GB RAM, and the grid machine has
>> >> >> 125 GB total. Our admin checked the logs, and nvidia-smi reports that
>> >> >> the job barely used any RAM (copy below), so we're trying to figure
>> >> >> out what is triggering the error on the grid but not on the local
>> >> >> machine. (The same job runs OK using the regular, non-gpu version of
>> >> >> probtrackx2.)
>> >> >>
>> >> >> Please let me know if you can help diagnose the problem. I'm happy to
>> >> >> produce whatever logging you need if you tell me how.
>> >> >>
>> >> >> Best wishes
>> >> >>
>> >> >> Paul Wright
>> >> >>
>> >> >> Command:
>> >> >> /software/system/fsl/fsl-6.0.0/bin/probtrackx2_gpu \
>> >> >>   -s /data/stcog05.bedpostX/merged \
>> >> >>   -m /data/stcog05.bedpostX/nodif_brain_mask \
>> >> >>   -x /data/stcog05.probtrack/masksSeed.txt \
>> >> >>   -V 2 --dir=/data/stcog05.probtrack --forcedir --network \
>> >> >>   --waypoints=/data/stcog05.probtrack/masksWaypoint.txt \
>> >> >>   --waycond=OR --onewaycondition \
>> >> >>   --avoid=/data/stcog05.probtrack/masks/ventricles --opd -l
>> >> >>
>> >> >> stdout:
>> >> >> PROBTRACKX2 VERSION GPU
>> >> >> Log directory is: /data/stcog05.probtrackx
>> >> >> Running in network mode
>> >> >> Number of Seeds: 2640
>> >> >> Dimensions Network Matrix: 2 x 2
>> >> >>
>> >> >> Time Loading Data: 22 seconds
>> >> >>
>> >> >>
>> >> >> ...................Allocated GPU 0...................
>> >> >> Free memory at the beginning: 11911102464 ---- Total memory: 11996954624
>> >> >> Free memory after copying masks: 11465326592 ---- Total memory: 11996954624
>> >> >> Running 476136 streamlines in parallel using 2 STREAMS
>> >> >> Total number of streamlines: 13200000
>> >> >>
>> >> >> stderr:
>> >> >> CUDA Runtime Error: out of memory
>> >> >>
>> >> >> uname -a
>> >> >> Linux nanlnx16.iop.kcl.ac.uk 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018 x86_64 x86_64 x86_64 GNU/Linux
>> >> >>
>> >> >> hostnamectl
>> >> >> Static hostname: nanlnx16.iop.kcl.ac.uk
>> >> >> Icon name: computer
>> >> >> Machine ID: 183fb3179d0349ed8c4bdc57ca5297ff
>> >> >> Boot ID: 886d6ba0fd054eb9a3efd995f67fa6a3
>> >> >> Operating System: Scientific Linux 7.6 (Nitrogen)
>> >> >> CPE OS Name: cpe:/o:scientificlinux:scientificlinux:7.6:GA
>> >> >> Kernel: Linux 3.10.0-957.1.3.el7.x86_64
>> >> >> Architecture: x86-64
>> >> >>
>> >> >> modinfo nvidia
>> >> >> filename: /lib/modules/3.10.0-957.1.3.el7.x86_64/kernel/drivers/video/nvidia.ko
>> >> >> alias: char-major-195-*
>> >> >> version: 410.79
>> >> >> supported: external
>> >> >> license: NVIDIA
>> >> >> retpoline: Y
>> >> >> rhelversion: 7.6
>> >> >> srcversion: 1283EC37DF82D5A8A902589
>> >> >> alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> >> >> alias: pci:v000010DEd*sv*sd*bc03sc02i00*
>> >> >> alias: pci:v000010DEd*sv*sd*bc03sc00i00*
>> >> >> depends: ipmi_msghandler
>> >> >> vermagic: 3.10.0-957.1.3.el7.x86_64 SMP mod_unload modversions
>> >> >> parm: NvSwitchRegDwords:NvSwitch regkey (charp)
>> >> >> parm: NVreg_Mobile:int
>> >> >> parm: NVreg_ResmanDebugLevel:int
>> >> >> parm: NVreg_RmLogonRC:int
>> >> >> parm: NVreg_ModifyDeviceFiles:int
>> >> >> parm: NVreg_DeviceFileUID:int
>> >> >> parm: NVreg_DeviceFileGID:int
>> >> >> parm: NVreg_DeviceFileMode:int
>> >> >> parm: NVreg_UpdateMemoryTypes:int
>> >> >> parm: NVreg_InitializeSystemMemoryAllocations:int
>> >> >> parm: NVreg_UsePageAttributeTable:int
>> >> >> parm: NVreg_MapRegistersEarly:int
>> >> >> parm: NVreg_RegisterForACPIEvents:int
>> >> >> parm: NVreg_CheckPCIConfigSpace:int
>> >> >> parm: NVreg_EnablePCIeGen3:int
>> >> >> parm: NVreg_EnableMSI:int
>> >> >> parm: NVreg_TCEBypassMode:int
>> >> >> parm: NVreg_UseThreadedInterrupts:int
>> >> >> parm: NVreg_EnableStreamMemOPs:int
>> >> >> parm: NVreg_EnableBacklightHandler:int
>> >> >> parm: NVreg_EnableUserNUMAManagement:int
>> >> >> parm: NVreg_MemoryPoolSize:int
>> >> >> parm: NVreg_KMallocHeapMaxSize:int
>> >> >> parm: NVreg_VMallocHeapMaxSize:int
>> >> >> parm: NVreg_IgnoreMMIOCheck:int
>> >> >> parm: NVreg_RegistryDwords:charp
>> >> >> parm: NVreg_RegistryDwordsPerDevice:charp
>> >> >> parm: NVreg_RmMsg:charp
>> >> >> parm: NVreg_GpuBlacklist:charp
>> >> >> parm: NVreg_AssignGpus:charp
>> >> >>
>> >> >> nvcc --version
>> >> >> nvcc: NVIDIA (R) Cuda compiler driver
>> >> >> Copyright (c) 2005-2018 NVIDIA Corporation
>> >> >> Built on Sat_Aug_25_21:08:01_CDT_2018
>> >> >> Cuda compilation tools, release 10.0, V10.0.130
>> >> >>
>> >> >> qacct -u k1347787 -j \* -b 201902221200 -q cuda
>> >> >> ==============================================================
>> >> >> qname cuda
>> >> >> hostname nanlnx16.iop.kcl.ac.uk
>> >> >> group image
>> >> >> owner k1347787
>> >> >> project NONE
>> >> >> department defaultdepartment
>> >> >> jobname fscon3vprobtrackx_gpu.job
>> >> >> jobnumber 4422736
>> >> >> taskid 1
>> >> >> account sge
>> >> >> priority 0
>> >> >> qsub_time Fri Feb 22 13:17:10 2019
>> >> >> start_time Fri Feb 22 13:17:16 2019
>> >> >> end_time Fri Feb 22 13:17:50 2019
>> >> >> granted_pe NONE
>> >> >> slots 1
>> >> >> failed 0
>> >> >> exit_status 0
>> >> >> ru_wallclock 34s
>> >> >> ru_utime 23.006s
>> >> >> ru_stime 5.679s
>> >> >> ru_maxrss 5.473MB
>> >> >> ru_ixrss 0.000B
>> >> >> ru_ismrss 0.000B
>> >> >> ru_idrss 0.000B
>> >> >> ru_isrss 0.000B
>> >> >> ru_minflt 1541199
>> >> >> ru_majflt 103
>> >> >> ru_nswap 0
>> >> >> ru_inblock 665408
>> >> >> ru_oublock 19016
>> >> >> ru_msgsnd 0
>> >> >> ru_msgrcv 0
>> >> >> ru_nsignals 0
>> >> >> ru_nvcsw 13074
>> >> >> ru_nivcsw 1725
>> >> >> cpu 28.685s
>> >> >> mem 26.492GBs
>> >> >> io 332.312MB
>> >> >> iow 0.000s
>> >> >> maxvmem 4.477GB
>> >> >> arid undefined
>> >> >> ar_sub_time undefined
>> >> >> category -u k1347787 -q cuda -l h_vmem=16G
>> >> >>
########################################################################
To unsubscribe from the FSL list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=FSL&A=1