Hello

I am coming back to this old thread because I have the same need: to run FSL
(and bedpostx) on a Slurm cluster.
Is there any version of fsl_sub that I could start with?

Many thanks


Romain



On 03/04/2013 16:35, Juha Pajula wrote:
> Hi!
>
> I have started to modify fsl_sub for Slurm, but it now seems that I
> also need to modify other places, because the Slurm commands produce
> different output.
>
> Where in FSL are the dependencies between SGE tasks processed? If I have
> understood correctly, the last line of output from the earlier command
> (at least in the FEAT logs) is treated as the ID of that job, and this ID
> is then passed to the next job so that it waits before starting. With
> Slurm, the original FSL code now returns the first word of that line,
> which is not a number, because Slurm's output is different.
>
> In Slurm, the last line of output from srun/sbatch (the counterpart of
> qsub) looks like this on the command line:
>
> srun: job 211205 has been allocated resources
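
A minimal sketch of how the numeric ID could be pulled out of a line like the one above, instead of taking the first word ("srun:"). This is illustrative Python, not code from fsl_sub; the function name `extract_job_id` is hypothetical, and it assumes the submission output has been captured as one string. The same pattern also matches the "Submitted batch job <id>" line that sbatch prints.

```python
import re


def extract_job_id(submit_output):
    """Return the last Slurm job ID found in the captured output, or None.

    Hypothetical helper: assumes the text printed by srun/sbatch was
    captured into a single string.
    """
    # Matches "srun: job 211205 has been allocated resources" as well as
    # "Submitted batch job 211205"; the digits after "job " are the ID.
    matches = re.findall(r"\bjob (\d+)\b", submit_output)
    return matches[-1] if matches else None


log = (
    "srun: job 211193 queued and waiting for resources\n"
    "srun: job 211193 has been allocated resources\n"
)
print(extract_job_id(log))
```

Returning `None` when no ID is found lets the caller stop early rather than pass a non-numeric word on as a dependency.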
>
> This produces output like the following in the Slurm environment (from the FEAT logs):
>
> /home/pajula2/FSL_test/fsl/bin/fsl_sub -T 10 -l logs -N feat0_init   /home/pajula2/FSL_test/fsl/bin/feat /home/pajula2/tests/test1+++++++++++++++++.feat/design.fsf -D /home/pajula2/tests/test1+++++++++++++++++.feat -I 1 -init
> SLURM
> Starting Slurm submissions...
> sge_command: srun --chdir=/home/pajula2/tests/test1+++++++++++++++++.feat -p sgn -J feat0_init --output=logs/log.o --error=logs/log.e -t 01:00:00 -N 1
> executing: /home/pajula2/FSL_test/fsl/bin/feat /home/pajula2/tests/test1+++++++++++++++++.feat/design.fsf -D /home/pajula2/tests/test1+++++++++++++++++.feat -I 1 -init
> srun: job 211193 queued and waiting for resources
> srun: job 211193 has been allocated resources
>
> /home/pajula2/FSL_test/fsl/bin/fsl_sub -T 30 -l logs -N feat1b_reg -j srun:  /home/pajula2/FSL_test/fsl/bin/feat /home/pajula2/tests/test1+++++++++++++++++.feat/design.fsf -D /home/pajula2/tests/test1+++++++++++++++++.feat -I 1 -reg
> SLURM
> Starting Slurm submissions...
> sge_command: srun --chdir=/home/pajula2/tests/test1+++++++++++++++++.feat -p sgn -J feat1b_reg --output=logs/log.o --error=logs/log.e --dependency=afterok:srun: -t 01:00:00 -N 1
> executing: /home/pajula2/FSL_test/fsl/bin/feat /home/pajula2/tests/test1+++++++++++++++++.feat/design.fsf -D /home/pajula2/tests/test1+++++++++++++++++.feat -I 1 -reg
> srun: error: Unable to allocate resources: Job dependency problem
>
> /home/pajula2/FSL_test/fsl/bin/fsl_sub -T 8 -l logs -N feat2_pre -j srun::srun:  /home/pajula2/FSL_test/fsl/bin/feat /home/pajula2/tests/test1+++++++++++++++++.feat/design.fsf -D /home/pajula2/tests/test1+++++++++++++++++.feat -I 1 -prestats
> SLURM
> Starting Slurm submissions...
> sge_command: srun --chdir=/home/pajula2/tests/test1+++++++++++++++++.feat -p sgn -J feat2_pre --output=logs/log.o --error=logs/log.e --dependency=afterok:srun::srun: -t 01:00:00 -N 1
> executing: /home/pajula2/FSL_test/fsl/bin/feat /home/pajula2/tests/test1+++++++++++++++++.feat/design.fsf -D /home/pajula2/tests/test1+++++++++++++++++.feat -I 1 -prestats
> srun: error: Unable to allocate resources: Job dependency problem
>
>
> Where can I find this ID processing phase?
>
> I have already found where the dot is placed between different IDs (here
> replaced by a colon), but I am missing the source of the IDs. Are the IDs
> originally collected from the output of the qsub command or from some variable?
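
As a rough illustration of the colon-joined form: Slurm's `--dependency=afterok:` option takes numeric job IDs separated by colons, so values such as "srun:" from the logs above are rejected. The sketch below (illustrative Python, not part of fsl_sub; `dependency_option` is a hypothetical name) builds the option and refuses non-numeric IDs up front.

```python
def dependency_option(job_ids):
    """Build a Slurm --dependency option from a list of job IDs.

    Hypothetical helper: returns an empty string when there is nothing
    to wait for, and raises ValueError for IDs that are not numeric.
    """
    ids = [str(j).strip() for j in job_ids]
    if not ids:
        return ""
    bad = [j for j in ids if not j.isdigit()]
    if bad:
        raise ValueError("non-numeric job IDs: %r" % bad)
    # afterok: start only after all listed jobs have finished successfully.
    return "--dependency=afterok:" + ":".join(ids)


print(dependency_option(["211193", "211205"]))
```

Failing at this point gives a clearer message than the "Job dependency problem" error reported by srun.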
>
>
> --
>
> Juha Pajula,
> Researcher, Ph.D. Student,
> Methods and Models for Biological Signals and Images group of Signal
> Processing department in Tampere University of Technology,
> Finland
>
> On 03/23/13 09:53, Mark Jenkinson wrote:
>> Hi,
>>
>> We don't have any experience using Slurm in Oxford but maybe someone else on the list does.
>> As for CUDA, the current release doesn't support this but we do have CUDA code running in-house at the moment and are intending to release this quite soon.
>>
>> All the best,
>> 	Mark
>>
>>
>> On 22 Mar 2013, at 07:52, Juha Pajula wrote:
>>
>>> Hi!
>>>
>>> Our university recently set up a new computing cluster, and in this new
>>> system the parallel resource management is based on Slurm
>>> (https://computing.llnl.gov/linux/slurm/)
>>>
>>> I am currently setting up FSL on the cluster, and it now seems to work
>>> fine on a single node. For the real analysis work I need the
>>> parallel capabilities of FSL, and for this reason I have to modify
>>> fsl_sub for the Slurm environment.
>>>
>>> Do you have any experience with modifying fsl_sub for Slurm? I didn't
>>> find any notes about Slurm on the FSL webpage.
>>>
>>>
>>> As a minor question: Does FSL support CUDA computations?
>>>
>>>
>>>
>>> -- 
>>> Juha Pajula,
>>> Researcher, Ph.D. Student,
>>> Methods and Models for Biological Signals and Images group of Signal
>>> Processing department in Tampere University of Technology,
>>> Finland