Hi Neil,
I parallelised bedpost on an Altix simply by using sh instead of the
remote call. I altered the bedpost script to pipe its bedpost_proc
command to /bin/sh instead of to $FSLREMOTECALL $machine /bin/sh. You
would still have to create a dummy FSLMACHINELIST so that the script
knows how many processes to start in parallel. There were one or two
more things to consider, but unfortunately I don't remember all of them.
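
Roughly, the change looks like the sketch below. It is written from memory and is not the actual modified script: the helper name run_local_workers and the dummy list entries are made up for illustration, and it assumes, as in the original script, that starting the same bedpost_proc command several times is all the caller has to do.

```shell
#!/bin/sh
# Sketch only: launch one local worker per entry in FSLMACHINELIST.
# The entry names are dummies; only their count matters here.

# Hypothetical helper (not part of FSL): run the given command once per
# list entry, each through a local /bin/sh in the background, then wait
# for all of them and print how many were started.
run_local_workers() {
    cmd=$1
    started=0
    for machine in $FSLMACHINELIST; do
        # The original script pipes to: $FSLREMOTECALL $machine /bin/sh &
        echo "$cmd" | /bin/sh &
        started=$((started + 1))
    done
    wait
    echo "$started"
}

# Example with assumed values: four dummy entries for four processors.
# FSLMACHINELIST="cpu1 cpu2 cpu3 cpu4"
# run_local_workers "${FSLDIR}/bin/bedpost_proc $subjdir $nslices ${subjdir}.bedpost/logs/pid_${$}"
```

Since everything runs on the same machine, the checks for fsl.sh that the remote version sends along can be dropped, provided FSLDIR is already set in the calling shell.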
The other thing you may want to think about is using a proper queueing
system instead of this rather crude approach, but if you have the
machine to yourself, it's probably fine.
Cheers,
Johannes
Neil Killeen wrote:
> Hi
>
> I am looking into parallelizing bedpost on our Altix. The Altix
> uses a single Linux image to manage all of the processors,
> so all processes will be local; there is no need for ssh/rsh invocations.
>
> I have been looking at the bedpost script to see how I should modify it
> for the Altix.
>
>
> The relevant bit of code is:
>
> if [ "x$FSLMACHINELIST" = "x" ] ; then
>     echo "processing data on local host"
>     ${FSLDIR}/bin/bedpost_proc $subjdir $nslices ${subjdir}.bedpost/logs/pid_${$} &
> else
>     echo "processing data on hosts: $FSLMACHINELIST"
>     for machine in $FSLMACHINELIST; do
>         echo "if [ -r /usr/local/etc/fslconf/fsl.sh ];then . /usr/local/etc/fslconf/fsl.sh;fi;
> if [ -r /etc/fslconf/fsl.sh ];then . /etc/fslconf/fsl.sh;fi;
> if [ -r \${HOME}/.fslconf/fsl.sh ]; then . \${HOME}/.fslconf/fsl.sh; fi;
> if [ x\${FSLDIR} != "x" ];then \${FSLDIR}/bin/bedpost_proc $subjdir $nslices ${subjdir}.bedpost/logs/pid_${$}; else echo FSLDIR not set in any default location on machine `hostname`;fi"| $FSLREMOTECALL $machine /bin/sh &
>     done
> fi
>
>
> Since all processes will run locally, I can scrap all the environment
> testing and the remote call.
>
> My understanding of the parallelization process is that each voxel is
> done separately. However, I don't follow how you are managing that,
> because the command you pass to the remote machine (see above) is the
> same as the one used for running the whole thing on one host, viz.:
>
> ${FSLDIR}/bin/bedpost_proc $subjdir $nslices ${subjdir}.bedpost/logs/pid_${$}
>
>
> This command is passed to each machine in your list or, in my case, I
> would just issue it N_proc times.
>
> How is it decided which process operates on which voxel of the image
> when you are operating in parallel? Additionally, if there are, say,
> N_v voxels and I operate with N_p < N_v processors, how is the
> 'chunking' of the problem handled (i.e. how do you generate the list
> of voxels for each processor to operate on, since there are fewer
> processors than voxels)? Is there some kind of log file that indicates
> which voxels are being processed, so that each running process keeps
> taking the next available voxel until they are all done?
>
>
> thanks
> Neil
>