hi tim,
here is the standard out from your request:
node004 node004 node002 node002 node003 node003 node001
/tmp/61674.1.all.q/rsh
/mnt/sge/bin/lx26-x86/qrsh -inherit node004 hostname
node004
/mnt/sge/bin/lx26-x86/qrsh -inherit node004 hostname
node004
/mnt/sge/bin/lx26-x86/qrsh -inherit node002 hostname
node002
/mnt/sge/bin/lx26-x86/qrsh -inherit node002 hostname
node002
/mnt/sge/bin/lx26-x86/qrsh -inherit node003 hostname
node003
/mnt/sge/bin/lx26-x86/qrsh -inherit node003 hostname
node003
/mnt/sge/bin/lx26-x86/qrsh -inherit node001 hostname
node001
here is the p out file:
-catch_rsh
/var/spool/sge/default/node004/active_jobs/61674.1/pe_hostfile
node004
node004
node002
node002
node003
node003
node001
no errors on this run using 7 cpus
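as a cross-check on the 7 cpus, the machine list can be tallied per host. this is only a sketch: count_slots is a helper name made up for illustration, and it assumes the list is whitespace separated like the first line of the output above.

```shell
#!/bin/sh
# Tally how many slots each host contributes to a machine list.
# Assumption: hosts are separated by whitespace, as in $FSLMACHINELIST.
count_slots() {
    # $1 is left unquoted on purpose so the shell splits it into one host per word
    printf '%s\n' $1 | sort | uniq -c | awk '{print $2 "=" $1}'
}

# machine list copied from the output above
count_slots "node004 node004 node002 node002 node003 node003 node001"
```

for the list above this prints node001=1, node002=2, node003=2 and node004=2, which adds up to the 7 slots.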
here is what a typical qstat -u looks like when bedpost is running:
node001 lx26-x86 2 1.89 4.0G 848.7M 3.8G 0.0
   job-ID prior name user state submit/start at queue master ja-task-ID
   ----------------------------------------------------------------------------------------------
   61183 10.35000 DTILD_3139 mrjeffs r 07/13/2005 20:06:02 all.q@node SLAVE
node002 lx26-x86 2 1.89 4.0G 77.3M 3.8G 0.0
   61183 10.35000 DTILD_3139 mrjeffs r 07/13/2005 20:06:02 all.q@node SLAVE
                                                           all.q@node SLAVE
node003 lx26-x86 2 1.86 4.0G 73.1M 3.8G 0.0
   61183 10.35000 DTILD_3139 mrjeffs r 07/13/2005 20:06:02 all.q@node SLAVE
                                                           all.q@node SLAVE
node004 lx26-x86 2 1.82 4.0G 85.4M 3.8G 0.0
   61183 10.35000 DTILD_3139 mrjeffs r 07/13/2005 20:06:02 all.q@node MASTER
                                                           all.q@node SLAVE
                                                           all.q@node SLAVE
so i have engaged 7 processors in the bedpost run at 90% capacity. the
process is not completing normally after 2 hrs. the source dir looks like
this:
-rwxrwxrwx 1 mrjeffs mrjeffs 497 Jun 29 10:32 bvals
-rwxrwxrwx 1 mrjeffs mrjeffs 509 Jun 29 10:32 bvalues.txt
-rwxrwxrwx 1 mrjeffs mrjeffs 1527 Jun 29 10:32 bvec.txt
-rwxrwxrwx 1 mrjeffs mrjeffs 1488 Jun 29 10:32 bvecs
-rwxrwxrwx 1 mrjeffs mrjeffs 6457 Jul 5 13:40 data.ecclog
-rwxrwxrwx 1 mrjeffs mrjeffs 20124376 Jul 5 13:40 data.nii.gz
-rwxrwxrwx 1 mrjeffs mrjeffs 348 Jun 29 10:32 fM_0001.hdr
-rwxrwxrwx 1 mrjeffs mrjeffs 69206016 Jun 29 10:33 fM_0001.img
-rwxrwxrwx 1 mrjeffs mrjeffs 764866 Jul 5 13:40 nodif.nii.gz
-rwxrwxrwx 1 mrjeffs mrjeffs 14720 Jul 5 13:40 nodif_brain_mask.nii.gz
-rwxrwxrwx 1 mrjeffs mrjeffs 312766 Jul 5 13:40 nodif_brain_mask1.nii.gz
fM_0001.img is the data without eddy correction, and data is the
eddy-corrected version, referenced to the B0.
the .bedpost dir contains the following:
-rwxrwxr-x 1 mrjeffs mrjeffs 497 Jul 13 20:06 bvals
-rwxrwxr-x 1 mrjeffs mrjeffs 1488 Jul 13 20:06 bvecs
-rw-rw-r-- 1 mrjeffs mrjeffs 2571815 Jul 13 22:14 dyadic_vectors.nii.gz
drwxrwxr-x 3 mrjeffs mrjeffs 8192 Jul 13 22:11 logs
-rw-rw-r-- 1 mrjeffs mrjeffs 838347 Jul 13 22:13 mean_fsamples.nii.gz
-rw-rw-r-- 1 mrjeffs mrjeffs 821363 Jul 13 22:13 mean_phsamples.nii.gz
-rw-rw-r-- 1 mrjeffs mrjeffs 817071 Jul 13 22:13 mean_thsamples.nii.gz
-rw-rw-r-- 1 mrjeffs mrjeffs 42151287 Jul 13 22:13 merged_fsamples.nii.gz
-rw-rw-r-- 1 mrjeffs mrjeffs 41626122 Jul 13 22:12 merged_phsamples.nii.gz
-rw-rw-r-- 1 mrjeffs mrjeffs 41789914 Jul 13 22:12 merged_thsamples.nii.gz
-rwxrwxr-x 1 mrjeffs mrjeffs 764866 Jul 13 20:06 nodif.nii.gz
-rw-rw-r-- 1 mrjeffs mrjeffs 312765 Jul 13 20:06 nodif_brain.nii.gz
-rwxrwxr-x 1 mrjeffs mrjeffs 14720 Jul 13 20:06 nodif_brain_mask.nii.gz
drwxrwxr-x 2 mrjeffs mrjeffs 20 Jul 13 22:14 xfms
but no V1, 2, 3 or other final files. it seems it gets into the process
and then is unable to reassemble the data. the error is:
writing slice 62
writing slice 63
/mnt/local/fsl/bin/bedpost: line 293: syntax error: unexpected end of file
63 is the last slice
293 is the last line of bedpost, so it seems it breaks out of a loop and
goes to the end without finishing.
what is going wrong?
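as far as i know, a shell prints "unexpected end of file" when it reaches the end of a script while a loop, if, quote or here-document is still open, so the installed script can be parsed without running it. a minimal sketch, assuming bedpost is a plain sh script; check_syntax is a made-up helper name and the path is the one from the error message:

```shell
#!/bin/sh
# Parse a script without executing it (-n is the standard "noexec" option).
check_syntax() {
    if sh -n "$1"; then
        echo "syntax ok: $1"
    else
        echo "syntax error: $1"
        return 1
    fi
}

# Path taken from the error message; skipped when the file is not readable.
script=/mnt/local/fsl/bin/bedpost
if [ -r "$script" ]; then
    check_syntax "$script" || true   # report only, do not abort
fi
```

if the installed copy parses cleanly, the unclosed construct would more likely be in something bedpost generates or reads at run time than in the script file itself.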
cheers jeff
ps more on the feat testing in a little while
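pps one thing i am not sure about in the dti script quoted below: it pipes a here-document into qsub with an unquoted END delimiter. with an unquoted delimiter the submitting shell expands ${TMPDIR} and the backticks before qsub ever sees the text, so the values may come from the outer job rather than the parallel job. i do not know whether this matters here. the sketch below only demonstrates the shell behaviour; WHERE is a made-up variable and nothing in it is sge or fsl specific:

```shell
#!/bin/sh
# Compare an unquoted and a quoted here-document delimiter.
WHERE=submit-time

# Unquoted delimiter: ${WHERE} is expanded by the shell that writes the text.
unquoted=$(cat << END
value=${WHERE}
END
)

# Quoted delimiter: the text is passed through literally, so expansion is
# left to whichever shell later runs it.
quoted=$(cat << 'END'
value=${WHERE}
END
)

echo "$unquoted"   # value=submit-time
echo "$quoted"     # value=${WHERE}
```

quoting the delimiter (or escaping the $ and backticks) would be one way to defer the expansion to the job itself.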
On Jul 14, 2005, at 1:44 AM, Tim Behrens wrote:
> Hi - can you do the following for me
>
> echo $FSLMACHINELIST
>
> echo $FSLREMOTECALL
>
> for i in $FSLMACHINELIST; do $FSLREMOTECALL $i hostname;done
>
> and tell me the result of each
>
> Thanks
>
> T
>
>
>
> On Tue, 12 Jul 2005, Jeff Stevenson wrote:
>
>> hi all, for those of you more experienced in cluster computing: i am
>> having trouble testing fsl on a cluster here at uw.
>> we have a 60 cpu sun grid engine system with alex korb's feat
>> multithreaded tcl plugin (thanks alex for permission to test) installed.
>> i wanted to test bedposting and feat to see speed gains, but fsl is not
>> able to see the additional cpus. the FSLMACHINELIST and FSLREMOTECALL
>> variables are not being set up correctly. below is the way the cluster
>> admin recommended setting it up, but fsl is not happy. i would be happy
>> to go into the gory details off list since this is a side issue for
>> most. how do i get fsl to see the cpus on a dynamically allocated
>> system?
>>
>> cheers jeff
>>
>> feat script:
>>
>> #$ -S /bin/sh
>> #$ -cwd
>> #$ -l virtual_total=5.0G
>> #$ -hard -l mem_total=3.0G
>> #$ -m bea
>> #$ -V
>> #$ -pe mpich 2-6
>>
>> FSLMACHINELIST=`cat ${TMPDIR}/machines | xargs echo`
>> FSLREMOTECALL=${TMPDIR}/rsh
>> export FSLMACHINELIST FSLREMOTECALL
>>
>> /mnt/local/fsl/bin/feat_model design
>>
>> and for dti:
>>
>> #$ -S /bin/sh
>> #$ -cwd
>> #$ -l virtual_total=5.0G
>> #$ -m bea
>> #$ -V
>> #$ -hard -l mem_total=3.0G
>> eddy_correct fM_0001 data 1
>> avwroi data nodif 0 1
>> bet nodif nodif_brain_mask1
>> avwmaths nodif_brain_mask1 -bin nodif_brain_mask
>> cat << END | qsub -pe mpich 2-10
>> #$ -cwd
>> #$ -l virtual_total=5.0G
>> #$ -m bea
>> #$ -V
>> FSLMACHINELIST=`cat ${TMPDIR}/machines | xargs echo`
>> FSLREMOTECALL=${TMPDIR}/rsh
>> export FSLMACHINELIST FSLREMOTECALL
>> bedpost ~/DTI/DTILD_3139
>> END
>>
>
> --
> -------------------------------------------------------------------------------
> Tim Behrens
> Centre for Functional MRI of the Brain
> The John Radcliffe Hospital
> Headley Way Oxford OX3 9DU
> Oxford University
> Work 01865 222782
> Mobile 07980 884537
> -------------------------------------------------------------------------------
>