FSL users,
I've set up a Linux cluster (CentOS 5, x86_64) with SGE (version 6.1u5) and
installed FSL (version fsl-4.1.0-centos5_64 with patch
fsl-centos5_64-patch-4.1.1_from_4.1.0). I am able to launch simple test
jobs on the SGE queue (configured as a single queue, "all.q"). I have edited
the file /usr/local/fsl/bin/fsl_sub to use this single queue, as I had
previously done on a Rocks 4.3 cluster where we were running parallel
FSL/bedpostx jobs. I have also added the file fsl.sh, with the following
contents, to /etc/profile.d on the head node and all compute nodes:
FSLDIR=/usr/local/fsl
. ${FSLDIR}/etc/fslconf/fsl.sh
PATH=${FSLDIR}/bin:${PATH}
export FSLDIR PATH
since all users are using the bash shell. (/usr/local is NFS mounted from
the head node across all of the compute nodes.)
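
A quick way to confirm that this profile fragment takes effect in a given shell could look like the sketch below. The function name check_fsl_env is hypothetical (it is not part of FSL or SGE), and it assumes only that feat lives in ${FSLDIR}/bin, as it does here (/usr/local/fsl/bin/feat).

```shell
#!/bin/sh
# Hypothetical helper (not part of FSL or SGE): report whether the FSL
# environment set up by /etc/profile.d/fsl.sh is visible in this shell.
check_fsl_env() {
    # FSLDIR must be set and point at an existing directory.
    if [ -z "${FSLDIR:-}" ] || [ ! -d "${FSLDIR}" ]; then
        echo "FSLDIR missing"
        return 1
    fi
    # feat must be an executable file under ${FSLDIR}/bin.
    if [ ! -x "${FSLDIR}/bin/feat" ]; then
        echo "feat missing"
        return 1
    fi
    # ${FSLDIR}/bin must appear as an entry in PATH.
    case ":${PATH}:" in
        *":${FSLDIR}/bin:"*) ;;
        *) echo "PATH missing ${FSLDIR}/bin"; return 1 ;;
    esac
    echo "ok"
}
```

Calling check_fsl_env from a login shell on the head node and on each compute node should print "ok" if the NFS mount and the profile fragment are both in place. Note that it only checks the environment of the shell it runs in, which may differ from the environment a batch job receives.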
Now the first user of this cluster is trying to run FSL/feat jobs, and the
jobs all seem to be submitted and then die without producing any output.
I am seeing this in the SGE logs for every job:
10/20/2008 17:20:11|qmaster|hydra|W|job 112.1 failed on host
node02.bic.ucsb.edu general searching requested shell because: 10/20/2008
17:20:10 [506:7430]: execvp(feat5_stop, "feat5_stop" "-m" "n" "-o" "logs"
"-e" "logs" "-hold_jid" "107,108,111,110" "/usr/local/fsl/bin/feat"
"/home/nwymbs/Chunk_fMRI/subjects/sg_004/test++.feat/design.fsf" "-D"
"/home/nwymbs/Chunk_fMRI/subjects/sg_004/test++.feat" "-stop") failed: No
such file or directory
10/20/2008 17:20:11|qmaster|hydra|W|rescheduling job 112.1
(I see similar errors for feat5_reg, feat4_post, etc.)
This leaves me puzzled. Where is this strange call to "execvp" coming
from? Why is the first argument "feat5_stop", which doesn't exist as a file
anywhere on the system? I see references to feat5_stop in
/usr/local/fsl/bin/feat, but from my reading of the execvp man page, the
first argument is supposed to be a loadable program file. Is FSL somehow not
configured correctly to talk to SGE? Or do I have to configure SGE somehow?
Hopefully I have just missed something simple, but I'm rather stumped right
now. Any suggestions for figuring this out would be greatly appreciated.
Thanks for any ideas! - John