Thanks, Mark!

It seems the possumX_postproc job was submitted 11 hours earlier (with 40 nodes):

/usr/share/fsl/4.1/bin/fsl_sub -T 10 -j 780 -F -l /home/work/simdir/logs  /usr/share/fsl/4.1/bin/possumX_postproc.sh /home/work/simdir 40
Done
Sun Mar 11 00:40:17 EST 2012

and 11 hours later (with 40 nodes), possum_sum was run and produced these errors:
/usr/share/fsl/4.1/bin/possum_sum -i /home/work/simdir/diff_proc/signal_proc_ -o /home/work/simdir/signal -n 40 
Could not open matrix file /home/work/simdir/diff_proc/signal_proc_0
Could not open matrix file /home/work/simdir/diff_proc/signal_proc_1
... 
Mon Mar 12 11:00:58 EDT 2012

I don't know why possum_sum (the first command in possumX_postproc) is run 11 hours later. Does this mean that the first processes failed? And is that previous process possum.com?

Thanks,

Chao-Gan

On Wed, Apr 11, 2012 at 3:52 AM, Mark Jenkinson wrote:
Dear Chao-Gan,

It looks like the later processes are failing to find the outputs from the
previous steps.  This could be because the first processes failed or
because the later processes were not waiting correctly (we implement
job holds in SGE for this).  If the errors are occurring very quickly then
I would suspect the latter, as the generation of the signal_proc_* files
is the slow part of the simulation.
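
One way to tell the two cases apart is to check which signal_proc_* files
actually exist once the processing stage should have finished. The following
is only a rough Python sketch and not part of FSL: the function name
missing_outputs is made up for illustration, the prefix and the count of 40
are copied from the possum_sum call in the log, and treating a zero-length
file as missing is an assumption.

```python
from pathlib import Path
from typing import List


def missing_outputs(prefix: str, n_procs: int) -> List[str]:
    """Return the expected per-process files that are absent or empty.

    Files are expected at prefix + "0" ... prefix + str(n_procs - 1),
    matching the names possum_sum reports in its error messages.
    """
    missing = []
    for i in range(n_procs):
        path = Path(f"{prefix}{i}")
        # Assumption: a zero-length file is counted as missing too.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Prefix and process count taken from the possum_sum command in the log.
    absent = missing_outputs("/home/work/simdir/diff_proc/signal_proc_", 40)
    print(f"{len(absent)} of 40 outputs missing")
    for name in absent:
        print("  " + name)
```

If every file is missing even long after submission, the processing jobs
themselves probably failed and their logs are the place to look. If the files
do appear later, the post-processing job most likely started without waiting.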

I hope this helps.
All the best,
Mark


On 10 Apr 2012, at 04:22, YAN Chao-Gan wrote:

Hi, Mark and POSSUM experts,

I recently ran POSSUM on a cluster, but it always stops with the error "Could not open matrix file /home/work/simdir/diff_proc/signal_proc_*".

Is there something wrong with the cluster configuration? Thanks!

Here is the relevant part of possum.log:
...
subjectdir is /home/work/simdir
Making possum directory structure
Processing stage
Sun Mar 11 00:40:16 EST 2012
/usr/share/fsl/4.1/bin/fsl_sub -T 2000 -l /home/work/simdir/logs  -N possum -t /home/work/simdir/possum.com
Post processing stage
Sun Mar 11 00:40:17 EST 2012
/usr/share/fsl/4.1/bin/fsl_sub -T 10 -j 780 -F -l /home/work/simdir/logs  /usr/share/fsl/4.1/bin/possumX_postproc.sh /home/work/simdir 40
Done
Sun Mar 11 00:40:17 EST 2012
/usr/share/fsl/4.1/bin/possum_sum -i /home/work/simdir/diff_proc/signal_proc_ -o /home/work/simdir/signal -n 40 
Could not open matrix file /home/work/simdir/diff_proc/signal_proc_0
Could not open matrix file /home/work/simdir/diff_proc/signal_proc_1
Could not open matrix file /home/work/simdir/diff_proc/signal_proc_2
...

Best,

Chao-Gan