Dear FSL-users,
When I run complex FEATs on our SGE cluster (especially when other people are running analyses in parallel), the FEAT analyses sometimes crash unexpectedly at the end of the stats calculation / the beginning of the post-stats calculation. This seems to be caused by memory pressure on the cluster: too many jobs together need more memory than is available. Rerunning the crashed analyses separately always produces good results in the end. Any idea how to avoid this problem?
One solution may be to reserve a certain amount of memory for each submitted job. Is there a way to estimate the amount of memory a particular FEAT analysis will need? And can such a memory requirement then be used to reserve resources for jobs submitted to the SGE with fsl_sub (plain qsub allows this via a resource request, e.g. -l h_vmem)?
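For illustration, a minimal sketch of the kind of submission I have in mind, assuming the cluster defines a memory complex such as h_vmem (the exact name is site-specific; qconf -sc lists what is configured) and taking 8G and run_feat.sh as placeholders:

    # Hedged sketch: ask SGE to reserve 8 GB of virtual memory for one job.
    # h_vmem is a common memory complex, but mem_free or virtual_free may
    # apply on a given site instead -- check qconf -sc.
    qsub -l h_vmem=8G run_feat.sh

Since fsl_sub is itself a shell script that assembles a qsub command line, I could presumably patch it locally to add such a resource request to every submission, but a supported option would of course be preferable.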
Any suggestion would help me avoid the time-consuming process of rerunning FEATs over and over until the results are OK.
Thanks for your help!
Kind regards,
Henk van Steenbergen
Leiden Institute for Brain and Cognition
==========================
Information about the jobs submitted:
40 FEATs, each with ~30 EVs (+ 30 temporal derivative EVs) on 532 volumes of whole-brain fMRI data (nii file size: 250 MB); see the back-of-envelope memory estimate below
==========================
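My naive back-of-envelope reasoning: the stats stage presumably has to hold at least the uncompressed 4D data as 32-bit floats (the 250 MB above is the compressed file), plus some multiple of that for the design matrix, prewhitening and residuals. Run inside the .feat directory, something along these lines would give that lower bound:

    # Lower bound only: size of the 4D dataset as 32-bit floats in memory.
    # Actual film_gls usage will be larger; treat this as a starting point.
    fslinfo filtered_func_data | awk '
      $1=="dim1"{x=$2} $1=="dim2"{y=$2} $1=="dim3"{z=$2} $1=="dim4"{t=$2}
      END{printf "~%.1f GB raw 4D data\n", x*y*z*t*4/1024/1024/1024}'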
Information about the SGE cluster used:
Hardware:
  Head:     CPU: Quad core Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
            Memory: 4 GB (+10 GB swap)
            Disks: OS: mirrored, 80 GB; Data: RAID50, 6 TB
  4x Nodes: CPU: 2x Quad core Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
            Memory: 16 GB (+16 GB swap)
            Disks: OS: mirrored, 80 GB
Software:
  OS: head installed with Rocks Cluster (Rocks release 5.2 (Chimichanga))
  FSL: 4.1.4 (files copied into the directory)
  SGE: 5.2.0 (as packaged with Rocks Cluster)