Hi - I'm afraid this is a general limitation of SGE: it is not very good at keeping track of the total per-node usage of resources such as CPUs and RAM. I suggest you talk to your sysadmin about this, and perhaps reduce the number of slots per node, or at least increase the amount of swap.
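As a rough illustration of the "fewer slots per node" idea, here is a minimal Python sketch that works out how many jobs fit in a node's physical RAM. The function name, the 1 GB reserve for the OS, and the 4 GB per-FEAT peak are all made-up assumptions; only the 16 GB and 8 cores come from the cluster description further down, so please measure your own jobs before settling on a number.

```python
def slots_per_node(ram_gb, per_job_gb, cores, reserve_gb=1.0):
    """Return how many jobs fit in physical RAM, capped at the core count.

    reserve_gb is memory held back for the operating system (an assumption).
    """
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    usable_gb = max(ram_gb - reserve_gb, 0.0)
    return min(cores, int(usable_gb // per_job_gb))


# Node figures from the cluster description: 16 GB RAM, 2 x quad core = 8 cores.
# The 4 GB peak per FEAT is a hypothetical placeholder, not a measured value.
print(slots_per_node(ram_gb=16, per_job_gb=4.0, cores=8))
```

With these placeholder figures the answer is well below the 8 cores per node, which is the sort of gap that would explain jobs running out of memory when every slot is busy.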
Cheers.



On 22 Jul 2010, at 09:20, H van Steenbergen wrote:

Dear FSL-users,

When I run complex FEATs on our SGE cluster (especially when other people are running analyses in parallel, too), the FEAT analyses sometimes crash unexpectedly at the end of the stats calculation / at the beginning of the post-stats calculation. This seems to be caused by memory problems on the cluster: many jobs together needing too much memory. Rerunning the crashed analyses separately always produces good results in the end. Any idea how to avoid this problem?

One solution may be to reserve a certain amount of memory for the particular job submitted. Is there a way to get an estimate of the amount of memory needed for the particular FEAT analysis used? Can this memory constraint then be used to reserve resources for jobs that are submitted to the SGE with fsl_sub (this is possible in qsub with the -mem parameter)?
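To make the idea of a per-job memory request concrete, here is a minimal Python sketch that builds a qsub command line with a resource request via `-l`. The helper name `qsub_command`, the script name `run_feat.sh`, and the 4 GB figure are hypothetical. `h_vmem` is a standard SGE resource, but whether it acts as a reservation (rather than only a hard limit) depends on how the administrator has configured it, and whether fsl_sub can pass such a request through is not established here.

```python
import shlex


def qsub_command(script, mem_gb, resource="h_vmem"):
    """Build a qsub argument list that requests mem_gb gigabytes via -l.

    The resource name is site-dependent; h_vmem is only a default guess.
    """
    if mem_gb <= 0:
        raise ValueError("mem_gb must be positive")
    return ["qsub", "-l", f"{resource}={mem_gb:d}G", script]


# run_feat.sh and the 4 GB request are placeholders for illustration only.
cmd = qsub_command("run_feat.sh", 4)
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run` directly; the printed string is only there so the command can be checked by eye before anything is submitted.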

Any solution provided will help me to avoid the time-consuming rerunning of FEATs over and over again until the results are ok.

Thanks for your help!

Kind regards,

Henk van Steenbergen
Leiden Institute for Brain and Cognition

==========================
Information about the jobs submitted:

40 FEATs, each with ~30 EVs (+ 30 temporal derivative EVs) on 532 volumes of whole-brain fMRI data (size of nii file: 250 MB)

==========================
Information about the SGE cluster used:

Hardware:
Head:
  CPU: Quad core Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
  Memory: 4GB (+10GB swap)
  Disks: OS: mirrored, 80 GB
         Data: RAID50, 6TB

4x Nodes:
  CPU: 2x Quad core Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
  Memory: 16GB (+16GB swap)
  Disks: OS: mirrored, 80 GB

Software:
  OS: Head installed with Rocks Cluster (Rocks release 5.2 (Chimichanga))
  FSL: 4.1.4 (copied the files into directory)
  SGE: 5.2.0 (as packaged with Rocks Cluster)



---------------------------------------------------------------------------
Stephen M. Smith, Professor of Biomedical Engineering
Associate Director,  Oxford University FMRIB Centre

FMRIB, JR Hospital, Headington, Oxford  OX3 9DU, UK
+44 (0) 1865 222726  (fax 222717)
http://www.fmrib.ox.ac.uk/~steve
---------------------------------------------------------------------------