Hi,
I'd like to propose a patch for FSL's SGE interface that would allow for
an easier integration of FSL into existing cluster environments without
having to edit the source.
The attached patch series is split into three parts that build on top of
each other. Here is a short summary:
0001*
Removed any reference to the SGE_ROOT variable outside of 'fsl_sub'.
There are SGE installations that will not have this available, but are
nevertheless fully functional (e.g. the Debian gridengine package
works like this). Having all cluster-specific logic in 'fsl_sub' would
also make it possible to develop alternative interfaces, e.g. for Torque.
Instead of SGE_ROOT the patch makes FSL consistently use the
FSLPARALLEL variable to determine whether it should employ SGE. That
was already implemented like this in most TCL code. This patch simply
removes/modifies the remaining references.
Finally, this first patch also moves all email delivery logic into
fsl_sub -- and changes the default domain from fmri.ac.ox.uk to
localhost ;-)
The last bit is that all interesting configuration items in fsl_sub
itself are extracted into variables and set in the header of the
script.
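To illustrate the FSLPARALLEL check described above, here is a minimal
sketch. The function name 'fsl_exec_method' is hypothetical and not taken
from the patch, and the sketch assumes that any non-empty FSLPARALLEL
value enables SGE; the actual patch may test for specific values.

```shell
# Hypothetical helper (not part of the patch): decide how to run a job.
# Assumption: a non-empty FSLPARALLEL means "submit via SGE";
# SGE_ROOT is deliberately not consulted.
fsl_exec_method() {
    if [ -n "${FSLPARALLEL:-}" ]; then
        echo "SGE"
    else
        echo "NONE"
    fi
}
```

A caller would then branch on the result, e.g. run the command directly
when the function prints NONE.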
0002*
The second patch changes the behavior of fsl_sub. Previously, fsl_sub
selected the queue to submit jobs to based on the time estimate passed
with the -T option. To make this work fsl_sub had to be edited to learn about the
available cluster queues in a particular environment and their
respective properties. However, SGE is perfectly capable of doing this
on its own. Therefore I modified fsl_sub to pass the time estimate
directly to SGE as an 'h_rt' complex setting. Based on that
information SGE will autoselect an appropriate queue.
The patch does not affect the ability to specify a queue with the -q
option of fsl_sub.
I also removed/adapted the documentation in fsl_sub regarding the
queue setup.
The big advantage of this setup is that the full complexity of SGE
queue setups is exposed to FSL. Jobs are scheduled according to
resource demands and load in specific queues and are not necessarily
bound to a specific, hard-coded queue set.
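As a rough sketch of the second patch's idea, the time estimate can be
turned into an 'h_rt' resource request for qsub. This assumes the -T
estimate is given in minutes; the function name 'minutes_to_h_rt' is
hypothetical and the patch itself may do the conversion differently.

```shell
# Hypothetical helper (not part of the patch): convert a runtime
# estimate in minutes into the H:MM:SS form accepted for h_rt.
minutes_to_h_rt() {
    minutes=$1
    printf '%d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

# Illustrative use: print the resource request that would be handed to
# qsub, leaving the choice of queue to SGE.
# echo "qsub -l h_rt=$(minutes_to_h_rt 90) job.sh"
```

With a request like this, no queue names or queue properties need to be
hard-coded in fsl_sub.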
0003*
The last is a tiny patch that introduces an environment variable
FSLCLUSTER_MAILOPTS that makes it possible to override the mailing options in
fsl_sub -- this determines when SGE should send email (e.g. starting,
ending, suspending, aborting jobs). By default no email is sent --
which also was the former default. The variable simply exposes this to
allow for quick job/user-specific changes without having to edit
fsl_sub.
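A minimal sketch of how such an override could work, assuming the value
is passed on to qsub's -m option (where 'n' means no mail, and 'b', 'e',
'a', 's' stand for begin, end, abort and suspend). The function name
'cluster_mail_opts' is hypothetical and not taken from the patch.

```shell
# Hypothetical helper (not part of the patch): use FSLCLUSTER_MAILOPTS
# if it is set and non-empty, otherwise fall back to 'n' (no email).
cluster_mail_opts() {
    printf '%s\n' "${FSLCLUSTER_MAILOPTS:-n}"
}

# Illustrative use: request mail on job end and abort for one submission.
# FSLCLUSTER_MAILOPTS=ea; qsub -m "$(cluster_mail_opts)" job.sh
```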
Ideally, these patches or a subset would make it into the FSL code base,
but even if not, I thought they might be useful for people maintaining
FSL cluster installations.
My initial tests suggest that the modified FSL works fine. I will
do further checks and would appreciate any feedback if there are
problems with it.
If nothing speaks against it, these modifications will become part of
the next Debian package version, which should significantly simplify
FSL+SGE installations -- even on single machines with multiple
cores.
Thanks for your consideration,
Michael
--
GPG key: 1024D/3144BE0F Michael Hanke
http://mih.voxindeserto.de