Hi Lokke,
Thanks for the response. We have a very similar setup: as usual, all users authenticate via LDAP/Kerberos to the XServe for access to their accounts, shares and Kerberized services. This works for Windows and Linux clients as well as Macs. The problem is that the SGE manual seems to state explicitly that the nodes in the grid must have identical _local_ accounts. This makes a certain kind of sense, since it's the other way to solve the permissions issue besides using the 'nobody' user, but it doesn't seem like it scales well.
Imagine trying to administer a grid at a university or company across buildings, networks, OS types and versions, and ensuring that every system has the same user accounts with the same IDs per user. It's functionally impossible. The other method, using 'nobody', is the one used by XGrid, which allows for massively distributed grid processing, a la SETI@Home and Stanford's XGrid projects. Any user anywhere in the world can elect to become a grid node without any special user accounts on their system.
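To make the "same IDs per user" concern concrete, here is a minimal Python sketch that compares account lists from several nodes and reports any user whose UID differs, or who is missing, on some host. It assumes you have already collected passwd-format text (name:password:uid:gid:...) from each node by some means, for example `getent passwd` on Linux. The host names, user names and function names are hypothetical, and this is only an illustration of the check, not anything SGE itself provides.

```python
def parse_passwd(text):
    """Return {username: uid} from passwd-format text (name:pw:uid:gid:...)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip malformed entries rather than guessing at their meaning.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        users[fields[0]] = int(fields[2])
    return users


def find_uid_mismatches(hosts):
    """hosts maps hostname -> passwd-format text.

    Returns {username: {hostname: uid or None}} for every user whose UID
    is not identical on all hosts (None means the user is absent there).
    """
    parsed = {host: parse_passwd(text) for host, text in hosts.items()}
    all_users = set().union(*parsed.values())
    mismatches = {}
    for user in sorted(all_users):
        per_host = {host: table.get(user) for host, table in parsed.items()}
        if len(set(per_host.values())) > 1:
            mismatches[user] = per_host
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample data for two nodes.
    sample = {
        "node1": "alice:*:501:20::/Users/alice:/bin/bash\n"
                 "bob:*:502:20::/Users/bob:/bin/bash\n",
        "node2": "alice:*:501:20::/Users/alice:/bin/bash\n"
                 "bob:*:503:20::/Users/bob:/bin/bash\n",
    }
    for user, per_host in find_uid_mismatches(sample).items():
        print(user, per_host)
```

If every node takes its accounts from the same directory server, a check like this should come back empty, which is presumably the situation the manual's requirement is meant to guarantee.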
It makes me think that I've perhaps misunderstood SGE's requirements, so I was hoping to get input from someone who has it fully implemented and functioning. Steve?
Thanks again!
Jeremy
----- Original Message ----
From: Lokke Highstein <[log in to unmask]>
To: [log in to unmask]
Sent: Tuesday, October 9, 2007 4:31:50 PM
Subject: Re: [FSL] FSL on XGrid w/ XServe (now SGE)
hi jeremy,
i'm very new to this but i can already answer one of your questions
and then defer to the others here on the second two.
we have an 8-node xserve cluster and we use open directory as our
master username/password database management system. just about
anything that can access LDAP data can authenticate to it so most of
the computers (servers included) can share the same username/password
database (although the windows xp boxes give me trouble.)
open directory is built into OSX server, and on the xserve nodes it's
very simple to set up so they authenticate to that system.
i'm not sure about permissions, i am still playing with them at the
moment, and i also am just starting to look into multiple queues.
i'm sure someone else will have more info.
-lokke
On Oct 9, 2007, at 3:17 PM, Jeremy Bronson wrote:
> Hi,
>
> I tried to get FSL working with XGrid during the summer, but wasn't
> quite successful, and was recommended to use SGE in its place by
> Steve and others. I'm giving it another go, but I've run into a few
> questions while setting up SGE. Hopefully they're easy enough to
> answer...
>
> -The SGE manual says that each host must have the same user account
> names and passwords, which is the alternate to XGrid's nobody:nobody
> permissions, but seems equally impractical. If SGE is really designed
> to be implemented in large grids, I'm not sure how it could scale
> beyond a handful of managed machines. My question is whether this is
> how the lab is configured at Oxford or any other site, or alternately
> are all jobs submitted under a single username?
>
> -All scan data resides on an XServe, automounted via NFS. What kind of
> permissions are necessary on this share?
>
> -Is anyone using the multiple queues available in SGE, or only using
> one for large jobs?
>
> My basic goal is to allow users to login, start multiple large jobs on
> the grid, then log out and retrieve the results later. The FEAT
> first-level analyses tend to tie up all the machines for a time, so
> I'd love to speed them up by running the jobs in parallel and in the
> background.
>
>
> Thanks in advance!
>
> Jeremy
>
> On Tue, 14 Aug 2007 06:24:00 +0100, Steve Smith <[log in to unmask]> wrote:
>> Hi Jeremy,
>>
>> I agree with Andrew, most people seem much happier with SGE than with
>> XGrid, so if it's not too late I would consider that.
>>
>> Anyway - yes, hopefully FSL 4.0 should be fairly easy to setup with
>> either (though much easier with SGE as that's what we have so should
>> need much less customising).
>>
>> First, see the brief intro at:
>> http://www.fmrib.ox.ac.uk/fsl/fsl/downloading.html#sge
>> So if the user runs any of these programs on a machine which can
>> submit to the cluster then that should happen automatically, after
>> the sysadmin has:
>>
>> Setup the central cluster-controlling script
>> $FSLDIR/bin/fsl_sub
>>
>> This is a heavily commented shell script and hopefully should be
>> reasonably easy to follow and customise.....
>>
>>
>> So hopefully the user can just run FSL programs on their normal
>> working machine, and if it's setup to be a cluster submission host,
>> then whenever a "big" job is run that will automatically get sent to
>> the queue.
>>
>> Cheers, Steve.
>>
>>
>>
>>
>>
>> On 14 Aug 2007, at 05:59, Jeremy Bronson wrote:
>>
>>> Hi All,
>>>
>>> I'm the new administrator of a neuroimaging research lab, and I'm
>>> working on getting FSL and other MRI-analysis tools running on
>>> XGrid. I'm not yet intimately familiar with how FSL works, so I'm
>>> hoping there are others out there who have already figured out how
>>> to run it on a cluster, so I don't have to reinvent the wheel.
>>> I've heard of several existing clusters that run FSL, albeit a
>>> modified version. I'm hoping it can be done with the off-the-shelf
>>> version, perhaps it's now possible with the just-released 4.0? If
>>> anybody could point me in the direction of some specifics, I'd be
>>> most grateful.
>>>
>>> We've got an XServe with RAID that houses all the data, and FSL is
>>> installed and configured on all host (agent) machines. Most users
>>> use the GUI, and manually point the tools to the appropriate data,
>>> so I assume they'll need to familiarize themselves with the
>>> command-line tools and specifying data directories on the CLI (or via
>>> GridStuffer). I'm thinking that each machine will need the data
>>> volume to be auto-mounted (NFS?) at startup with appropriate
>>> directories having read/write access for the 'nobody' user. Does
>>> this sound correct?
>>>
>>> Additionally, each tool that's part of the FSL package seems to
>>> launch a number of other UNIX commands during analysis, to copy,
>>> move and otherwise manipulate the result data. Will this confuse
>>> XGrid, or will the job and all sub-commands run until the original
>>> command completes? (e.g. A complete FEAT analysis)
>>>
>>> Hopefully this isn't too difficult, and afterwards I'd like to take
>>> the time to post a HOWTO on macresearch.org or the like, so others
>>> might take advantage of the info. Thanks in advance to anyone who
>>> might be able to help.
>>>
>>>
>>> Jeremy Bronson
>>> Systems Administrator
>>> Frey Research Lab
>>> University of Oregon
>>
>> ------------------------------------------------------------------------
>> Stephen M. Smith, Professor of Biomedical Engineering
>> Associate Director, Oxford University FMRIB Centre
>>
>> FMRIB, JR Hospital, Headington, Oxford OX3 9DU, UK
>> +44 (0) 1865 222726 (fax 222717)
>> [log in to unmask] http://www.fmrib.ox.ac.uk/~steve
>> ------------------------------------------------------------------------