I can see these were rhetorical questions but I'll answer them anyway.
Experiments want pilot jobs to do 'late binding' of payload to job, i.e.
to wait until a job is actually executing before deciding what it does.
This is how they chose to manage the relative priority of their work
within a VO. In the steady state the grid will be full of work with
queues everywhere, so without late binding some urgent piece of work may
have to wait a very long time. It is also a means of delivering global
fair shares.
A secondary reason for pilot jobs, or maybe the primary one for LHCb, is
to check out a site before trusting it with a job. The pilot job starts,
checks out the environment and then gets the real job. This gets round
the current situation in which so many sites are badly configured or
don't advertise their state correctly in the BDII. However, this is a
use case
that doesn't need glexec because the pilot and the real job are for the
same user so they can all run under the same identity. It is only
Multi-User Pilot Jobs that need to change identity.
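
As a rough illustration of the two ideas above (late binding and
checking out the site first), here is a minimal Python sketch. It is not
how any particular experiment's framework is implemented: TaskQueue,
Payload, run_pilot and site_ok are made-up names, and the queue is an
in-memory stand-in for a VO's central task queue.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(order=True)
class Payload:
    priority: int                      # lower number = more urgent
    name: str = field(compare=False)
    owner: str = field(compare=False)  # user the payload belongs to


class TaskQueue:
    """Hypothetical stand-in for a VO's central task queue."""

    def __init__(self) -> None:
        self._heap: List[Payload] = []

    def submit(self, payload: Payload) -> None:
        heapq.heappush(self._heap, payload)

    def fetch(self) -> Optional[Payload]:
        # Hand out the most urgent payload, or None if nothing is waiting.
        return heapq.heappop(self._heap) if self._heap else None


def run_pilot(queue: TaskQueue, pilot_owner: str,
              site_ok: Callable[[], bool]) -> Optional[str]:
    """Run one pilot: check the site, then bind a payload (late binding)."""
    if not site_ok():
        # Badly configured site: take nothing, so no real work is lost.
        return None
    payload = queue.fetch()  # decided only now, while the pilot is running
    if payload is None:
        return None
    if payload.owner != pilot_owner:
        # Multi-user case: the payload belongs to someone else, so the
        # pilot would have to change identity before running it.
        return f"{payload.name} (needs identity switch to {payload.owner})"
    return f"{payload.name} (runs as {pilot_owner})"
```

With a queue holding a priority 5 and a priority 1 payload, a pilot that
lands on a healthy site picks up the priority 1 payload regardless of
the order of submission, and a pilot on a broken site returns without
touching the queue.
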
The transfer, storage and type of proxies is one of the reasons why a
review of the multi-user pilot job frameworks is necessary, and it is
one of the things that will be checked.
John
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Kostas Georgiou
> Sent: 09 November 2007 00:32
> To: [log in to unmask]
> Subject: Re: PMB minutes and glexec
>
> On Thu, Nov 08, 2007 at 10:31:43PM -0000, Kelsey, DP (David) wrote:
>
>
> All I can see are reasons why we need glexec to support pilot
> jobs. What I haven't heard so far is **why** we need the
> pilot jobs in the first place.
>
> I could spend hours explaining why glexec is a bad idea (and
> I will), but before that I would like someone to tell me why
> we need the pilot jobs in the first place. So far I've only
> heard that to run pilot jobs "securely" we need a suid glexec
> (which very conveniently ignores the "small" issue of how you
> send the proxy to the pilot job in a secure way). Glexec
> might be the solution (yeah, right) but if so, it is the
> solution to the wrong problem.
>
> Unfortunately, I know what the reply will be. Something close
> to: "our RB model is badly designed and it doesn't really
> work so instead of fixing it we'll hack something else on top
> of the existing hacks and at the same time we'll ignore every
> system administrator that says that this is just plain stupid
> because we know best".
>
> Kostas
>