On Fri, Nov 09, 2007 at 08:53:43AM -0000, Gordon, JC (John) wrote:
> I can see these were rhetorical questions but I'll answer them anyway.
>
> Experiments want pilot jobs to do 'late binding' of payload to job, i.e.
> wait until a job is actually executing before deciding what it does.
> This is how they chose to manage the relative priority of their work
> within a VO. In the steady state the grid will be full of work with
> queues everywhere so some urgent piece of work may have to wait a very
> long time. This is also a means of delivering global fair shares.
I am sorry but I fail to see how a pilot job will be able to magically
find free slots if everything is in use. Pilot jobs or not, you aren't
going to run until a job finishes and a slot is free.
If a project wants to have its own priorities they can have their own
RB that implements this functionality. Batch systems support higher
priority jobs and they can even suspend other jobs if the high priority
jobs need to run right NOW instead of the next free slot. Just because
they can't be bothered to do it correctly or they aren't competent
enough is not a reason to hack the existing model to pieces.
> A secondary reason for pilot jobs, or maybe the primary one for LHCb, is
> to check out a site before trusting it with a job. The pilot job starts,
> checks out the environment and then gets the real job. This gets round
> the current state that so many sites are badly configured or don't
> advertise their state correctly in the BDII. However, this is a use case
> that doesn't need glexec because the pilot and the real job are for the
> same user so they can all run under the same identity. It is only
> Multi-User Pilot Jobs that need to change identity.
>
> The transfer, storage and type of proxies is one of the reasons why a
> review of the multi-user pilot job frameworks is necessary and it is
> one of the things that will be checked.
And what happens when it's found that it cannot be done in a secure way?
Are we going to scratch the glexec idea?
Cheers,
Kostas