> So... the only real question is to support pilot jobs or not? Three of
> the four LHC VOs say they *have* to have them.
Are the reasons why they must have them documented somewhere?
I developed what may have been the first pilot job system in LCG in 2004
when I was working for LHCb (we called them glide-in, inspired by the
Condor strategy). I can tell you why we did it: the RB and MDS/BDII
system was hopeless so we took things into our own hands, submitted
pilot jobs with minimum properties common to all our jobs, and then once
they actually got onto a WN they could run a benchmark, check the local
system configuration (software, memory, drive space), and then request
an LHCb job from our own job DB. Also, it allowed us to do "pull
scheduling" (/a la/ Condor). I understand gLite now has pull
scheduling, so perhaps pilot jobs are still required because
experiments still find the RB and monitoring system to be unsatisfactory
for their needs.
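To make the pull model concrete, here is a minimal sketch of what a pilot does once it lands on a WN. This is an illustration only, not the code we ran in 2004: the names (NodeInfo, Job, probe_node, pull_job), the toy benchmark, and the in-memory list standing in for the experiment's job DB are all hypothetical.

```python
import shutil
import time
from dataclasses import dataclass


@dataclass
class NodeInfo:
    """What the pilot measured on the worker node (hypothetical fields)."""
    free_disk_gb: float
    benchmark_score: float
    software: frozenset


@dataclass
class Job:
    """A queued job and its requirements (hypothetical fields)."""
    job_id: str
    min_disk_gb: float
    min_benchmark: float
    required_software: frozenset


def probe_node(installed_software, workdir="."):
    """Measure the node the pilot actually landed on."""
    free_gb = shutil.disk_usage(workdir).free / 1e9
    # Toy CPU benchmark: loop iterations per second (assumed metric).
    iterations = 200_000
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    elapsed = max(time.perf_counter() - start, 1e-9)
    return NodeInfo(free_gb, iterations / elapsed, frozenset(installed_software))


def matches(job, node):
    """True if the node satisfies every requirement of the job."""
    return (node.free_disk_gb >= job.min_disk_gb
            and node.benchmark_score >= job.min_benchmark
            and job.required_software <= node.software)


def pull_job(job_db, node):
    """Remove and return the first queued job this node can run, else None."""
    for index, job in enumerate(job_db):
        if matches(job, node):
            return job_db.pop(index)
    return None
```

The point of the design is that matching happens against what the pilot measured on the node, not against whatever the information system advertised, and that a pilot which finds no suitable job can simply exit.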
> Discussion focussed on whether we should require Sites to run the
> Grid-provided utility (today glexec) in the identity switching mode or
> whether Sites should have the right to choose not to switch identities.
Surely the "grander vision" must allow sites to decide. If a site wants
to give all grid users root access, then that is their prerogative
(although it would be nice to know, so *users* could decide they didn't
want their jobs running there).
Obviously the issue is sandboxing/isolation of activities and access (to
processes and data). Surely there are other techniques which people are
looking into to achieve this: chroot, virtualization, etc.
And what happens when a user needs/wants to combine multiple
identities/roles to perform a set of operations (e.g. access to
different data sets)? It seems to me we will pretty quickly run into
situations where VO-level access isn't good enough, and users will be
responsible for executing grid actions under the "appropriate" VO for
accounting purposes but may make use of other VO identities (or roles,
ACs, or non-X.509 identities) for data/service access as part of a
larger task. OK, so this is beyond the issue of pilot jobs, but perhaps
it gives us some insight if we see that it could be common for users to
need multiple identities and to be able to select/change identities
within a given job.
> first. This also allows for the fact that EGEE has a working group on
> portals looking into the various issues, which will not conclude its
FWIW, TeraGrid makes significant use of portals. I have always wondered
how grid policy people feel about portal-based access and the trust
issues it raises for identity delegation and management by an
intermediary party (i.e. the portal) between the user and the underlying
resource.
And as someone commented on PBS_MOM, *lots* of grid/cluster/server
software uses setuid, so objecting to pilot jobs purely from a dislike
of setuid doesn't seem so reasonable.
Cheers,
Ian
--
Ian Stokes-Rees
Particle Physics, Oxford http://grid.physics.ox.ac.uk/~stokes