Testbed Support for GridPP member institutes
> Ian Stokes-Rees said:
> I think even at the time it was mentioned that cron jobs could have a
> similar effect, and in hindsight it seems like someone should have
> thought about the forked and detached processes issue.
I was aware of it, although I might not have mentioned it explicitly.
Partly I may have been less worried than I should have been because I
was assuming the EDG model where pool accounts were recycled very
infrequently; I'm still not clear what is actually happening now. Also,
I seem to remember that last time we went through this (last summer?
[1]) it was suggested that we should have a security audit with everyone
being invited to contribute their favourite hacks in private rather than
in a mailing list - but as far as I know it never happened.
[1] Early July, I just checked.
> 1. Surely it couldn't be too hard to find a list of all processes owned
> by a particular user and to kill them all immediately before the start
> of a job or immediately after the end of a job. This would require
> multiple jobs running at the same time on the same node to
> use different local accounts, but this might not be such a bad thing
It might not be a bad thing, but I think it would be quite difficult to
do: it's at odds with how the pool accounts work in a fairly basic way.
Or maybe you could just prevent multiple jobs from one user running on a
given node, but that's rather a kludge and would, for example, cause
problems for production jobs submitted by a single user.
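
The kill step on its own is simple enough; the hard part is the account
model. A minimal sketch of that step, assuming a Linux worker node with
/proc mounted and a wrapper running as root before or after the job. The
function names are made up for illustration and are not part of any EDG
or LCG tool, and the sketch relies on the owner of /proc/<pid> normally
being the effective UID of the process:

```python
import os
import signal
import time


def pids_owned_by(uid, proc_root="/proc"):
    """Return the sorted PIDs whose /proc entry is owned by uid."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            if os.stat(os.path.join(proc_root, name)).st_uid == uid:
                pids.append(int(name))
        except OSError:
            # The process exited between listdir() and stat().
            continue
    return sorted(pids)


def kill_all(uid, max_passes=5, pause=0.1):
    """SIGKILL every process owned by uid; return True once none remain.

    Several passes are made because a process may fork between the
    scan and the kill. This is an illustrative helper, not a hardened tool.
    """
    for _ in range(max_passes):
        victims = [p for p in pids_owned_by(uid) if p != os.getpid()]
        if not victims:
            return True
        for pid in victims:
            try:
                os.kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # already gone
        time.sleep(pause)
    return False
```

Something like kill_all(uid_of_pool_account) would only be safe if no
other job were using the same account on that node, which is exactly the
difficulty above, and it does nothing about cron jobs.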
> 2. Rebooting the nodes before any account recycling is
> allowed to happen is probably a good idea.
Not very easy though: it means you have to drain the whole system to do
it, which is not something you could do every day or even every week.
And it wouldn't help with cron jobs.
> That way there should be no user processes
> hanging about. All DN mappings could be scrubbed on reboot under this
> circumstance as well.
Be careful with that - the DN mappings are stored on the CE
(gatekeeper), but you'd have to reboot every WN at the same time. And
remember that users can run both fork and pbs jobs directly via globus,
and with some configurations may be able to ssh from the CE to the WNs
(although in general that no longer seems to work; I used to use it a
lot for debugging) :)
> 3. This is making chroot, BSD jails, or some kind of
> virtualization look like a very nice solution
Maybe, but we need something which can be used everywhere: the security
chain is only as strong as its weakest link.
Stephen