Dave, I think you may be overreacting a bit here. Kostas has found a problem - full marks to him; as Steve B points out, there has been a lack of people doing systematic hole-finding. As for what happens now, I do not think the correct response is for us to run around shouting 'You can't believe any identity on the Grid!'. After all, you don't hear about many security exposures until a fix has been produced. I'd rather see a considered and detailed analysis of the conditions under which this happens. Does it affect all flavours of Linux, other Unixes, other PBS versions, other batch systems?
Can I get a clear picture of the exposure? If a process is left hanging around on a WN after a job ends, and is still running after the pool accounts have been recycled, then it shares its uid/gid with the next user to be assigned that pool account. Have I got that right? What is the likelihood of the process still running when pool accounts are recycled? Has anyone ever recycled pool accounts on a running farm?
As for identification, since the early days of LCG1 (if not LCG0) all LCG sysadmins have been told to keep a variety of log files safely for some months. This was precisely so that actions could be traced back in time and the DN mapped to a pool account at any given time could be established.
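To make the log-keeping point concrete, here is a minimal Python sketch, assuming the site's mapping logs have already been reduced to (time, pool account, DN) records, where each record holds until the next one for the same account. The record format, the account name atlas001, the DNs and the function names are all hypothetical. It shows why a leftover process has to be attributed using its start time rather than the time it is noticed: after recycling, the uid belongs to someone else.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record format: (unix_time, pool_account, dn).
# A record means "from this time on, pool_account is mapped to dn",
# until the next record for the same account (i.e. the account is recycled).
Record = Tuple[int, str, str]
History = Dict[str, List[Tuple[int, str]]]


def build_history(records: Iterable[Record]) -> History:
    """Group assignment records per pool account, sorted by time."""
    history: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for when, account, dn in records:
        history[account].append((when, dn))
    for entries in history.values():
        entries.sort()
    return dict(history)


def dn_at(history: History, account: str, when: int) -> Optional[str]:
    """Return the DN mapped to `account` at time `when`, or None if unknown."""
    entries = history.get(account, [])
    times = [t for t, _ in entries]
    i = bisect_right(times, when)  # number of assignments made at or before `when`
    return entries[i - 1][1] if i else None


# Hypothetical example: atlas001 is recycled from alice to bob at t=5000.
records = [
    (1000, "atlas001", "/C=UK/O=Example/CN=alice"),
    (5000, "atlas001", "/C=UK/O=Example/CN=bob"),
]
history = build_history(records)

# A process started at t=2000 is still running at t=6000:
print(dn_at(history, "atlas001", 6000))  # -> /C=UK/O=Example/CN=bob (current holder of the uid)
print(dn_at(history, "atlas001", 2000))  # -> /C=UK/O=Example/CN=alice (who started the process)
```

The lookup is only as good as the logs, which is why keeping them for months matters.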
So, to my naïve eye (please correct me), the exposure is to the nefarious actions of an authorised LCG user, who can be identified in retrospect, and who leaves rogue processes hanging around on the chance that the pool accounts on a farm will be recycled before the WN gets rebooted. With some probability, he can then use the short-lived restricted proxy of the next user mapped to that account to do bad things as that user (e.g. DoS, or trawling for sensitive files and keys belonging to that user on other systems to which the proxy has access).
So, what to do?
A) We should tell Ian Neilson and Ian Bird and let them decide what action should be taken across the rest of LCG. In Dave Kelsey's absence, I'll do that.
B) We should take the action Steve T suggested and not recycle pool accounts until further notice. If you have a cron job doing this, stop it now. A suitable backup/restore of the pool directory should preserve pool allocations across an upgrade/install. Otherwise, reboot the WNs, or at least kill all user processes on them, when reinstalling a CE.
C) We should form a small team to check whether this behaviour is repeated on other flavours of Linux, PBS, etc. I delegate this to Jeremy; volunteers to him, please.
D) Take discussion off-list to Team (C) who will give regular feedback on progress.
E) Anything else?
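On point B, a site that wants to script the clean-up of leftover user processes on a WN could start from something like this minimal Python sketch. It is Linux-only (it reads the real uid from /proc/<pid>/status) and assumes pool accounts can be recognised by name; the POOL_ACCOUNT pattern and all function names are hypothetical and would need adapting to local conventions. It kills every pool-account process, running jobs included, so it is only appropriate on a WN with no jobs on it; it defaults to a dry run that just lists the candidates.

```python
import os
import pwd
import re
import signal
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical naming convention: pool accounts look like "atlas001", "lhcb042".
POOL_ACCOUNT = re.compile(r"^[a-z]+\d{3}$")


def list_processes(proc_root: str = "/proc") -> Iterator[Tuple[int, int]]:
    """Yield (pid, real uid) for every process visible under /proc (Linux only)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Uid:"):
                        # Format: "Uid:\treal\teffective\tsaved\tfs"
                        yield int(entry), int(line.split()[1])
                        break
        except OSError:
            continue  # process exited while we were scanning


def username(uid: int) -> str:
    """Map a uid to a login name, falling back to the numeric uid."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def pool_processes(
    procs: Iterable[Tuple[int, int]], name_of: Callable[[int], str] = username
) -> List[Tuple[int, str]]:
    """Return (pid, account) for processes owned by pool accounts."""
    return [(pid, name_of(uid)) for pid, uid in procs if POOL_ACCOUNT.match(name_of(uid))]


def kill_pool_processes(dry_run: bool = True) -> List[Tuple[int, str]]:
    """List (and, unless dry_run, SIGKILL) all pool-account processes. Run as root."""
    victims = pool_processes(list_processes())
    if not dry_run:
        for pid, _ in victims:
            try:
                os.kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # already gone
    return victims
```

Rebooting the WN achieves the same thing more bluntly; the script just makes the "kill user processes" option repeatable.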
John
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> On Behalf Of Dr D J Colling
> Sent: 14 February 2005 20:39
> Subject: Serious security issue (was Re: PBS problem on TB support)
>
>
> Dear All,
>
> Following last week's discussion about security issues, I
> asked Kostas to look for them and then to test and report any
> that he found. This is the first one that he has done this
> for. This is clearly a very serious problem, as it removes the
> definite mapping between a user running a job and the real
> person, so breaking the security policy at many sites. As
> Steve T. says, this is a serious problem; however, I don't see
> any activity to correct it.
>
> What are people supposed to do when they find such a hole and
> what effort do we have to correct such holes? I seem to
> remember that there was effort in the GridPP2 tender for
> security support. What happened to this effort, and is this
> sort of thing appropriate for them to work on (in a support role)?
>
> Security is not like data movement or workload management.
> If somebody finds an inefficiency in these pieces of middleware,
> you have time to put it right (and there are people
> working on them). Once a security hole is discovered, there
> has to be rapid action to correct it before it is exploited,
> and, what is more, I don't see any activity to correct these holes.
>
> Now that it has been pointed out to you that you do not know
> who is actually running a process, which, if any, sites are
> running this software legally (i.e. not breaking their
> college usage policies)? I suspect that people will keep
> ignoring this (which in itself is technically worthy of
> dismissal at Imperial) in order to keep their sites on the
> LCG. Is this really the way we want to proceed? What do we
> say to our security officers when somebody exploits such a
> weakness to launch (say) a DoS attack? "Oh, yes, we knew it
> was there some months ago but we ignored it..."?
>
> I don't know what the answer is here, but I do see that we
> have a real problem and I would like to see discussion about
> a way forward. Most of all, I don't want to see it ignored
> and swept under the carpet.
>
> All the best,
> david