Hi John,
indeed. The wiki is not complete, and it is there to be completed.
Developers were asked by the TCG to insert their information, but
haven't done it so far. And I have already asked the dteam twice, while
we were discussing this, to put in their own, but nobody has done it yet.
cheers
alessandra
Gordon, JC (John) wrote:
> Thanks Graeme. I knew this had been discussed at length, but when
> speaking in a meeting one can't just say 'follow this thread'. I
> checked the wiki and it doesn't go into this detail. Jeremy needs the
> good summary you give.
>
> John
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
> Sent: 03 July 2007 17:27
> To: [log in to unmask]
> Subject: Re: UK input to tomorrow's WLCG GDB
>
> On 3 Jul 2007, at 16:29, Coles, J (Jeremy) wrote:
>
>> Dear All
>>
>> Tomorrow there is a GDB (happens monthly as I hope you know!) at CERN
>> with the following agenda:
>> http://indico.cern.ch/conferenceDisplay.py?confId=8485
>>
>> If you have any important issues that you would like raised/
>> discussed in relation to any of these items (or others), please let
>> me know. Current items to be taken up from the UK include:
>>
>> 1) Confirmation of experiment readiness to move to SL4
>>
>> 2) Confirmation that a well-defined list of rpms required by the
>> experiments but not in the standard SL4 installation is available
>> (either as a list in the VO ID card for the experiment or as an
>> experiment meta-package).
>
> If ATLAS and LHCb say that they are ready to move to SL4, then
> Glasgow are prepared to go early - perhaps at the end of this month.
>
> However, this will almost certainly be a big bang switch, not a
> gradual migration of worker nodes.
>
>> 3) To re-state that UK sites are generally opposed to running glexec
>> on worker nodes (see this for background:
>> http://www.sysadmin.hep.ac.uk/wiki/Glexec). I have requested more
>> information about specific objections via the T2 coordinators.
>
> Comments from an earlier email, with some clarifications (our
> position hasn't altered):
>
> Begin forwarded message:
>> We had a chat about glexec in our ScotGrid technical meeting
>> yesterday.
>>
>> Summary: it's unacceptable for glexec to be deployed with suid
>> privileges on our batch workers.
>>
>> The arguments have already been made on this thread, mainly by
>> Kostas, so there's little point in running over them in great detail
>> again. However, briefly:
>>
>> 1. Edinburgh are integrating into a central university resource.
>> glexec would not be acceptable to the system team.
>
> So here we _cannot_ run glexec. It's not our choice...
>
>> 2. Glasgow do control their resource, but all suid binaries on the
>> batch workers are going to be turned off (sorry, no ping :-). We
>> don't have confidence in glexec.
>
> It's just a foolish thing to do, in our opinion. SUID binaries are a
> serious security risk. You only have to look at examples over the
> years (sudo, suidperl) to see that code which has been deployed for
> years can suddenly be discovered to be vulnerable. In addition, even
> if the code is audited now, what guarantee do we have that future
> changes won't open up new attack vectors?
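>
> (The audit itself is trivial. A minimal sketch, assuming Python is on
> the worker node - this is illustrative, not our actual tooling - that
> walks the filesystem and lists setuid-root binaries so the bit can be
> reviewed or stripped:)
>
>     # Illustrative sketch: list setuid-root binaries on a worker node.
>     import os
>     import stat
>
>     def find_suid_root(root="/"):
>         for dirpath, dirnames, filenames in os.walk(root):
>             for name in filenames:
>                 path = os.path.join(dirpath, name)
>                 try:
>                     st = os.lstat(path)  # don't follow symlinks
>                 except OSError:
>                     continue  # unreadable or vanished; skip it
>                 # regular file, setuid bit set, owned by root
>                 if (stat.S_ISREG(st.st_mode)
>                         and st.st_mode & stat.S_ISUID
>                         and st.st_uid == 0):
>                     yield path
>
>     if __name__ == "__main__":
>         for p in find_suid_root():
>             print(p)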
>
> Our opinion is that this is a problem of the VO's making (see 4).
>
>> 3. ...
>
> No longer an issue. glexec on the CE is different, because it's the
> gatekeeper code which is being executed (to get the job into the
> batch system), not the job payload. (A necessary evil here, we
> believe...)
>
>> 4. What we want from pilot jobs is _traceability_, i.e., a record
>> of whose payload was actually executed. Having glexec do suid
>> twiddles is a baroque and dangerous way of achieving this. We'd be
>> much happier with a query mechanism into the VO's job queue which
>> allowed us to look at who delivered the payload. Far simpler and
>> less dangerous, thanks. (Note, if the VOs insist on sending pilot
>> jobs and getting themselves into a traceability pickle, then asking
>> sites to sort out this mess by installing a suid binary for them is
>> laughable. We hold them responsible for their collective actions.
>> They have made their bed; let them lie in it - see the JSPG
>> recommendations:
>> http://www.sysadmin.hep.ac.uk/wiki/Pilot_Jobs#JSPG_.28Joint_Security_Policy_Group.29_Raccomandation)
>
> We will continue to run pilot jobs, e.g., from LHCb. We just won't
> let them suid themselves to other pool accounts.
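>
> To make the alternative concrete: the query mechanism we have in mind
> need be no more than the sketch below. The endpoint URL and the JSON
> fields are entirely hypothetical - no such interface is specified
> anywhere yet - but it shows the shape of the thing: map a batch job
> id back to the identity whose payload it ran.
>
>     # Hypothetical sketch of a traceability query against a VO job
>     # queue. URL and fields are invented for illustration only.
>     import json
>     import urllib.request
>
>     def payload_owner(vo_endpoint, site, batch_job_id):
>         """Ask the VO whose payload ran inside a given pilot job."""
>         url = "%s/payloads?site=%s&batch_job_id=%s" % (
>             vo_endpoint, site, batch_job_id)
>         with urllib.request.urlopen(url) as resp:
>             record = json.load(resp)
>         # e.g. {"pilot_dn": "...", "payload_dn": "...", "started": "..."}
>         return record["payload_dn"]
>
>     # Usage, against a made-up endpoint:
>     # payload_owner("https://jobs.example-vo.org",
>     #               "UKI-SCOTGRID-GLA", "12345.svr016")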
>
> We echo Kostas' comments on how glexec interacts with the batch system:
>
> Begin forwarded message:
>> How are they going to use the scratch area that the batch system
>> allotted to the job, since it is running under another uid?
>> How can the batch system kill the job if it exceeds the cpu limit?
>> How can the batch system kill runaway process sessions at the end of
>> the job?
>> How can I keep accurate accounting for cpu/memory/io if the jobs
>> aren't running under the control of the batch system?
>> How can I prevent a pilot job from running N jobs instead of 1,
>> stealing cpu cycles from the other jobs in the system, if they are
>> not under the control of the batch system?
>
> Is that clear enough?
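>
> To put the last two questions in concrete terms: many batch systems
> clean up and account by uid, roughly the sweep sketched below (a
> simplified illustration, not any particular batch system's code). A
> payload that has glexec'd to another pool account no longer matches
> the job's uid and simply escapes it.
>
>     # Simplified sketch of a uid-based end-of-job process sweep
>     # (Linux /proc scan). Processes that switched to another uid
>     # via glexec won't match pool_uid and survive the cleanup.
>     import os
>     import signal
>
>     def kill_user_processes(pool_uid):
>         for entry in os.listdir("/proc"):
>             if not entry.isdigit():
>                 continue
>             try:
>                 st = os.stat(os.path.join("/proc", entry))
>             except OSError:
>                 continue  # process exited mid-scan
>             if st.st_uid == pool_uid:  # /proc/<pid> is owned by euid
>                 try:
>                     os.kill(int(entry), signal.SIGKILL)
>                 except OSError:
>                     pass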
>
>> 4) Clarification on how vulnerabilities in experiment/VO code
>> should be
>> handled.
>
> Examples? It's up to the VOs to protect the resources we give them.
> We'll bill them for everything ;-)
>
> Hope that helps
>
> Graeme
>
> --
> Dr Graeme Stewart - http://wiki.gridpp.ac.uk/wiki/User:Graeme_stewart
> ScotGrid - http://www.scotgrid.ac.uk/ http://scotgrid.blogspot.com/
>
--
Alessandra Forti
NorthGrid Technical Coordinator
University of Manchester