Subject: Re: UK input to tomorrow's WLCG GDB -glexec
From: Alessandra Forti <[log in to unmask]>
Reply-To: Testbed Support for GridPP member institutes <[log in to unmask]>
Date: Thu, 5 Jul 2007 15:15:48 +0100

Hi David,

I cannot comment on the policy as I haven't seen it yet. I don't think
it has been broadcast to site security lists for comments yet.

I don't want to ban pilot jobs; I just want them not to make glexec
compulsory. You might have your own reasons to want it; I have my reasons
not to want it (I don't give sudo privileges to users).

As for the paper, I'll be glad to see it when it comes out. I was
expecting that you would at least have started writing on the wiki in
the meantime.

cheers
alessandra

David Groep wrote:
> Hi Alessandra,
> 
> It's good to see that at least there is a substantial discussion
> about glexec, the code and code review aspects, and the requirements
> and deployment scenarios. There are two or three threads of discussion in
> your email that I think it's good to separate out:
> 
> - code review and quality
>   What you are looking at now is code that is in preview stage as
>   far as gLite is concerned: it's installed on selected preview testbed
>   sites, and is being considered for inclusion in subsequent releases.
>   I feel, and I think you also express that feeling, that before it goes
>   to production, such a security-critical piece of code should be
>   reviewed by external people. I would like to see and encourage such
>   a review, as is being done now on the VOMS code, before it goes
>   to the sites.
> 
>   The fact that FNAL deployed this preview version on their systems
>   out of urgent necessity and for regulatory reasons, does not need
>   to imply that any EGEE site should do likewise.
> 
> - effects of glexec in operational environments
>   In this respect, glexec behaves /almost/ as a standard sudo, in
>   that it keeps the process tree (and thus the accounting), and
>   if a job is killed by root, this killing will affect the entire
>   process tree, including any processes with a different uid/gid.
>   Of course, glexec does not give you additional protection
>   against processes that daemonize, but it also does not lower the
>   level of protection.
>   Note that this is *independent* of any epilogues, batch system
>   types, or even the fact that you are running any batch system. It
>   also holds for jobs that are directly forked.
> 
>   The difference between glexec and sudo is that glexec will keep
>   open any file descriptors across the uid change, so that you
>   can communicate with your child (i.e. send it instructions or
>   kill it externally).
> 
>   This all follows standard unix semantics, and is not specific to
>   PBS, LSF, Condor, or any other system. And certainly it is not
>   linked to the existence of any specific NIKHEF installation or
>   epilogue scripts -- our SW development and operational activities
>   are quite separated to prevent "leakage" of assumptions either way.
> 
> - policy issues with respect to pilot jobs and glexec-on-WN
>   scenarios
> 
>   These issues should not be raised (again) with the developers, but
>   be brought to the attention of the policy bodies (GDB, ROC Managers,
>   etc.). The JSPG is drafting a policy on this issue, as you are no doubt
>   aware, that was presented by Dave Kelsey already to the LCG GDB.
>   It will in due course also be presented to EGEE management &c.
>   It leaves the sites the choice of several models (no glexec,
>   non-privileged glexec, suid-glexec).
>   There are several deployment scenarios, and each of them is
>   suited to a specific operational, regulatory and legal environment
>   that may exist at a particular site.
>   Rightfully, IMHO, the draft policy outlines several options, and how
>   the VO should in each of these conditions fully comply with the site
>   requirements.
> 
>   This should also address the quality of (VO developed) software, and
>   how that deals with their own security issues, much like the
>   WMS/RB does today in other models. Also on this issue, there is a
>   draft policy being circulated AFAIK.
> 
>   Then, whether or not you like or want to accept pilot
>   jobs is a site choice. Some VOs, for some reason or another, seem
>   to be quite fond of them, whilst indeed the majority of the VOs
>   are satisfied with the regular submission model, or indeed could
>   not do anything else since a pilot job model would impair too
>   much functionality.
> 
>   What glexec gives you is the possibility for policy-compliant VOs
>   to add your site specific authorisation requirements, and a way to
>   enforce your own policy on top of what the VO sends you. Via glexec,
>   you gain (even in a non-setuid model) the possibility to inform the
>   VO pilot job of your site's policy, so that a cooperating VO will then
>   not start such a job.*
>   By adding setuid capabilities, you gain the possibility to trace
>   individual processes at the unix level on shared multi-user systems,
>   such as batch nodes that are used by more than one job concurrently.
>   If you have only one job per node (as is common in many low-latency
>   HPC environments), the setuid capability is superfluous anyway, and
>   in those cases I would personally recommend against setuid (but keep
>   glexec in non-setuid mode to enforce my own authZ decisions and
>   site-bin-lists).
> 
>   * if you find that a VO violates this policy, you can always ban the VO,
>   and with a good reason...
> 
> I hope you, and many others, will appreciate that a more in-depth
> paper on glexec will be forthcoming over the next two months, that should
> explain a bit more about the rationale and deployment models that led
> us to the development of this component.
> 
>     Cheers,
>     DavidG.
> 
> 
> Alessandra Forti wrote:
>> Hi Oscar,
>>
>> since the fact that glexec is derived from suexec in one of the . Tell 
>> me what you read in the first 100 or so lines
>>
>> http://httpd.apache.org/docs/2.0/suexec.html
>>
>> glexec code has never been extensively tested.
>>
>> Kostas has found already at least half a dozen problems and two or 3
>> bugs just glancing at the code.
>>
>> In the past month we have had 3 cases of improperly set permissions
>> that allowed files to be deleted. I cannot even think about this
>> happening when sites as big as Liverpool and Manchester deploy this stuff.
>>
>> Not all the sysadmins are acquainted with suexec configuration and
>> glexec configuration might be similar but surely it must have an extra 
>> layer of complication since it is connected to lcas/lcmaps.
>>
>> The glexec executable should be called by user code. Have they
>> mastered delegation code? Or are they still planning to use gridftp to
>> download the proxies from the server? In any case I do not trust the 
>> users to be able to write any secure code by default. It is simply not 
>> ingrained in their mentality.
>>
>> Other problems are certainly the way VOs are trying to optimise job 
>> submission on shared resources introducing extraneous software like 
>> glexec on the worker nodes. Not all the clusters are dedicated, not 
>> all the clusters use PBS on which you are basing most of your 
>> deployment trials.
>>
>> In a previous email you stated that glexec doesn't interfere with the 
>> normal batch system operation. Changing UID won't affect accounting, 
>> automatic creation of directories, killing of daemonised and runaway 
>> processes because anyway that's a problem that can be solved in the 
>> epilogue script as it is done now because the sid tree is preserved. 
>> Which means you are considering strictly speaking only the epilogue 
>> nikhef is using. True, we have given it a big push to be deployed 
>> elsewhere but this is not compulsory. This without counting that 
>> different sites might use different batch systems.
>>
>> There is also the question that the pilot job in this scheme is not 
>> run with a user proxy but with a service proxy. Tell me, what do you call
>> a job that can run for up to 72/92h (default cputime/walltime setup by
>> YAIM), contact services, pull other people's jobs and use other people's
>> proxies, all this while changing UID and without even an owner because
>> suddenly this is a service? To me it seems a VOBOX on the WN if the
>> word permanent is replaced by days.
>>
>> And if we reduce the queues to ~11 hours so that they can run only
>> one job? Where do they get the advantage to use this model and why
>> should I introduce something as potentially dangerous as glexec on
>> hundreds of nodes? As a matter of fact I can't think about debugging a
>> problem with changes of IDs and file ownership in the log files.
>>
>> Certainly we also question the way users optimise their job submission 
>> as we are all working at a project that established that a push model 
>> was optimal while some users decided that they wanted a pull model on 
>> top. So yes there are other problems, but the main one for me is a 
>> setuid program on the WN. It beats me that people can't see it.
>>
>> cheers
>> alessandra
>>
>> Oscar Koeroo wrote:
>>
>>> Hi,
>>>
>>> If these questions are raised, then I think that sites should ban in- 
>>> and outbound network connectivity from the WNs or use a network 
>>> arbitrator.
>>>
>>> Technical software issues in glexec don't seem to be the core of
>>> the problem.
>>>
>>> Users (and their VOs) have their reasons for working around the 
>>> regular queues to send work to a WN in a more optimal way, in the 
>>> user perspective, for execution. We simply provide a tool that can 
>>> perform needed authorization checks and user switching where it 
>>> wasn't before.
>>>
>>> I don't understand how this tool would be going against site 
>>> policies. It serves the purpose also for both the site and the VOs 
>>> themselves to have more control over what is executed by who. Without 
>>> it, you wouldn't know who has executed which part of the real user 
>>> jobs' payload from within a pilot job.
>>>
>>> As I would understand it, the glexec tool would aid the security 
>>> infrastructure by being able to tell more about the pilot job. This 
>>> should be in coherence with the VO's pilot job infrastructure.
>>>
>>>
>>> cheers,
>>>
>>>     Oscar
>>>
>>>
>>>
>>>
>>>
>>> Cornwall, LA (Linda) wrote:
>>>
>>>> Dear UK TB support, GSVG RAT, Kostas, Alessandra, Oscar, SCG,
>>>>
>>>> It looks like multiple threads have developed concerning glexec, and in
>>>> summary the problems seem to be:--
>>>>
>>>> Pilot jobs turn the push model into a pull model, is this acceptable at
>>>> all?
>>>>
>>>> Does the Glexec/pilot job design in principle contradict security
>>>> requirements? They have not been updated for a while, but for
>>>> example I quote from the EGEE(I) requirements
>>>> (https://edms.cern.ch/file/485295/1/EGEE-JRA3-TEC-485295-UserReq-v1-0.pdf)
>>>> In the Auditing requirements
>>>> "It must be possible to trace the distinguished name (DN) of the
>>>> certificate used for the original job submission."
>>>>
>>>> Does the Glexec/pilot job design in principle introduce vulnerabilities
>>>> that are inherent in the design, rather than being bugs that can be
>>>> fixed? If so, we have a serious vulnerability issue that needs careful
>>>> consideration with SCG, TCG and others and a redesign/rewrite is
>>>> needed.
>>>>
>>>> Does the Glexec/pilot job design in principle contradict the agreed
>>>> policy?
>>>>
>>>> Does the way Glexec is being used by VOs contradict the agreed policy?
>>>> Is there something else wrong with glexec that is obvious to sites?
>>>> I can't help thinking if Kostas and Alessandra are not happy something
>>>> isn't right.
>>>>
>>>> Glexec has some implementation flaws, which can be fixed as a
>>>> straightforward vulnerability bug.
>>>>
>>>> It seems to me that something may have gone wrong between satisfying
>>>> security requirements, ensuring design flaws that cause vulnerabilities
>>>> are not present, ensuring design flaws that contradict policy needs
>>>> are not introduced...  This is not just a UK TB matter, or just an
>>>> operational matter, but something that needs investigating to find
>>>> whether or not there is a serious problem.  Linda
>>>>
>>>>
>>>>
>>
> 
> 
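
The Unix semantics described in David's quoted message (a child started
through a sudo-like wrapper stays in the caller's process tree and
session, and file descriptors opened before the switch stay usable) can
be illustrated with a minimal sketch. This is not glexec code: the
function name and the returned keys are made up for illustration, and
the actual uid/gid switch is left out because it needs privileges.

```python
import os


def run_child_with_inherited_pipe():
    """Fork a child that reports back through a pipe inherited from the parent.

    Hypothetical illustration only: a setuid wrapper would change uid/gid in
    the child before exec'ing the payload; that step is omitted here.
    """
    read_fd, write_fd = os.pipe()
    parent_pid = os.getpid()
    parent_sid = os.getsid(0)

    pid = os.fork()
    if pid == 0:
        # Child: the pipe opened by the parent is still open here.
        os.close(read_fd)
        report = f"{os.getppid()} {os.getsid(0)}"
        os.write(write_fd, report.encode())
        os.close(write_fd)
        os._exit(0)

    # Parent: close the write end so the read below sees end-of-file.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_ppid, child_sid = map(int, reader.read().split())
    _, status = os.waitpid(pid, 0)

    return {
        "child_parent_is_us": child_ppid == parent_pid,
        "same_session": child_sid == parent_sid,
        "exit_code": os.WEXITSTATUS(status),
    }


if __name__ == "__main__":
    print(run_child_with_inherited_pipe())
```

The sketch only covers the well-behaved case. A child that calls
setsid() or double-forks to daemonize leaves the session, which is the
runaway-process case raised earlier in the thread.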

-- 
Alessandra Forti
NorthGrid Technical Coordinator
University of Manchester
