Hi Matt,

On 05/02/2014 14:41, Matt Raso-Barnett wrote:
> Hi Gareth,
>
> On 05/02/14 13:43, Gareth Roy wrote:
>> This may be a dumb suggestion but just throwing out ideas… if you’ve
>> already checked this then feel free to ignore. If your local tests are
>> passing and the infrastructure looks right then it _could_ be a problem
>> with the Argus policy for glexec (I’ve just seen the same error reported
>> at another site when we were trying to get glexec up and running).
>>
>> If you do a “pap-admin lp” on your Argus server you should get a list of
>> all the currently viable policies. If you check in the section that is
>> headed:
>>
>> resource "http://authz-interop.org/xacml/resource/resource-type/wn" {
>>      obligation "http://glite.org/xacml/obligation/local-environment-map" {
>>      }
>>
>> You're looking for something that looks like:
>>
>>          rule permit { pfqan="/ops/Role=pilot/Capability=NULL" }
>>          rule permit { pfqan="/ops/Role=pilot" }
>>          rule permit { pfqan="/ops/Role=NULL/Capability=NULL" }
>>          rule permit { pfqan="/ops" }
>>
>> Another example and complete Argus instructions can be found here
>> https://www.gridpp.ac.uk/wiki/Argus_Server
>
> Our Argus policy for this is:
> resource "http://authz-interop.org/xacml/resource/resource-type/wn" {
>      obligation "http://glite.org/xacml/obligation/local-environment-map" {
>      }
>
>      action "http://glite.org/xacml/action/execute" {
>          rule permit { pfqan="/atlas/Role=pilot" }
>          rule permit { pfqan="/atlas/Role=lcgadmin" }
>          rule permit { pfqan="/atlas/Role=production" }
>          rule permit { pfqan="/atlas" }
>          rule permit { pfqan="/ops/Role=pilot" }
>          rule permit { pfqan="/ops/Role=lcgadmin" }
>          rule permit { pfqan="/ops" }
>          rule permit { vo="dteam" }
>      }
> }
>
> Which matches the wiki mostly, but I don't have the 'Capability=NULL'
> part. Could I ask what that part relates to and should I add that in?

I don't have the "Capability=NULL" either. My Argus entries look much 
the same as yours (and they work for me!).
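
If it helps to check this mechanically, here is a rough sketch (not an official Argus tool) that scans policy text in the format quoted above and reports which FQANs have no "rule permit" line. The function names are made up for illustration, and it assumes the "pap-admin lp" output looks like the listing in this thread. It does not check which resource or action block a rule sits in, so treat it as a first pass only.

```python
import re

# Matches lines such as:  rule permit { pfqan="/ops/Role=pilot" }
# Assumption: the policy text uses the same layout as the listing in this thread.
RULE_RE = re.compile(r'rule\s+(permit|deny)\s*\{\s*(\w+)\s*=\s*"([^"]*)"\s*\}')


def permitted(policy_text, attribute="pfqan"):
    """Return the set of values that have a 'rule permit' for the given attribute."""
    return {
        value
        for effect, attr, value in RULE_RE.findall(policy_text)
        if effect == "permit" and attr == attribute
    }


def missing_rules(policy_text, wanted):
    """Return, sorted, the wanted FQANs that have no permit rule in policy_text."""
    return sorted(set(wanted) - permitted(policy_text))
```

You would save the "pap-admin lp" output to a file, read it into a string, and call missing_rules(text, ["/ops/Role=pilot", "/ops"]); an empty list means every FQAN you asked about has a permit rule somewhere in the policy.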

>
>> I note you were looking for NOT AUTHORIZED in the logs and not seeing
>> anything, but I’ve had Argus fail _slightly_ differently in the case of
>> glexec. If you look in the audit log you might see entries that, instead
>> of Permit, are all Nulls or NotApplicable, rather than a NOT
>> AUTHORIZED in the process.log or in /var/log/messages. It’s not helpful,
>> but it is indicative of the authz process failing to find something.
>
> So on the Argus server everything is showing as Permit at the moment,
> even around the time when the Nagios tests are failing.
>
> I know when running a manual test with my own cert as part of the dteam
> vo, that Argus would not allow me to run glexec, until I added the
> policy above to permit vo=dteam -- so I think at least partially the WNs
> and the argus server are working in *some* cases.
>
>> P.S. Another thought: I see in your last set of emails that you made sure
>> the pilops mappings were right on the worker nodes. Did the same thing
>> happen on the Argus server? It uses the grid map files to know which
>> pool accounts to map DNs to, so they need to be available on the Argus
>> server as well… something that always bites me :)
>
> I did need to update the worker nodes to have the same
> /etc/yaim/users.conf and groups.conf as the Argus and Cream servers - is
> that what you mean here? The /etc/grid-security/grid-mapfile is present
> on WNs, argus and cream and is the same on all too.

You need to use the same users.conf and groups.conf on all of Argus, 
Cream CE and WNs (and if you change these, run yaim everywhere!). One 
thing which bit me - make sure that the pilot groups can submit to the 
queue(s) the jobs expect to run on. My ATLAS gLExec tests failed until I 
realised that they were submitting to a queue which ATLAS didn't have 
permission to run jobs on :-(
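
A quick way to confirm that the files really are identical everywhere is to compare checksums. The sketch below is only an illustration: the function names and host labels are hypothetical, and it assumes you have already copied the contents of users.conf (or groups.conf) from each machine into a dictionary keyed by host. How you collect the files is up to you.

```python
import hashlib


def digest(text):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_mismatches(contents_by_host):
    """Group hosts by the digest of their copy of the file.

    Returns an empty list when every host has identical content,
    otherwise a sorted list of host groups, one group per distinct version.
    """
    groups = {}
    for host, text in contents_by_host.items():
        groups.setdefault(digest(text), []).append(host)
    if len(groups) <= 1:
        return []
    return sorted(sorted(hosts) for hosts in groups.values())
```

For example, find_mismatches({"argus": a, "cream": c, "wn1": w}) gives an empty list when all three copies match, and otherwise shows which hosts share a version, which points at the machine that still needs updating and a yaim rerun.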

John

>
> Thanks!
> Matt