On 04/15/2016 04:28 PM, Winnie Lacesso wrote:
> Is it common to NFS-mount /etc/grid-security/vomsdir?

The /etc/grid-security/vomsdir directory is referred to when security
decisions are made. It contains LSC (list of certificates) files, which are
used to verify that a given VOMS server is trusted.

Since we use ARGUS for centralised worker-node security, we don't need to
share it. Even when we used local worker-node security, we did not share
the vomsdir, although it should be OK to do so.

We still use local node security on DPM, I think, as it is not rigged up to
work with ARGUS (I don't think DPM _can_ be rigged up to use ARGUS, but I'm
not sure about that). In any case, we don't share vomsdir on DPM either,
but I would have thought it should be possible. And it would serve to keep
things consistent, so perhaps it's a good idea.

> Apparently updating voms package (not happen often, but) wants to write to
> /etc/grid-security/vomsdir

Yes. I can see the voms package claims it:

# rpm -ql voms.x86_64
/etc/grid-security
/etc/grid-security/vomsdir
/usr/lib64/libvomsapi.so.1
/usr/lib64/libvomsapi.so.1.0.0
/usr/share/doc/voms-2.0.12
/usr/share/doc/voms-2.0.12/AUTHORS
/usr/share/doc/voms-2.0.12/LICENSE
/usr/share/voms
/usr/share/voms/vomses.template

> My colleague says he's seen the voms pkg update FAIL due to this.
> So when updating voms pkg, /etc/grid-security/vomsdir has to be
> unmounted, update the voms pkg, then remount it.
>
> What would happen to new jobs that arrive on WN in that time? Fail if
> /etc/grid-security/vomsdir is, erm, an empty mountpoint?
>
> Or, is it: "don't update voms pkg when WN running jobs - must be drained."

What follows is my vague opinion on how things work - I could be miles off,
but here goes. As far as I know, jobs are authenticated, authorised and
mapped on the condor head node (the same applies for torque) before they
hit the worker node. They arrive at the worker node already running as the
right user.
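For anyone unfamiliar with what lives in vomsdir: each LSC file holds the subject DN of the VOMS server on the first line and the DN of the CA that issued its certificate on the second. A minimal sketch of setting one up - the VO name, host name and DNs here are made up, and a temp directory stands in for /etc/grid-security/vomsdir:

```shell
# Stand-in for /etc/grid-security/vomsdir; real files live under
# /etc/grid-security/vomsdir/<vo>/<voms-hostname>.lsc
VOMSDIR=$(mktemp -d)
mkdir -p "$VOMSDIR/dteam"

# Line 1: the VOMS server's subject DN.
# Line 2: the DN of the CA that issued its host certificate.
cat > "$VOMSDIR/dteam/voms.example.org.lsc" <<'EOF'
/DC=org/DC=example/CN=voms.example.org
/DC=org/DC=example/CN=Example CA
EOF

cat "$VOMSDIR/dteam/voms.example.org.lsc"
```

The point being: these are tiny static text files, so whether they sit on local disk or an NFS mount makes no difference to how they are read.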
You can verify this by seeing that the condor_shadow processes on the head
node run as the proper user, e.g.

prdatl26 30722 19300  0 Apr17 ?  00:00:00 condor_shadow

So there is limited use for vomsdir on the worker node, especially when
using ARGUS. Indeed, at our site the worker nodes are badly configured with
a partial set of vomsdir files (just the LHC experiments), yet it still
works fine for all VOs (the bad config is an artefact of our puppet setup,
which I must clean up some day!!!)

So, in summary, and storage notwithstanding, my theory is:

(a) only glexec uses voms on the worker nodes;
(b) if using ARGUS, you don't need a correct vomsdir on the worker nodes; and
(c) when sharing vomsdir, newly arriving jobs don't matter if vomsdir is
    unavailable, because they are already running as the right user.

So the only concern is a job already running that tries to switch user with
glexec while vomsdir is unmounted. In such a case, my theory is that the
job fails verification and dies.

So my workaround would be: on each node, one at a time, quickly unmount
vomsdir, quickly do the voms package update, quickly remount vomsdir!

Cheers,

Ste

--
Steve Jones                          [log in to unmask]
Grid System Administrator            office: 220
High Energy Physics Division         tel (int): 43396
Oliver Lodge Laboratory              tel (ext): +44 (0)151 794 3396
University of Liverpool              http://www.liv.ac.uk/physics/hep/
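P.S. For what it's worth, the per-node unmount/update/remount dance might be scripted roughly like this. This is a sketch only, shown in dry-run form (each step is echoed rather than executed - swap the echo for "$@" to run it for real), and it assumes yum as the package manager and the NFS mount sitting directly on /etc/grid-security/vomsdir:

```shell
#!/bin/sh
# Sketch of the per-node procedure: unmount vomsdir, update the voms
# package, remount. Dry-run: each step is printed, not executed.
VOMSDIR=/etc/grid-security/vomsdir

run() {
    echo "WOULD RUN: $*"    # replace this line with "$@" to execute
}

run umount "$VOMSDIR"
run yum -y update voms
run mount "$VOMSDIR"
```

A running glexec call during that window would still be the unlucky case, so run it node by node rather than fleet-wide.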