Hi,
Just heard from Matt Doidge that some jobs do run (e.g.) voms-proxy-info
while they are executing.
But the theory still holds; to reiterate:
(a) glexec uses voms on the worker nodes, and jobs may run (e.g.)
voms-proxy-info, but this is not usually a continuous process, i.e. it
just happens now and again.
(b) if using ARGUS you don't need a correct vomsdir on the worker nodes.
(c) when sharing vomsdir, arriving jobs are not affected if vomsdir is
momentarily unavailable, because they are already mapped to the right user.
So the only concern is a job already running that tries to switch user
using glexec (or runs voms-proxy-info ...) while vomsdir is unmounted. In
such a case, the theory is that the job fails verification and dies.
Tough luck.
So my workaround would be: on each node, one at a time, quickly unmount
vomsdir, quickly do the voms pkg update, then quickly remount vomsdir.
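A rough sketch of that per-node swap, assuming yum as the package manager
(the function and variable names here are my own invention, and the
DRY_RUN flag, on by default, just prints each step so the sequence can be
sanity-checked before running it for real as root):

```shell
#!/bin/bash
# Per-node workaround: unmount the shared vomsdir, update the voms
# package so it can write into the local directory, then remount.
# DRY_RUN=1 (the default) only prints the commands instead of running them.

VOMSDIR=${VOMSDIR:-/etc/grid-security/vomsdir}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

swap_vomsdir() {
    # Check /proc/self/mounts to see whether vomsdir is actually mounted.
    if grep -q " $VOMSDIR " /proc/self/mounts; then
        run umount "$VOMSDIR"     # free the mount point for the update
        run yum -y update voms    # package can now write into vomsdir
        run mount "$VOMSDIR"      # restore the shared copy
    else
        run yum -y update voms    # nothing mounted; plain update
    fi
}

swap_vomsdir
```

Running it with DRY_RUN=0 would do the swap for real; fstab would need an
entry for the vomsdir mount for the bare `mount` call to work.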
Ste
On 04/18/2016 11:08 AM, Stephen Jones wrote:
> On 04/15/2016 04:28 PM, Winnie Lacesso wrote:
>> Is it common to NFS-mount /etc/grid-security/vomsdir?
>
> The /etc/grid-security/vomsdir directory is referred to when security
> decisions are made. It contains LSC (list-of-certificates) files, which
> are used to verify that a certain server is trusted. Since we use
> ARGUS for centralised worker-node security, then we don't need to
> share it. Even when we used local worker-node security, we did not
> share the vomsdir, although it should be OK to do so. We still use
> local node security on DPM, I think, as it is not rigged up to work
> with ARGUS (I don't think DPM _can_ be rigged up to use ARGUS, but I'm
> not sure about that.) In any case, we don't share vomsdir on DPM
> either, but it should be possible to do so I would have thought. And
> it would serve to keep things consistent so perhaps it's a good idea.
>
>> Apparently updating the voms package (doesn't happen often, but) wants
>> to write to
>> /etc/grid-security/vomsdir
>
> Yes. I can see voms pkg claims it:
>
> # rpm -ql voms.x86_64
> /etc/grid-security
> /etc/grid-security/vomsdir
> /usr/lib64/libvomsapi.so.1
> /usr/lib64/libvomsapi.so.1.0.0
> /usr/share/doc/voms-2.0.12
> /usr/share/doc/voms-2.0.12/AUTHORS
> /usr/share/doc/voms-2.0.12/LICENSE
> /usr/share/voms
> /usr/share/voms/vomses.template
>
>> My colleague says he's seen the voms pkg update FAIL due to this.
>> So when updating the voms pkg, /etc/grid-security/vomsdir has to be
>> unmounted, the voms pkg updated, then remounted.
>>
>> What would happen to new jobs that arrive on WN in that time? Fail if
>> /etc/grid-security/vomsdir is, erm, an empty mountpoint?
>>
>> Or, is it: "don't update voms pkg when WN running jobs - must be
>> drained."
>
> What follows is my vague opinion on how things work - I could be miles
> off but here goes.
>
> As far as I know, jobs are authenticated, authorised and mapped on the
> condor head-node (the same applies for torque) prior to hitting the
> worker-node. They arrive at the worker-node already as the right user.
> You can verify this by seeing that the condor_shadow tasks on the
> headnode are run as the proper user, e.g.
>
> prdatl26 30722 19300 0 Apr17 ? 00:00:00 condor_shadow
>
> So there is limited use for vomsdir on the worker node, esp. when
> using ARGUS. Indeed, at our site, the worker nodes are badly
> configured with a partial set of vomsdir (just LHC exps) yet it still
> works fine for all VOs (the bad config is an artefact of our puppet
> setup which I must clean-up some day!!!)
>
> So, in summary and storage notwithstanding, my theory is: (a) only
> glexec uses voms on the worker nodes, (b) if using ARGUS you don't
> need correct vomsdir on workernodes, and (c) when sharing vomsdir,
> arriving jobs are not affected if vomsdir is unavailable because they
> are already mapped to the right user. So the only concern is if a job
> already running
> tries to switch user using glexec while vomsdir unmounted. In such a
> case, theory is that job fails verification and dies. So my workaround
> would be: on each node one at a time, quickly unmount vomsdir,
> quickly do voms pkg update, quickly remount vomsdir!
>
> Cheers,
>
> Ste
>
--
Steve Jones [log in to unmask]
Grid System Administrator office: 220
High Energy Physics Division tel (int): 43396
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
University of Liverpool http://www.liv.ac.uk/physics/hep/