On 04/15/2016 04:28 PM, Winnie Lacesso wrote:
> Is it common to NFS-mount /etc/grid-security/vomsdir?

The /etc/grid-security/vomsdir directory is referred to when security
decisions are made. It contains LSC (list of certificates) files, which are
used to verify that a given VOMS server is trusted.

Since we use ARGUS for centralised worker-node security, we don't need to
share it. Even when we used local worker-node security, we did not share
the vomsdir, although it should be OK to do so.

We still use local node security on DPM, I think, as it is not rigged up to
work with ARGUS (I don't think DPM _can_ be rigged up to use ARGUS, but I'm
not sure about that). In any case, we don't share vomsdir on DPM either,
but I would have thought it should be possible. And it would serve to keep
things consistent, so perhaps it's a good idea.

> Apparently updating voms package (not happen often, but) wants to write to
> /etc/grid-security/vomsdir

Yes. I can see the voms package claims it:

# rpm -ql voms.x86_64
/etc/grid-security
/etc/grid-security/vomsdir
/usr/lib64/libvomsapi.so.1
/usr/lib64/libvomsapi.so.1.0.0
/usr/share/doc/voms-2.0.12
/usr/share/doc/voms-2.0.12/AUTHORS
/usr/share/doc/voms-2.0.12/LICENSE
/usr/share/voms
/usr/share/voms/vomses.template

> My colleague says he's seen the voms pkg update FAIL due to this.
> So when updating voms pkg, /etc/grid-security/vomsdir has to be
> unmounted, update the voms pkg, then remount it.
>
> What would happen to new jobs that arrive on WN in that time? Fail if
> /etc/grid-security/vomsdir is, erm, an empty mountpoint?
>
> Or, is it: "don't update voms pkg when WN running jobs - must be drained."

What follows is my vague opinion on how things work - I could be miles off,
but here goes. As far as I know, jobs are authenticated, authorised and
mapped on the condor head node (the same applies for torque) before they
hit the worker node. They arrive at the worker node already running as the
right user.
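For anyone unfamiliar with what lives in vomsdir: each LSC file holds the subject DN of the VOMS server on the first line and the DN of the CA that issued its certificate on the second. A minimal sketch of setting one up - the VO name, host name and DNs here are made up, and a temp directory stands in for /etc/grid-security/vomsdir:

```shell
# Stand-in for /etc/grid-security/vomsdir; real files live under
# /etc/grid-security/vomsdir/<vo>/<voms-hostname>.lsc
VOMSDIR=$(mktemp -d)
mkdir -p "$VOMSDIR/dteam"

# Line 1: the VOMS server's subject DN.
# Line 2: the DN of the CA that issued its host certificate.
cat > "$VOMSDIR/dteam/voms.example.org.lsc" <<'EOF'
/DC=org/DC=example/CN=voms.example.org
/DC=org/DC=example/CN=Example CA
EOF

cat "$VOMSDIR/dteam/voms.example.org.lsc"
```

The point being: these are tiny static text files, so whether they sit on local disk or an NFS mount makes no difference to how they are read.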
You can verify this by seeing that the condor_shadow processes on the head
node run as the proper user, e.g.

prdatl26 30722 19300  0 Apr17 ?  00:00:00 condor_shadow

So there is limited use for vomsdir on the worker node, especially when
using ARGUS. Indeed, at our site the worker nodes are badly configured with
a partial set of vomsdir files (just the LHC experiments), yet it still
works fine for all VOs (the bad config is an artefact of our puppet setup,
which I must clean up some day!!!)

So, in summary, and storage notwithstanding, my theory is:

(a) only glexec uses voms on the worker nodes;
(b) if using ARGUS, you don't need a correct vomsdir on the worker nodes; and
(c) when sharing vomsdir, newly arriving jobs don't matter if vomsdir is
    unavailable, because they are already running as the right user.

So the only concern is a job already running that tries to switch user with
glexec while vomsdir is unmounted. In such a case, my theory is that the
job fails verification and dies.

So my workaround would be: on each node, one at a time, quickly unmount
vomsdir, quickly do the voms package update, quickly remount vomsdir!

Cheers,

Ste

--
Steve Jones                          [log in to unmask]
Grid System Administrator            office: 220
High Energy Physics Division         tel (int): 43396
Oliver Lodge Laboratory              tel (ext): +44 (0)151 794 3396
University of Liverpool              http://www.liv.ac.uk/physics/hep/
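P.S. For what it's worth, the per-node unmount/update/remount dance might be scripted roughly like this. This is a sketch only, shown in dry-run form (each step is echoed rather than executed - swap the echo for "$@" to run it for real), and it assumes yum as the package manager and the NFS mount sitting directly on /etc/grid-security/vomsdir:

```shell
#!/bin/sh
# Sketch of the per-node procedure: unmount vomsdir, update the voms
# package, remount. Dry-run: each step is printed, not executed.
VOMSDIR=/etc/grid-security/vomsdir

run() {
    echo "WOULD RUN: $*"    # replace this line with "$@" to execute
}

run umount "$VOMSDIR"
run yum -y update voms
run mount "$VOMSDIR"
```

A running glexec call during that window would still be the unlucky case, so run it node by node rather than fleet-wide.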