On 05/02/14 15:16, Gareth Roy wrote:
> Yeah, so the voms, grid and group map files should be the same on all
> the machines… The easy way to see if the mappings are actually
> happening is to check /etc/grid-security/gridmapdir/ and see if
> mappings to the “pilops” (or whatever you’ve called yours) pool
> accounts are being made. For completeness here I do:
>
> /etc/grid-security/gridmapdir# ls -ltri | grep pilops | cut -d' ' -f1 | while read inode; do ls -ltri | grep $inode; done
>
> To show all the “pilops” accounts and the associated DN mappings. If
> there are some actual mappings then Argus is doing its job and the
> break is back on the WN… if there are no mappings then Argus is
> having issues somehow.
Thanks Gareth, I do get some mappings I think on the Argus server:
...
3147133 -rw-r--r-- 1 root root 0 Dec 13 2012 pilops09
3147134 -rw-r--r-- 1 root root 0 Dec 13 2012 pilops10
3147132 -rw-r--r-- 2 root root 0 Dec 18 23:13 pilops08
3147132 -rw-r--r-- 2 root root 0 Dec 18 23:13 %2fc%3duk%2fo%3descience%2fou%3doxford%2fl%3doesc%2fcn%3dkashif%20mohammad:pilops:ops
3147132 -rw-r--r-- 2 root root 0 Dec 18 23:13 pilops08
3147132 -rw-r--r-- 2 root root 0 Dec 18 23:13 %2fc%3duk%2fo%3descience%2fou%3doxford%2fl%3doesc%2fcn%3dkashif%20mohammad:pilops:ops
3147126 -rw-r--r-- 2 root root 0 Feb 5 16:29 pilops02
3147126 -rw-r--r-- 2 root root 0 Feb 5 16:29 %2fc%3duk%2fo%3descience%2fou%3doxford%2fl%3doesc%2fcn%3dkashif%20mohammad%2fcn%3drobot%3agridclient:pilops:ops
3147126 -rw-r--r-- 2 root root 0 Feb 5 16:29 pilops02
3147126 -rw-r--r-- 2 root root 0 Feb 5 16:29 %2fc%3duk%2fo%3descience%2fou%3doxford%2fl%3doesc%2fcn%3dkashif%20mohammad%2fcn%3drobot%3agridclient:pilops:ops
Is this the only place I should expect this? There is no gridmapdir on
the WNs but there is on the cream server.
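
To compare the gridmapdir on the Argus server with the one on the cream server, a rough bash sketch like the one below could be run on both. It pairs each pool account with the DN entry hard-linked to it and decodes the DN so it is readable. The function names (urldecode, list_mappings) are made up for illustration, and it assumes GNU find for -samefile. It also avoids the repeated lines in the listing above, which appear because "grep pilops" matches the DN file names as well as the account names.

```bash
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of any grid tool.

# Decode a URL-encoded gridmapdir file name (%2f -> /, %20 -> space, ...).
urldecode() {
    printf '%b\n' "${1//%/\\x}"
}

# list_mappings DIR PREFIX
# Print "<pool account><TAB><decoded DN entry>" for every leased account.
list_mappings() {
    local dir=$1 prefix=$2 acct link
    for acct in "$dir"/"$prefix"*; do
        [ -f "$acct" ] || continue
        # Any other name sharing this inode is the DN mapped to the account.
        find "$dir" -maxdepth 1 -type f -samefile "$acct" ! -name "${acct##*/}" |
        while IFS= read -r link; do
            printf '%s\t%s\n' "${acct##*/}" "$(urldecode "${link##*/}")"
        done
    done
}

# Example:
#   list_mappings /etc/grid-security/gridmapdir pilops
```

Accounts with no second hard link (such as pilops09 and pilops10 above) print nothing, so an empty result would mean no mappings are being made on that machine.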
I've found something else today on the WNs which may be the problem.
I turned on maximum log output for glexec on the WN earlier (somehow I
missed this variable when looking through /etc/glexec.conf before) and
immediately saw the following:
glexec[51695] 20140205T145808Z: Reading in GLEXEC_CLIENT_CERT='/mnt/lustre/grid/users/pilatl01/home_cream_445503617/cream_445503617.proxy'.
glexec[51695] 20140205T145808Z: Could not lock file during reading of proxy /mnt/lustre/grid/users/pilatl01/home_cream_445503617/cream_445503617.proxy.
glexec[51695] 20140205T145808Z: Reading proxy failed.
glexec[51695] 20140205T145808Z: Failed to lock $GLEXEC_CLIENT_CERT=/mnt/lustre/grid/users/pilatl01/home_cream_445503617/cream_445503617.proxy, $GLEXEC_SOURCE_PROXY=(NULL) or destination proxy.
I'm not sure yet why this is failing, but these messages occur at the
time the Nagios check fails, so they are likely the reason.
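
Since the proxy lives on Lustre, one thing worth ruling out is whether file locking works on that mount at all. Below is a rough sketch of such a check, to be run on the WN as the pool account. It assumes the util-linux flock command is installed, and it only exercises flock(2), which may not be the locking call glexec actually makes, so a pass here would not prove glexec can lock the file. can_flock is a made-up helper name.

```bash
#!/bin/bash
# Sketch only: can_flock is a hypothetical helper, not part of glexec.

# Succeeds if a non-blocking shared flock(2) can be taken on the file.
can_flock() {
    flock --shared --nonblock "$1" true
}

# Example use on the WN:
#   can_flock "$GLEXEC_CLIENT_CERT" && echo "lock ok" || echo "lock failed"
#   grep lustre /proc/mounts    # shows the mount options in use
```

As far as I know, locking on Lustre clients is controlled by the flock, localflock and noflock mount options, so the /proc/mounts line would be the next thing to look at if the lock fails.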
Matt