Hi all,
Sorry to resurrect this old thread, but I have been stuck on this
problem for a while. I have been talking quite a bit to Steve off the
list (thanks very much for this, Steve!) and made a bit of progress, but
I am now not sure what to do next.
To recap the problem we have:
We are failing the nagios check:
emi.cream.glexec.WN-gLExec-/ops/Role=pilot
with error:
client error (201)
which, as Gareth and John pointed out, is likely to be a problem on our WNs:
"""
201 - client error, which includes:
• no proxy is provided
• wrong proxy permissions
• target location is not accessible
• the binary to execute does not exist
• the mapped user has no rights to execute the binary when
GLEXEC_CLIENT_CERT is not set
"""
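As a rough sketch, the file-level causes in that list (missing proxy, wrong proxy permissions, missing binary) could be checked on a WN with something like the following. The function name check_glexec_client is hypothetical and not part of glexec, the accepted modes 600/400 are my assumption about what glexec tolerates, and `stat -c` assumes GNU coreutils. It does not cover the "target location" or mapped-user cases.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; NOT part of glexec itself.
# Checks three client-side causes of "client error (201)".
check_glexec_client() {
    proxy="$1"     # e.g. /tmp/x509up_u460
    binary="$2"    # e.g. /usr/bin/id
    rc=0

    # Cause: "no proxy is provided"
    if [ ! -s "$proxy" ]; then
        echo "FAIL: proxy missing or empty: $proxy"
        return 1
    fi

    # Cause: "wrong proxy permissions"
    # Assumption: only owner-only modes (600 or 400) are acceptable.
    perms=$(stat -c %a "$proxy")
    case "$perms" in
        600|400) echo "OK: proxy mode is $perms" ;;
        *) echo "FAIL: proxy mode is $perms"; rc=1 ;;
    esac

    # Cause: "the binary to execute does not exist"
    # Note: -x tests the *current* user, not the mapped user.
    if [ -x "$binary" ]; then
        echo "OK: $binary exists and is executable"
    else
        echo "FAIL: $binary missing or not executable"
        rc=1
    fi

    return $rc
}
```

It would be run as the pilot user on the WN, for example `check_glexec_client "$GLEXEC_CLIENT_CERT" /usr/bin/id`, and returns non-zero if any of the three checks fails.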
Steve suggested the following test, which completed correctly for us:
On 16/01/14 14:22, Stephen Jones wrote:
> I guess the first thing to do is to decide if it's the server or the
> client that is flaky (or both!) Let's kick off with this test. On some
> UI, in your own account, run this command:
>
> # voms-proxy-init --voms dteam
>
> Then this:
>
> # voms-proxy-info
>
> That will show the file you made. Next, log on to a test worker node as
> root. Copy in the proxy with scp from the location shown in
> voms-proxy-init to /tmp/x509up_u460 on the WN. Change the ownership of
> the proxy to a pilot account.
>
> # chown pilatl01:atlas /tmp/x509up_u460
>
> Fix the permissions.
>
> # chmod 600 /tmp/x509up_u460
>
> Switch to the pilot user.
>
> # su - pilatl01
>
> Run these commands to setup for the test.
>
> # export GLEXEC_CLIENT_CERT=/tmp/x509up_u460
> # export GLEXEC_SOURCE_PROXY=/tmp/x509up_u460
> # export X509_USER_PROXY=/tmp/x509up_u460
>
> Now do the test:
>
> # /opt/glite/sbin/glexec /usr/bin/id
> # (or in EMI etc., use /usr/sbin/glexec)
>
> If all is well, you will see something like this:
>
> uid=24683(dteam184) gid=2028(dteam) groups=2028(dteam)
I could also see entries in the Argus server logs corresponding to this
action, indicating that everything worked.
Is anyone able to suggest how I can reproduce, by hand, the check that the
nagios test is running? The page linked from the nagios page
(https://tomtools.cern.ch/confluence/display/SAMDOC/WN#WN-org.sam.glexec.WNgLExec)
mentions:
"OPS VO Role=pilot is used to submit the jobs."
so I suppose I can't use my own certificate to test this?
If anyone has any other suggestions for things I could try, they would be
greatly appreciated.
Many thanks,
Matt
--
Matt Raso-Barnett
Linux Systems Administrator -- MPS
University of Sussex