Hi all,
Sorry for sending out a general plea for help, but I am really lost as
to what I should do with a particular ticket we have had open for a
while at Sussex.
Would anyone be able to suggest where I should start looking to debug
this?
The ticket is: https://ggus.eu/ws/ticket_info.php?ticket=99198
Relating to nagios alert: https://gridppnagios.physics.ox.ac.uk/nagios/
*emi.cream.glexec.WN-gLExec-/ops/Role=pilot* is failing on:
grid-cream-01.hpc.susx.ac.uk
My predecessor, Emyr, has given me a few things to look for. He suggested
that the problem could have arisen when the worker nodes and the cream
server were upgraded to EMI-3, and that the worker nodes may not be
configured correctly to use Argus.
I can definitely see that the worker nodes are on EMI-3, but I think the
cream server is actually still on EMI-2:
From grid-cream-01:
emi-cream-ce.x86_64 1.1.0-4.sl5
emi-release.noarch 2.0.0-1.sl5
emi-version.x86_64 2.10.5-1.el5
From one of our worker nodes:
emi-release.noarch 3.0.0-2.el6
emi-version.x86_64 3.7.0-1.el6
emi-wn.x86_64 3.0.1-1.el6
Could this be a problem?
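In case it helps anyone compare the two hosts, here is a rough sketch of how the EMI major release could be pulled out of listings like the ones above. The helper name `emi_major` is made up for illustration, and the sample lines are simply copied from the output above rather than queried live.

```shell
#!/bin/sh
# Hypothetical helper (not part of any EMI tooling): read "package version"
# lines on stdin and print the major version of the emi-release package.
emi_major() {
    awk '$1 ~ /^emi-release\./ {
        ver = $2
        sub(/\..*/, "", ver)   # keep only the text before the first dot
        print ver
    }'
}

# Sample lines copied from the grid-cream-01 listing.
cream_major=$(printf '%s\n' \
    'emi-cream-ce.x86_64 1.1.0-4.sl5' \
    'emi-release.noarch 2.0.0-1.sl5' \
    'emi-version.x86_64 2.10.5-1.el5' | emi_major)

# Sample lines copied from the worker node listing.
wn_major=$(printf '%s\n' \
    'emi-release.noarch 3.0.0-2.el6' \
    'emi-version.x86_64 3.7.0-1.el6' \
    'emi-wn.x86_64 3.0.1-1.el6' | emi_major)

echo "CREAM: EMI-$cream_major, WN: EMI-$wn_major"
if [ "$cream_major" != "$wn_major" ]; then
    echo "EMI major releases differ between the two hosts"
fi
```

On a real host the same function could be fed from whatever command produced the listings above, as long as the output keeps the package name in the first column and the version in the second.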
He also said to look over this page:
https://www.gridpp.ac.uk/wiki/Argus_Server
I've checked that the last section:
https://www.gridpp.ac.uk/wiki/Argus_Server#Configuring_WN_to_use_Argus_for_glexec_authorization
matches our worker node configuration (we have glexec-wn installed,
which seems to be the EMI-3 version of the emi-glexec_wn mentioned
on that page).
Running yaim again on one of the nodes:
/opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n WN -n GLEXEC_wn
seemed to run fine apart from one error:
INFO: Generating LCMAPS config file
chown: cannot access `/etc/lcas/lcas-glexec.db': No such file or directory
chmod: cannot access `/etc/lcas/lcas-glexec.db': No such file or directory
I'm not sure what this relates to at the moment, or if it is relevant.
Is there anything else I can look for to begin debugging this problem?
Thanks,
Matt Raso-Barnett
University of Sussex