Hi Matt,
Nagios (https://gridppnagios.physics.ox.ac.uk/nagios/) is reporting
error 201, which translates as "client error", so the problem is more
likely to be on the WN than on the CE or the ARGUS server.
The NULLs are fine: they just mean that nothing is specified in that
field. I see an occasional rogue "NOT AUTHORIZED" entry as well, so I
don't think those entries are relevant to your problem.
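To see whose requests are being refused without reading raw grep output, the Python sketch below tallies NOT AUTHORIZED entries per USER_DN. It assumes that each log entry is one line shaped like the excerpts Matt quoted below (USER_DN appears both with and without quotes), that the logs match /var/log/cream/glite-ce-cream.log*, and that rotated logs are uncompressed. count_not_authorized and scan_cream_logs are hypothetical helpers, not part of CREAM.

```python
import glob
import re
from collections import Counter
from typing import Iterable

# USER_DN appears both quoted (USER_DN="...";) and unquoted (USER_DN=...;)
# in the excerpts quoted in this thread; this pattern accepts either form.
USER_DN_RE = re.compile(r'USER_DN="?([^";]+)"?;')


def count_not_authorized(lines: Iterable[str]) -> Counter:
    """Count NOT AUTHORIZED log lines per user DN (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        if "NOT AUTHORIZED" not in line:
            continue  # AUTHORIZED and all other entries are ignored
        match = USER_DN_RE.search(line)
        counts[match.group(1) if match else "<no USER_DN>"] += 1
    return counts


def scan_cream_logs(pattern: str = "/var/log/cream/glite-ce-cream.log*") -> Counter:
    """Sum counts over all matching logs (assumed location, plain text)."""
    total: Counter = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            total.update(count_not_authorized(handle))
    return total


if __name__ == "__main__":
    # Most frequently refused DNs first.
    for dn, n in scan_cream_logs().most_common():
        print(f"{n:6d}  {dn}")
```

A single DN dominating the output would point to one user's credentials, whereas repeated hits for the ops robot DN would be worth comparing against the ARGUS policy.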
From the info below, .pilops appears to be correct for your site: it
authorizes any local user whose username starts with "pilops" to use
glexec.
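To sanity-check a pool account name against the white list, here is a minimal Python sketch. It follows the reading of the leading dot given above (an entry such as .pilops matches any username starting with "pilops"); gLExec's own matching rules may be stricter. The config line format is taken from the user_white_list line Matt quoted below, '#' comments are an assumption, and parse_white_list and is_whitelisted are hypothetical helpers, not part of gLExec.

```python
from typing import List


def parse_white_list(conf_text: str) -> List[str]:
    """Return the entries of the user_white_list line in glexec.conf text."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumed '#' syntax)
        key, sep, value = line.partition("=")
        if sep and key.strip() == "user_white_list":
            return [entry.strip() for entry in value.split(",") if entry.strip()]
    return []  # no white list line found


def is_whitelisted(username: str, entries: List[str]) -> bool:
    """Leading '.' = prefix match (as described above); otherwise exact match."""
    for entry in entries:
        if entry.startswith("."):
            if username.startswith(entry[1:]):
                return True
        elif username == entry:
            return True
    return False


if __name__ == "__main__":
    conf = "user_white_list = .pilops,.pildteam,.pilatl,.pilsno\n"
    entries = parse_white_list(conf)
    print(entries)
    # pilops02 is the LOCAL_USER from the CREAM JOB_START entry quoted below.
    print(is_whitelisted("pilops02", entries))
```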
I see the same empty log files in /var/log/glexec (actually, I only have
lcas_lcmaps.log files in that directory).
Cheers,
John
On 09/01/2014 15:10, Matt RB wrote:
> Thanks, John and Daniela, for your help.
>
> On 09/01/14 14:47, John Hill wrote:
>> On my WNs (I'm using glexec with ARGUS) /etc/lcmaps/lcmaps-glexec.db
>> contains the details of the ARGUS server. I have no file
>> /etc/lcas/lcas-glexec.db, so I don't think its absence matters. When I
>> had a similar problem last October I found that the
>> /var/log/cream/glite-ce-cream.log file on the CREAM CE did point me in
>> the right direction (I could see NOT AUTHORIZED entries specifically for
>> ops, which indicated a problem on the ARGUS server). Hence it might be
>> worth checking the CE logs.
>
> I've had a look in there, and the messages are predominantly
> AUTHORIZED, e.g. this recent one from today:
>
> 09 Jan 2014 14:34:43,656 INFO
> org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor -
> ID=559127; NAME="JOB_START"; PRIORITY_LEVEL=1; IS_ASYNCHRONOUS=true;
> STATUS=EXECUTING; CATEGORY="JOB_MANAGEMENT";
> USER_ID="CN_Robot_GridClient_CN_kashif_mohammad_L_OeSC_OU_Oxford_O_eScience_C_UK_ops_Role_pilot_Capability_NULL";
> CREATION_TIME="Thu Jan 09 14:34:40 GMT 2014"; START_PROCESSING_TIME="Thu
> Jan 09 14:34:40 GMT 2014"; JOB_ID_LIST="CREAM943803973";
> IS_ADMIN="false"; REMOTE_REQUEST_ADDRESS="146.179.246.244";
> LOCAL_USER="pilops02"; USER_DN="CN=Robot:GridClient,CN=kashif
> mohammad,L=OeSC,OU=Oxford,O=eScience,C=UK"; LOCAL_USER_GROUP="pilops";
> USER_FQAN={ /ops/Role=pilot/Capability=NULL;
> /ops/NGI/Role=NULL/Capability=NULL;
> /ops/NGI/UK/Role=NULL/Capability=NULL; /ops/Role=NULL/Capability=NULL }
> lrmsAbsJobId=sge/20140109143441/9587196;
>
>
> Grepping the past few days of these logs for NOT AUTHORIZED turned up
> mostly one specific person, plus one entry on 31st Dec:
>
> glite-ce-cream.log.8:31 Dec 2013 17:28:49,992 INFO
> org.glite.ce.commonj.authz.axis2.AuthorizationHandler - request for
> OPERATION={http://glite.org/2007/11/ce/cream/types}QueryEvent;
> REMOTE_REQUEST_ADDRESS=130.246.180.148;
> USER_DN=CN=Robot:GridClient,CN=kashif
> mohammad,L=OeSC,OU=Oxford,O=eScience,C=UK; USER_FQAN={
> /ops/Role=lcgadmin/Capability=NULL; /ops/NGI/Role=NULL/Capability=NULL;
> /ops/NGI/UK/Role=NULL/Capability=NULL; /ops/Role=NULL/Capability=NULL;
> }; NOT AUTHORIZED
>
> I don't know whether the NULLs in there are a good sign or not.
>
>> Does /etc/glexec.conf on the WNs look sensible? In particular, does
>> "user_white_list" include the pilot ops pool account?
>
> That line for our nodes looks like:
> user_white_list = .pilops,.pildteam,.pilatl,.pilsno
>
> so hopefully pilops is the correct one here?
>
> Lastly, /var/log/glexec/{glexec_log,lcas_lcmaps.log} are all empty on
> the WNs; again, I'm not sure whether that is significant.
>
> Matt
>
>> Cheers,
>> John
>>
>> On 09/01/2014 13:52, Daniela Bauer wrote:
>>> Hi Matt,
>>>
>>> We only run a pseudo glexec around here, but to answer some of the
>>> questions:
>>> The test seems to fail on the WN. What's on the CE should be pretty
>>> much irrelevant.
>>>
>>> The notes for the initial EMI3 release of glexec mention that yaim
>>> was accidentally dropped from the package:
>>> http://www.eu-emi.eu/releases/emi-3-montebianco/products/-/asset_publisher/5dKm/content/glexec-wn-1
>>>
>>>
>>> Could you check whether yours is up to date?
>>>
>>> The last time I had to deal with glexec (admittedly a while ago),
>>> /etc/lcas/lcas-glexec.db specified which ARGUS server to use, so
>>> lacking that file would be quite an oversight.
>>>
>>> You can crank up yaim's debug output with -d6.
>>>
>>> Hope that helps.
>>>
>>> Cheers,
>>> Daniela
>>>
>>> On 9 January 2014 13:31, Matt RB <[log in to unmask]> wrote:
>>>> Hi all,
>>>> Sorry for sending out a general plea for help, but I am really lost
>>>> as to what I should do with a particular ticket we have had open for
>>>> a while at Sussex.
>>>>
>>>> Would anyone be able to suggest where I should start looking?
>>>>
>>>> The ticket is: https://ggus.eu/ws/ticket_info.php?ticket=99198
>>>>
>>>> Relating to nagios alert: https://gridppnagios.physics.ox.ac.uk/nagios/
>>>>
>>>> *emi.cream.glexec.WN-gLExec-/ops/Role=pilot* is failing on:
>>>> grid-cream-01.hpc.susx.ac.uk
>>>>
>>>> My predecessor, Emyr, has given me a few things to look for. He
>>>> suggests that the problem could have arisen when the worker nodes and
>>>> the CREAM server were upgraded to EMI-3, and that the worker nodes
>>>> may not be configured correctly to use ARGUS.
>>>>
>>>> I can definitely see that the worker nodes are on EMI-3, but I think
>>>> the CREAM server is actually still on EMI-2:
>>>>
>>>> From grid-cream-01:
>>>> emi-cream-ce.x86_64 1.1.0-4.sl5
>>>> emi-release.noarch 2.0.0-1.sl5
>>>> emi-version.x86_64 2.10.5-1.el5
>>>>
>>>> From one of our worker nodes:
>>>> emi-release.noarch 3.0.0-2.el6
>>>> emi-version.x86_64 3.7.0-1.el6
>>>> emi-wn.x86_64 3.0.1-1.el6
>>>>
>>>> Could this be a problem?
>>>>
>>>> He also said to look over this page:
>>>> https://www.gridpp.ac.uk/wiki/Argus_Server
>>>>
>>>> I've checked that the last section of that page:
>>>> https://www.gridpp.ac.uk/wiki/Argus_Server#Configuring_WN_to_use_Argus_for_glexec_authorization
>>>>
>>>> matches our worker node configuration (we have glexec-wn installed,
>>>> which seems to be the EMI3 version of the emi-glexec_wn mentioned on
>>>> that page).
>>>>
>>>> Running yaim again on one of the nodes:
>>>> /opt/glite/yaim/bin/yaim -c -s /etc/yaim/site-info.def -n WN -n GLEXEC_wn
>>>>
>>>> seemed to run fine apart from one error:
>>>> INFO: Generating LCMAPS config file
>>>> chown: cannot access `/etc/lcas/lcas-glexec.db': No such file or directory
>>>> chmod: cannot access `/etc/lcas/lcas-glexec.db': No such file or directory
>>>>
>>>> I'm not sure yet what this relates to, or whether it is relevant.
>>>>
>>>> Is there anything else I can look for to begin debugging this problem?
>>>>
>>>> Thanks,
>>>>
>>>> Matt Raso-Barnett
>>>> University of Sussex
>>>