Dear Massimo, thank you for the quick reply.
I just checked, and the user does exist on all our WNs and CREAM CEs:
# dsh -w cream01,cream02 'ls /home/egee/ |grep cms201'
cream01: cms201
cream02: cms201
# dsh -w cream01,cream02 'grep cms201 /etc/passwd'
cream02: cms201:x:5201:4001:mapped user for group cms:/home/egee/cms201:/bin/bash
cream01: cms201:x:5201:4001:mapped user for group cms:/home/egee/cms201:/bin/bash
# dsh -w wn[101-110],wn[112-119],wn[144-206] 'ls /home/egee/ |grep cms201' | wc -l
81
# dsh -w wn[101-110],wn[112-119],wn[144-206] 'grep cms201 /etc/passwd' | wc -l
81
And we currently have 81 WNs in production.
Those home directories are visible because their parent directory (/home/egee) is on a shared Lustre filesystem, so if it were not mounted on a node they would not appear there at all.
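One caveat on the counts above: `wc -l` only shows that 81 lines matched, not that each of the 81 nodes produced exactly one of them. A minimal sketch of a per-host check follows, assuming dsh keeps prefixing each output line with "host: " as in the cream01/cream02 output. The function name missing_hosts is hypothetical, and the host list has to be passed in expanded form.

```shell
#!/bin/sh
# Hypothetical helper: print every expected host that is absent from
# dsh-style output ("host: text") read on stdin.
# Usage: <dsh command> | missing_hosts host1 host2 ...
missing_hosts() {
    # Unique host names that produced at least one output line.
    seen=$(cut -d: -f1 | sort -u)
    for host in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string,
        # so wn10 is not mistaken for wn101.
        printf '%s\n' "$seen" | grep -qxF "$host" || printf '%s\n' "$host"
    done
}

# Example with canned output instead of a live dsh call:
printf 'wn101: cms201\nwn102: cms201\n' | missing_hosts wn101 wn102 wn103
# prints: wn103
```

Against the real nodes this would be fed from the same dsh call, for example `dsh -w ... 'grep "^cms201:" /etc/passwd' | missing_hosts wn101 wn102 ...`. Anchoring the pattern as `^cms201:` also avoids matching any longer account name that merely starts with cms201.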
Any other ideas? :-S
Thx,
Miguel
On Sep 29, 2011, at 10:40 AM, Massimo Sgaravatto - INFN Padova wrote:
> On Thu, 29 Sep 2011, Massimo Sgaravatto wrote:
>
>> My guess:
>>
>> Argus, which runs on a different box wrt CREAM-CE mapped you to cms201.
>> But this user doesn't exist in the CREAM-CE and/or VOVOX
>
> Sorry: I meant WN and not VOVOX
>
>
>
>
>>
>> Cheers, Massimo
>>
>>
>> On 09/29/2011 10:24 AM, Gila Arrondo Miguel Angel wrote:
>>> Hi all,
>>> We recently discovered an issue with our CREAM setup that we don't know
>>> how to solve.
>>> Some background info: CREAM is configured to ask ARGUS for authorization
>>> and then there is an LRMS server taking care of the queue and the scheduling.
>>> Our problem is that we are seeing that many jobs fail (last night 60%)
>>> and the cream log shows this:
>>> 29 Sep 2011 09:52:52,402 INFO
>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>> (AuthorizationHandler.java:247) - (TP-Processor116) request for
>>> operation=JobRegister; REMOTE_REQUEST_ADDRESS=169.228.130.10;
>>> USER_DN=CN=uscmspilot09/glidein-1.t2.ucsd.edu,OU=Services,DC=doegrids,DC=org;
>>> USER_FQAN={ /cms/Role=pilot/Capability=NULL;
>>> /cms/Role=NULL/Capability=NULL; /cms/TEAM/Role=NULL/Capability=NULL;
>>> /cms/uscms/Role=NULL/Capability=NULL; }; AUTHORIZED!
>>> 29 Sep 2011 09:52:52,405 INFO
>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>> (AuthorizationHandler.java:247) - (TP-Processor139) request for
>>> operation=JobRegister; REMOTE_REQUEST_ADDRESS=169.228.130.10;
>>> USER_DN=CN=uscmspilot09/glidein-1.t2.ucsd.edu,OU=Services,DC=doegrids,DC=org;
>>> USER_FQAN={ /cms/Role=pilot/Capability=NULL;
>>> /cms/Role=NULL/Capability=NULL; /cms/TEAM/Role=NULL/Capability=NULL;
>>> /cms/uscms/Role=NULL/Capability=NULL; }; AUTHORIZED!
>>> 29 Sep 2011 09:52:52,463 ERROR
>>> org.glite.ce.cream.delegationmanagement.DelegationManager
>>> (DelegationManager.java:127) - (TP-Processor139) Delegation
>>> 1317111662.194930
>>> [dn=/DC=org/DC=doegrids/OU=Services/CN=uscmspilot09/glidein-1.t2.ucsd.edu;
>>> localUser=cms201] not found!
>>> 29 Sep 2011 09:52:52,465 INFO
>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>> (AuthorizationHandler.java:247) - (TP-Processor24) request for
>>> operation=JobRegister; REMOTE_REQUEST_ADDRESS=169.228.130.10;
>>> USER_DN=CN=uscmspilot06/glidein-1.t2.ucsd.edu,OU=Services,DC=doegrids,DC=org;
>>> USER_FQAN={ /cms/Role=pilot/Capability=NULL;
>>> /cms/Role=NULL/Capability=NULL; /cms/TEAM/Role=NULL/Capability=NULL;
>>> /cms/uscms/Role=NULL/Capability=NULL; }; AUTHORIZED!
>>> 29 Sep 2011 09:52:52,466 ERROR
>>> org.glite.ce.cream.delegationmanagement.DelegationManager
>>> (DelegationManager.java:127) - (TP-Processor116) Delegation
>>> 1317111662.194930
>>> [dn=/DC=org/DC=doegrids/OU=Services/CN=uscmspilot09/glidein-1.t2.ucsd.edu;
>>> localUser=cms201] not found!
>>> 29 Sep 2011 09:52:52,467 ERROR org.glite.ce.commonj.authz.argus.ArgusPEP
>>> (ArgusPEP.java:241) - (TP-Processor84) Missing property local.user.id
>>> java.lang.IllegalArgumentException: Missing property local.user.id
>>> at org.glite.ce.commonj.authz.argus.ArgusPEP.isPermitted(ArgusPEP.java:236)
>>> at
>>> org.glite.ce.commonj.authz.AuthorizationHandler.check(AuthorizationHandler.java:245)
>>> at
>>> org.glite.ce.commonj.authz.AuthorizationHandler.invoke(AuthorizationHandler.java:306)
>>> at
>>> org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
>>> [....]
>>> at java.lang.Thread.run(Thread.java:636)
>>> 29 Sep 2011 09:52:52,468 ERROR
>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>> (AuthorizationHandler.java:308) - (TP-Processor84) Missing property
>>> local.user.id
>>> 29 Sep 2011 09:52:52,465 INFO
>>> org.glite.ce.commonj.authz.AuthorizationHandler
>>> (AuthorizationHandler.java:247) - (TP-Processor75) request for
>>> operation=JobRegister; REMOTE_REQUEST_ADDRESS=169.228.130.10;
>>> USER_DN=CN=uscmspilot49/glidein-1.t2.ucsd.edu,OU=Services,DC=doegrids,DC=org;
>>> USER_FQAN={ /cms/Role=pilot/Capability=NULL;
>>> /cms/Role=NULL/Capability=NULL; /cms/TEAM/Role=NULL/Capability=NULL;
>>> /cms/uscms/Role=NULL/Capability=NULL; }; AUTHORIZED!
>>> 29 Sep 2011 09:52:52,469 ERROR
>>> org.glite.ce.cream.delegationmanagement.DelegationManager
>>> (DelegationManager.java:127) - (TP-Processor24) Delegation
>>> 1316679082.365052
>>> [dn=/DC=org/DC=doegrids/OU=Services/CN=uscmspilot06/glidein-1.t2.ucsd.edu;
>>> localUser=cms293] not found!
>>> I have double-checked the Argus policies and they are set to permit the
>>> user to submit jobs. I also reloaded the pdp policy and cleared the pepd
>>> cache, but the problem is still here.
>>> The user cms242 is present in the CREAM and LRMS (Torque/Moab) machines,
>>> as well as in the WNs, but not in Argus.
>>> When I check the pdp and pepd status I get this:
>>> service: Argus PDP version 1.3.0
>>> start_time: 1314951714240
>>> number_of_processors: 4
>>> memory_usage: 39MB
>>> total_requests: 2131028
>>> total_completed_requests: 2131027
>>> total_request_errors: 1
>>> policy_load_instant: 1317282783479
>>> current_policy: root-default-038e764b-cbb4-45a5-a1f7-05dc86f90697
>>> current_policy_version: 1
>>> service: Argus PEP Server version 1.3.0
>>> start_time: 1314951718194
>>> number_of_processors: 4
>>> memory_usage: 64MB
>>> total_requests: 2270709
>>> total_completed_requests: 2262360
>>> total_request_errors: 8349
>>> What puzzles us is that the total_request_errors for PDP and PEPD are
>>> different. Way different!!!
>>> Any idea of where our problem might be?? By the way, this error also
>>> happens in the other cream/argus/lrms trio (same versions) that we have
>>> installed at CSCS.
>>> Thanks in advance!
>>> Miguel
>>> --
>>> Miguel Gila
>>> CSCS Swiss National Supercomputing Centre
>>> HPC Co-Location Services
>>> Via Cantonale, Galleria 2 | CH-6928 Manno | Switzerland
>>> [log in to unmask] | www.cscs.ch | Phone +41 91 610 82 22
>>
>>
>>
>
> \|||/
> -----------0oo----( o o )----oo0-------------------
> (_)
> INFN Sezione di Padova
> Via Marzolo, 8
> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
> Tel: ++39 0499677360 Skype: massimo.sgaravatto
> Fax: ++39 0498275952
>
>
>
>
--
Miguel Gila
CSCS Swiss National Supercomputing Centre
HPC Co-Location Services
Via Cantonale, Galleria 2 | CH-6928 Manno | Switzerland
[log in to unmask] | www.cscs.ch | Phone +41 91 610 82 22