If the Condor instances for the jobs submitted by the SAM portal are
already running on the gLite CE, then when new jobs arrive the WMS
bypasses the gatekeeper and submits them directly to the Condor
instance. As for the periodic log messages in the gatekeeper log or
/var/log/messages, I think the WMS tried to launch the Condor instance,
failed, and has kept retrying ever since.
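
To check whether the retries really are periodic, something along these lines could measure the gaps between successive "Got connection" lines from the WMS host. It is only a sketch: the function name `connection_gaps` is made up for illustration, and it assumes the gatekeeper log keeps the "Got connection <ip> at <date>" layout shown in the excerpt quoted below.

```python
import re
from datetime import datetime

# Matches e.g. "Got connection 131.154.100.148 at Sun May 13 07:08:59 2007"
# (layout assumed from the log excerpt in this thread).
CONNECTION_RE = re.compile(
    r"Got connection (\S+) at (\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)


def connection_gaps(lines, host):
    """Return the gaps, in seconds, between successive connections from host."""
    times = []
    for line in lines:
        match = CONNECTION_RE.search(line)
        if match and match.group(1) == host:
            # %a/%b parsing assumes English day and month names in the log.
            times.append(
                datetime.strptime(match.group(2), "%a %b %d %H:%M:%S %Y")
            )
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# Example use on a CE (path taken from this thread):
#   with open("/var/log/glite/gatekeeper.log") as log:
#       print(connection_gaps(log, "131.154.100.148"))
```

A steady run of 330-second gaps would match the reported 5 minutes 30 seconds and point to a fixed retry timer on the WMS side rather than new submissions.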
Di
Alexander Piavka wrote:
> Hi Antun,
>
> What disturbs me more is that on the PPS site the SAM portal jobs
> are executed successfully, but the only
> trace of LCAS is in /var/log/gridftp-lcas_lcmaps.log.
> There are no traces in /var/log/glite/gatekeeper.log or /var/log/messages.
> So it looks like a security problem, but I can't understand how this can be
> happening only for jobs submitted from the SAM portal and not for all jobs,
> since gatekeeper authentication is always running. It is also
> not related to https://gus.fzk.de/pages/ticket_details.php?ticket=20625
>
> Thanks
> Alex
>
> On Tue, 15 May 2007, Antun Balaz wrote:
>
>> Hi,
>>
>> We see this almost all the time, and it is a long-standing problem. Since it
>> comes and goes (it is not always there) without any changes on
>> our side, we think that it is related to some WMS problem, and not to gCE
>> problems.
>>
>> Somewhat related is the following ticket (although no mapping problems there):
>>
>> https://gus.fzk.de/pages/ticket_details.php?ticket=20625
>>
>> However, I don't know the status of the improvements mentioned there...
>>
>> Regards, Antun
>>
>> -----
>> Antun Balaz
>> Research Assistant
>> Web: http://scl.phy.bg.ac.yu/
>>
>> Phone: +381 11 3713152
>> Fax: +381 11 3162190
>>
>> Scientific Computing Laboratory
>> Institute of Physics, Belgrade, Serbia
>> -----
>>
>> ---------- Original Message -----------
>> From: Esteban Freire Garcia
>> Sent: Mon, 14 May 2007 22:50:47 +0200
>> Subject: Re: [LCG-ROLLOUT] LCAS/LCMAPS strange behaviour
>>
>>> Hi Alex,
>>>
>>> Since upgrade 29 we have had a very similar incident on PPS, with
>>> similar logs, although I am not sure that the problem started with
>>> the upgrade; in principle I didn't observe anything strange after
>>> upgrading. What is curious is that, according to the monitoring page,
>>> the tests that run automatically every hour have a status of OK on
>>> PPS. However, if I try to send a test from the SAM Admin's page, the
>>> job is aborted with the following error: (reason = Got a job held
>>> event, reason: "The job attribute PeriodicHold expression 'Matched
>>> =!= TRUE && CurrentTime > QDate + 900' evaluated to TRUE"). After
>>> reviewing all the running services, I do not observe anything
>>> strange. I think that it is an authentication problem, although
>>> I do not see anything else strange in that respect. So I
>>> ask the same question as you: has anyone seen similar behaviour?
>>>
>>> Thanks,
>>> Esteban
>>>
>>>> Hi all,
>>>>
>>>> On the gLite CEs of both my production and PPS sites, I get the following
>>>> logged exactly every 5 minutes and 30 seconds:
>>>> -----------------------------------------------------
>>>> Notice: 6: Got connection 131.154.100.148 at Sun May 13 07:08:59 2007
>>>> Notice: 5: Trying to use delegated user proxy
>>>> Notice: 5: Authenticated globus user: /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS
>>>> Notice: 0: GRID_SECURITY_HTTP_BODY_FD=9
>>>> Notice: 0: JOB_REPOSITORY_ID 2007-05-13.07:09:00.123457.0000000507.0000004146 (unique id used for Job Repository)
>>>> Notice: 0: FORMAT: YYYY-MM-DD.hh:mm:ss.micros.pid.connection
>>>> Notice: 0: (Format: <date>.<time (with microsecs)>.<pid>.<connection counter>)
>>>> Notice: 0: temporarily ALLOW empty credentials
>>>> Notice: 0: Using dlopen version of LCAS
>>>> Notice: 0: lcasmod_name = /opt/glite/lib/lcas.mod
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 :
>>>> LCAS 7: 2007-05-13.07:09:00.123457.0000000507.0000004146 : Initialization LCAS version 1.3.1
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_init(): Reading LCAS database /opt/glite/etc/lcas/lcas.db
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 :
>>>> LCAS 5: 2007-05-13.07:09:00.123457.0000000507.0000004146 : LCAS authorization request
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): user is /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas_userban.mod-plugin_confirm_authorization(): checking banned users in /opt/glite/etc/lcas/ban_users.db
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): authorization granted by plugin /opt/glite/lib/modules/lcas_userban.mod
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): Generic verification error for VOMS (failure)!
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): voms plugin failed
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): authorization failed for plugin /opt/glite/lib/modules/lcas_voms.mod
>>>> LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): failed
>>>> Failure: LCAS failed authorization.
>>>> Failure: LCAS failed authorization.
>>>> -----------------------------------------------------
>>>>
>>>> AFAIK /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS is the DN used to
>>>> submit tests from the SAM Admin Portal. The connection is coming from
>>>> the WMS glite-rb-01.cnaf.infn.it.
>>>> Any ideas why it tries exactly every 5:30 minutes? Does the WMS try to
>>>> monitor some previously sent jobs, or what?
>>>>
>>>> What is more interesting is that when I try to submit jobs from the SAM
>>>> Admin Portal
>>>> to the production gLite CE, the job gets aborted due to:
>>>> Job got an error while in the CondorG queue.
>>>> hit job shallow retry count (0)
>>>> In the job logging info I see that the job is submitted by
>>>> /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS
>>>> But nothing is logged in /var/log/glite/gatekeeper.log or
>>>> /var/log/messages regarding LCAS and LCMAPS authentication.
>>>> There is also nothing in /var/log/gridftp-lcas_lcmaps.log for the user.
>>>> Yet there is a mapping under /etc/grid-security/gridmapdir from the
>>>> /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS DN to ops003.
>>>>
>>>> But what is even stranger is that when I submit from the SAM Admin Portal
>>>> to the PPS gLite CE, the job is successfully submitted and executed by PBS,
>>>> and a blah record is inserted into /var/log/glite/accounting/blahp.log-200705,
>>>> but again nothing is logged in either /var/log/glite/gatekeeper.log or
>>>> /var/log/messages. However, the authentication is logged in
>>>> /var/log/gridftp-lcas_lcmaps.log
>>>>
>>>> How can this be? On both PPS and production, authentication is working
>>>> OK for all other users, with LCAS and LCMAPS messages logged as usual in
>>>> /var/log/glite/gatekeeper.log and /var/log/messages.
>>>> And why does the submission work for the PPS site only?
>>>>
>>>> Has anyone seen similar behaviour?
>>>>
>>>> Thanks
>>>> Alex
>> ------- End of Original Message -------
>>