Hi Antun,
What disturbs me more is that on the PPS site the SAM Portal jobs are
executed successfully, but the only trace of LCAS is in
/var/log/gridftp-lcas_lcmaps.log.
There are no traces in /var/log/glite/gatekeeper.log or /var/log/messages.
So it looks like a security problem, but I can't understand how this can
be happening only for jobs submitted from the SAM Portal and not for all
jobs, since it is gatekeeper authentication, which always runs and is
not related to https://gus.fzk.de/pages/ticket_details.php?ticket=20625.
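To check which log files actually record the LCAS/LCMAPS decisions for the SAM Portal DN, one can count the matching lines in each file. This is a minimal Python sketch, not part of any gLite tool: lcas_traces is a hypothetical helper, the three paths are the ones mentioned in this thread, and it assumes the DN appears verbatim on the relevant LCAS/LCMAPS lines, as it does in the gatekeeper excerpt quoted below.

```python
# Log files named in this thread; adjust for the local installation.
LOG_FILES = (
    "/var/log/glite/gatekeeper.log",
    "/var/log/messages",
    "/var/log/gridftp-lcas_lcmaps.log",
)


def lcas_traces(dn, paths=LOG_FILES):
    """Map each path to the number of LCAS/LCMAPS lines mentioning dn.

    None means the file is missing or could not be read.
    """
    counts = {}
    for path in paths:
        try:
            with open(path, errors="replace") as fh:
                # Count lines that name the DN and look like LCAS/LCMAPS output.
                counts[path] = sum(
                    1
                    for line in fh
                    if dn in line and ("LCAS" in line.upper() or "LCMAPS" in line.upper())
                )
        except OSError:
            counts[path] = None
    return counts


if __name__ == "__main__":
    sam_dn = "/C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS"
    for path, n in lcas_traces(sam_dn).items():
        print(f"{path}: {'unreadable' if n is None else n}")
```

A count of 0 means the file was read but has no LCAS/LCMAPS line for that DN; None means the file is missing or unreadable (for example /var/log/messages when not run as root).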
Thanks
Alex
On Tue, 15 May 2007, Antun Balaz wrote:
> Hi,
>
> We see this almost all the time, and it is a long-standing problem. Since it
> appears from time to time (it is not always there) without any changes on
> our side, we think that it is related to some WMS problem and not to a gCE
> problem.
>
> Somewhat related is the following ticket (although there are no mapping problems there):
>
> https://gus.fzk.de/pages/ticket_details.php?ticket=20625
>
> However, I don't know what the status of the improvements mentioned there is...
>
> Regards, Antun
>
> -----
> Antun Balaz
> Research Assistant
> E-mail: [log in to unmask]
> Web: http://scl.phy.bg.ac.yu/
>
> Phone: +381 11 3713152
> Fax: +381 11 3162190
>
> Scientific Computing Laboratory
> Institute of Physics, Belgrade, Serbia
> -----
>
> ---------- Original Message -----------
> From: Esteban Freire Garcia <[log in to unmask]>
> To: [log in to unmask]
> Sent: Mon, 14 May 2007 22:50:47 +0200
> Subject: Re: [LCG-ROLLOUT] LCAS/LCMAPS strange behaviour
>
> > Hi Alex,
> >
> > Since upgrade 29 we have had a very similar issue on PPS, with similar
> > logs, although I am not sure the problem started with the upgrade; in
> > principle I didn't notice anything strange after upgrading. What is
> > curious is that on the monitoring page the tests that run automatically
> > every hour have status OK on PPS; however, if I try to send a test from
> > the SAM Admin page, the job is aborted with the following error:
> > (reason = Got a job held event, reason: "The job attribute PeriodicHold
> > expression 'Matched =!= TRUE && CurrentTime > QDate + 900' evaluated to
> > TRUE")
> > After reviewing all the running services I do not see anything strange.
> > I think it is an authentication problem, although I do not see anything
> > strange in that respect either. So I'll ask the same question you did:
> > has anyone seen similar behaviour?
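The PeriodicHold expression in that error can be read as follows. The Python function below only sketches its logic and is not Condor code: periodic_hold is a hypothetical name, None stands for an UNDEFINED attribute, and times are assumed to be seconds since the epoch, as in the job ClassAd. The key point is the ClassAd =!= operator ("is not identical to"), which never yields UNDEFINED, so an unset Matched attribute still counts as "not TRUE".

```python
def periodic_hold(matched, current_time, q_date, timeout=900):
    """Sketch of: Matched =!= TRUE && CurrentTime > QDate + 900.

    matched: True, False, or None (None models an UNDEFINED attribute).
    current_time, q_date: seconds since the epoch.
    """
    # '=!=' compares identity, so both False and UNDEFINED (None) are "not TRUE";
    # a plain '!=' would instead propagate UNDEFINED.
    not_matched = matched is not True
    return not_matched and current_time > q_date + timeout
```

In other words, the hold fires for any job that has not been matched within 900 s (15 minutes) of its queue date QDate.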
> >
> > Thanks,
> > Esteban
> >
> > > Hi all,
> > >
> > > On the gliteCEs of both my production and PPS sites, the following is
> > > logged exactly every 5 minutes and 30 seconds:
> > > -----------------------------------------------------
> > > Notice: 6: Got connection 131.154.100.148 at Sun May 13 07:08:59 2007
> > >
> > > Notice: 5: Trying to use delegated user proxy
> > > Notice: 5: Authenticated globus user: /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS
> > > Notice: 0: GRID_SECURITY_HTTP_BODY_FD=9
> > > Notice: 0: JOB_REPOSITORY_ID 2007-05-13.07:09:00.123457.0000000507.0000004146 (unique id used for Job Repository)
> > > Notice: 0: FORMAT: YYYY-MM-DD.hh:mm:ss.micros.pid.connection
> > > Notice: 0: (Format: <date>.<time (with microsecs)>.<pid>.<connection counter>)
> > > Notice: 0: temporarily ALLOW empty credentials
> > > Notice: 0: Using dlopen version of LCAS
> > > Notice: 0: lcasmod_name = /opt/glite/lib/lcas.mod
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 :
> > > LCAS 7: 2007-05-13.07:09:00.123457.0000000507.0000004146 : Initialization LCAS version 1.3.1
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_init(): Reading LCAS database /opt/glite/etc/lcas/lcas.db
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 :
> > > LCAS 5: 2007-05-13.07:09:00.123457.0000000507.0000004146 : LCAS authorization request
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): user is /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas_userban.mod-plugin_confirm_authorization(): checking banned users in /opt/glite/etc/lcas/ban_users.db
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): authorization granted by plugin /opt/glite/lib/modules/lcas_userban.mod
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): Generic verification error for VOMS (failure)!
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): voms plugin failed
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): authorization failed for plugin /opt/glite/lib/modules/lcas_voms.mod
> > > LCAS 0: 2007-05-13.07:09:00.123457.0000000507.0000004146 : lcas.mod-lcas_run_va(): failed
> > > Failure: LCAS failed authorization.
> > > Failure: LCAS failed authorization.
> > > -----------------------------------------------------
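The gatekeeper prints the JOB_REPOSITORY_ID format itself (YYYY-MM-DD.hh:mm:ss.micros.pid.connection), and every LCAS line of the request above carries the same ID, so it can be used to pick one request's lines out of an interleaved log. A minimal sketch of splitting the ID into its parts; parse_job_repository_id and JobRepositoryId are hypothetical names, and the field meanings are taken only from the format line in the log.

```python
from datetime import datetime
from typing import NamedTuple


class JobRepositoryId(NamedTuple):
    timestamp: datetime
    pid: int
    connection: int


def parse_job_repository_id(job_id: str) -> JobRepositoryId:
    """Split an ID of the form YYYY-MM-DD.hh:mm:ss.micros.pid.connection."""
    # The date and time parts contain no dots, so a plain split yields five fields.
    date, time, micros, pid, connection = job_id.split(".")
    ts = datetime.strptime(f"{date} {time}.{micros}", "%Y-%m-%d %H:%M:%S.%f")
    return JobRepositoryId(ts, int(pid), int(connection))
```

For the ID in the excerpt it gives pid 507 and connection counter 4146.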
> > >
> > > AFAIK /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS is the DN used to
> > > submit tests from the SAM Admin Portal. The connection comes from the
> > > glite-rb-01.cnaf.infn.it WMS.
> > > Any ideas why it tries exactly every 5 minutes 30 seconds? Is the WMS
> > > trying to monitor some previously submitted jobs, or what?
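One way to check the 5 min 30 s (330 s) period is to extract the timestamps of the "Got connection" lines and look at the gaps between them. A minimal sketch, assuming the gatekeeper log uses the same "Got connection <ip> at <ctime-style date>" format as the excerpt above; connection_intervals is a hypothetical helper.

```python
import re
from datetime import datetime

# Matches e.g. "Got connection 131.154.100.148 at Sun May 13 07:08:59 2007".
CONNECTION_RE = re.compile(
    r"Got connection (\S+) at (\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})"
)


def connection_intervals(lines, source_ip=None):
    """Return the gaps in seconds between successive 'Got connection' lines.

    If source_ip is given, only connections from that address are considered.
    """
    times = []
    for line in lines:
        m = CONNECTION_RE.search(line)
        if m and (source_ip is None or m.group(1) == source_ip):
            times.append(datetime.strptime(m.group(2), "%a %b %d %H:%M:%S %Y"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Passing an open gatekeeper.log with source_ip="131.154.100.148" should return gaps clustered at 330.0 if the pattern described here holds for those connections.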
> > >
> > > What is more interesting is that when I try to submit jobs from the SAM
> > > Admin Portal to the production gliteCE, the job gets Aborted due to:
> > > Job got an error while in the CondorG queue.
> > > hit job shallow retry count (0)
> > > In the job logging info I see that the job is submitted by
> > > /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS
> > > But nothing is logged in /var/log/glite/gatekeeper.log or
> > > /var/log/messages regarding LCAS & LCMAPS authentication.
> > > There is also nothing in /var/log/gridftp-lcas_lcmaps.log for the user.
> > > However, there is a mapping under /etc/grid-security/gridmapdir from the
> > > /C=PL/O=GRID/O=PSNC/CN=Rafal Lichwala - OPS DN to ops003.
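Since the mapping to ops003 exists even though no authentication was logged, it may help to confirm from gridmapdir itself which pool account the DN's lease points at. A minimal sketch, assuming the usual gridmapdir layout in which a '%'-encoded DN lease file is a hard link to the pool account file; the exact encoding (case, any ':'-separated VOMS suffix) can vary, so the comparison is deliberately loose, and pool_account_for_dn is a hypothetical helper.

```python
import os
from urllib.parse import unquote


def pool_account_for_dn(gridmapdir, dn):
    """Return the pool account sharing an inode with dn's lease file, or None."""
    entries = os.listdir(gridmapdir)
    lease_inode = None
    for name in entries:
        if not name.startswith("%"):
            continue
        # Assumption: lease names are '%'-encoded DNs, possibly lower-cased and
        # possibly followed by a ':'-separated VOMS suffix, which is ignored here.
        decoded = unquote(name.split(":", 1)[0])
        if decoded.lower() == dn.lower():
            lease_inode = os.stat(os.path.join(gridmapdir, name)).st_ino
            break
    if lease_inode is None:
        return None
    # Pool account files (e.g. ops003) have plain names; a leased one is a
    # hard link to the same inode as the DN's lease file.
    for name in entries:
        path = os.path.join(gridmapdir, name)
        if not name.startswith("%") and os.stat(path).st_ino == lease_inode:
            return name
    return None
```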
> > >
> > > But what is even stranger is that when I submit from the SAM Admin
> > > Portal to the PPS gliteCE, the job is successfully submitted and
> > > executed by PBS, and a BLAH record is inserted into
> > > /var/log/glite/accounting/blahp.log-200705, but again nothing is logged
> > > in either /var/log/glite/gatekeeper.log or /var/log/messages. However,
> > > the authentication is logged in /var/log/gridftp-lcas_lcmaps.log.
> > >
> > > How can this be? On both PPS and production, authentication works fine
> > > for all other users, with LCAS & LCMAPS messages logged as usual in
> > > /var/log/glite/gatekeeper.log and /var/log/messages.
> > > And why does submission work for the PPS site only?
> > >
> > > Has anyone seen similar behaviour?
> > >
> > > Thanks
> > > Alex
> ------- End of Original Message -------
>