Hi,
Did something else change on the CE during that time, for example a gLite
update? I see that the gLite update version changed for the WNs.
Jean-Michel Barbet wrote:
> Hello,
>
> For a few days now, the SAM ops test of the lcg-CE nanlcg01.in2p3.fr has
> been failing several times a day:
> https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=nanlcg01.in2p3.fr
>
>
> This started after I reconfigured the lcg-CE, trying to switch from
> the lcgpbs to the pbs jobmanager (on Nov 25). Because of other
> problems, I had to revert to lcgpbs, but since that attempt
> SAM tests have occasionally been failing for the CE.
>
> In the job submission log there is a line saying:
>
> Reason = BrokerHelper: no compatible resources
>
> The lcg-CE is under Nagios monitoring, and I have graphs showing that
> it is not loaded. In the log of the resource BDII on the CE
> (/opt/bdii/var/bdii.log), I have messages like:
>
> Updating DB on port 2171
> Waiting 180 s for query results.
>
> Time for searches: 1 s
> Time to update DB: 0 s
> ldap_bind: Can't contact LDAP server (-1)
> Time to load DB: 2 s
> Grabbing port 2170 for 2171
> ==> slapadd: could not add entry dn="Mds-Vo-name=resource,o=grid"
> (line=356): txn_aborted! DB_KEYEXIST: Key/data pair already exists (-30996)
> Mon Dec 1 04:03:28 CET 2008
> Sleeping for 60
>
> But I can find these messages in the logs from before the problem appeared.
>
> => Does anyone have a clue about what is happening?
>
> Thank you very much.
>
> JM
>
>
--
============================================================================
Dimitris Zilaskos
GridAUTH Operations Centre @ Aristotle University of Thessaloniki , Greece
Tel: +302310998988 Fax: +302310994309
http://www.grid.auth.gr
============================================================================