Hello,
For the past few days, the SAM ops test of the lcg-CE
nanlcg01.in2p3.fr has been failing several times a day:
https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CE&vo=ops&nodename=nanlcg01.in2p3.fr
This started after I reconfigured the lcg-CE, trying to switch from
the lcgpbs to the pbs jobmanager (on Nov 25). Due to other
problems, I had to revert to lcgpbs, but since that attempt
the SAM tests for the CE have been failing occasionally.
The job submission log contains this line:
Reason = BrokerHelper: no compatible resources
The lcg-CE is monitored by Nagios, and the graphs show that
it is not loaded. In the log of the resource BDII on the CE
(/opt/bdii/var/bdii.log), I have messages like:
Updating DB on port 2171
Waiting 180 s for query results.
Time for searches: 1 s
Time to update DB: 0 s
ldap_bind: Can't contact LDAP server (-1)
Time to load DB: 2 s
Grabbing port 2170 for 2171
==> slapadd: could not add entry dn="Mds-Vo-name=resource,o=grid"
(line=356): txn_aborted! DB_KEYEXIST: Key/data pair already exists (-30996)
Mon Dec 1 04:03:28 CET 2008
Sleeping for 60
However, I can also find these messages in the logs from before the
problem appeared.
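To compare the times of these messages with the times of the SAM
failures, something like the following could be used. It is only a
minimal sketch: the function name failed_cycles is made up for
illustration, and it assumes that every update cycle in bdii.log ends
with one timestamp line, as in the excerpt above.

```python
import re

# Matches a timestamp line such as "Mon Dec 1 04:03:28 CET 2008".
TIMESTAMP_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ \d{4}$"
)
# Message copied from the log excerpt above.
ERROR_MARKER = "ldap_bind: Can't contact LDAP server"


def failed_cycles(log_text):
    """Return the timestamps of update cycles that logged ERROR_MARKER.

    Assumption: the messages of a cycle come first and are followed by
    a single timestamp line, which closes the cycle.
    """
    failures = []
    saw_error = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if ERROR_MARKER in line:
            saw_error = True
        elif TIMESTAMP_RE.match(line):
            if saw_error:
                failures.append(line)
            # A timestamp line ends the cycle, so reset the flag.
            saw_error = False
    return failures


def failed_cycles_in_file(path):
    """Convenience wrapper: read a log file and scan its contents."""
    with open(path) as log_file:
        return failed_cycles(log_file.read())
```

Calling failed_cycles_in_file("/opt/bdii/var/bdii.log") would list the
cycles in which the LDAP bind failed, which can then be set against the
failure times shown on the SAM history page.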
=> Does anyone have a clue about what is happening?
Thank you very much.
JM
--
------------------------------------------------------------------------
Jean-michel BARBET | Tel: +33 (0)2 51 85 84 86
Laboratoire SUBATECH Nantes France | Fax: +33 (0)2 51 85 84 79
CNRS-IN2P3/Ecole des Mines/Universite | E-Mail: [log in to unmask]
------------------------------------------------------------------------