Dear Asif,
Thanks a lot for the information. We have already increased the value of BDII_SEARCH_TIMEOUT, and the situation is now a lot better for us.
Thanks again,
Regards
Sajjad
-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Asif Osman
Sent: Mon 11/19/2007 11:39 AM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] RM Test Failure
Dear Sajjad,
We have investigated this issue and found that the problem is not solved by increasing the timeout:
BDII_SEARCH_TIMEOUT=60
We kept increasing it up to 600 and still got the same error message.
According to our findings, ldapsearch frequently times out when accessing prod-bdii.cern.ch from NCP or PAKGRID-LCG2 because of a routing problem. But if the same search is run from a CERN account, it completes within a fraction of a second. Other centres may not be affected as severely.
From PAKGRID-LCG2:
==================
[root@CE root]# time ldapsearch -x -LLL -h prod-bdii.cern.ch -p 2170 -b mds-vo-name=CERN-PROD,o=grid '(|(objectClass=GlueSchemaVersion)(objectClass=GlueTop))' > xyz
real 1m43.390s
user 0m0.770s
sys 0m0.040s
[root@CE root]# time ldapsearch -x -LLL -h prod-bdii.cern.ch -p 2170 -b mds-vo-name=CERN-PROD,o=grid '(|(objectClass=GlueSchemaVersion)(objectClass=GlueTop))' > xyz
ldap_result: Can't contact LDAP server
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
real 1m40.769s
user 0m0.310s
sys 0m0.030s
[root@CE root]# time ldapsearch -x -LLL -h prod-bdii.cern.ch -p 2170 -b mds-vo-name=CERN-PROD,o=grid '(|(objectClass=GlueSchemaVersion)(objectClass=GlueTop))' > xyz
real 2m0.739s
user 0m0.590s
sys 0m0.010s
From CERN:
=========
[lxplus223] ~/scratch0/BDIIcache > time ldapsearch -x -LLL -h prod-bdii.cern.ch -p 2170 -b mds-vo-name=CERN-PROD,o=grid '(|(objectClass=GlueSchemaVersion)(objectClass=GlueTop))' > xyz
0.052u 0.055s 0:00.72 13.8% 0+0k 0+0io 0pf+0w
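To see how often the query fails from a given site, the timing test above can be repeated in a loop. The following is only a minimal sketch, assuming a POSIX shell; repeat_timed is a hypothetical helper name and not part of any BDII tooling. It reports whole seconds of wall-clock time and the exit status of each run.

```shell
# repeat_timed COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, discarding its output, and prints the
# elapsed wall-clock seconds and the exit status of every run.
repeat_timed() {
    rt_count=$1
    shift
    rt_i=1
    while [ "$rt_i" -le "$rt_count" ]; do
        rt_start=$(date +%s)
        if "$@" > /dev/null 2>&1; then
            rt_status=0
        else
            rt_status=$?
        fi
        rt_end=$(date +%s)
        echo "run $rt_i: $((rt_end - rt_start))s exit=$rt_status"
        rt_i=$((rt_i + 1))
    done
}

# Example: five attempts at the query used in this thread.
# repeat_timed 5 ldapsearch -x -LLL -h prod-bdii.cern.ch -p 2170 \
#     -b mds-vo-name=CERN-PROD,o=grid \
#     '(|(objectClass=GlueSchemaVersion)(objectClass=GlueTop))'
```

A non-zero exit status marks a failed run such as the "Can't contact LDAP server" case above.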
We have devised a workaround for this problem. It is a set of two scripts: one runs at CERN under our CERN account, and the other runs locally on our BDII server. We also made small modifications to the bdii-update script to implement this solution, and it works! At least we are no longer getting "end-points not found" type error messages. But we are still facing "rm" related problems.
If you are interested, we can provide you with the workaround.
Regards,
Asif Osman
-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Sajjad Asghar
Sent: Fri 11/16/2007 8:55 AM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] RM Test Failure
Dear Maarten,
Thanks a lot for the suggestion. We will update the timeout value in our bdii.conf file.
Regards
Sajjad
-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Maarten Litmaath, CERN
Sent: Fri 11/16/2007 5:00 AM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] RM Test Failure
On Thu, 15 Nov 2007, Adeel-ur-Rehman wrote:
> We are having a consistent failure for the last 4-5 days in the
> CE-sft-lcg-rm-rep test stating:
>
>
> Checking replication to Central SE (lxdpm101.cern.ch)
>
>
> pcncp24.ncp.edu.pk:2170: No GlueSEName found for lxdpm101.cern.ch
> No information found for Storage Element.
> lcg_rep: Invalid argument
Your top-level BDII pcncp24.ncp.edu.pk does not manage to download
the 2.6 MB of CERN-PROD site BDII information in 30 seconds.
In /opt/bdii/var/bdii.log there are complaints:
CERN-PROD: Timed out
You can increase the timeout in /opt/bdii/etc/bdii.conf to check
if that helps:
BDII_SEARCH_TIMEOUT=60
Then restart the BDII. If the complaints disappear, we will look
into how to preserve the new value (YAIM currently hardcodes 30s).
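
The steps above could be scripted roughly as follows. This is only a sketch, assuming a POSIX shell and assuming that bdii.conf holds the setting on a line of its own starting with BDII_SEARCH_TIMEOUT=; the function names set_bdii_timeout and count_timeouts are hypothetical helpers, not part of the BDII distribution. The restart step is left as a comment because the exact command depends on the installation.

```shell
# set_bdii_timeout CONF_FILE SECONDS
# Rewrites the BDII_SEARCH_TIMEOUT line in CONF_FILE, keeping a .bak copy.
# Appends the setting if the file does not define it yet.
set_bdii_timeout() {
    sbt_conf=$1
    sbt_secs=$2
    cp "$sbt_conf" "$sbt_conf.bak" || return 1
    if grep -q '^BDII_SEARCH_TIMEOUT=' "$sbt_conf"; then
        sed "s/^BDII_SEARCH_TIMEOUT=.*/BDII_SEARCH_TIMEOUT=$sbt_secs/" \
            "$sbt_conf.bak" > "$sbt_conf"
    else
        echo "BDII_SEARCH_TIMEOUT=$sbt_secs" >> "$sbt_conf"
    fi
}

# count_timeouts LOG_FILE
# Prints the number of lines in LOG_FILE that contain "Timed out".
count_timeouts() {
    grep -c 'Timed out' "$1" || true
}

# Example, using the paths mentioned in this thread:
# set_bdii_timeout /opt/bdii/etc/bdii.conf 60
# (restart the BDII here)
# count_timeouts /opt/bdii/var/bdii.log
```

Comparing the count before and after the restart shows whether new complaints are still being logged.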