Hi,
Having a look at the bdii.log files (/opt/bdii/var/bdii.log and similar)
on your pcncp24.ncp.edu.pk, you can check whether the cause is a slow
network and a consequent failure to download the LFC information from
CERN-PROD. Just grep for
CERN-PROD
If you see lots of lines like
CERN-PROD: ldap_bind: Can't contact LDAP server
you should try modifying
/opt/bdii/etc/bdii.conf
by increasing
BDII_SEARCH_TIMEOUT=60
to something like 120 or even more.
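For example, something like the following (a sketch assuming the standard
BDII file locations mentioned above; the exact log pattern and the init
script name may differ on your release):

```shell
# Count how often the top-level BDII failed to bind to CERN-PROD
grep -c "CERN-PROD.*ldap_bind" /opt/bdii/var/bdii.log

# Raise the search timeout from 60 to 120 seconds, keeping a backup copy
sed -i.bak 's/^BDII_SEARCH_TIMEOUT=60$/BDII_SEARCH_TIMEOUT=120/' /opt/bdii/etc/bdii.conf

# Restart the BDII so the new value takes effect
/etc/init.d/bdii restart
```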
Hope this helps,
Emanouil Atanassov
[log in to unmask]
Michel Jouvin wrote:
> Adeel,
>
> Failures in replica management tests, when not related to local problems,
> are generally caused by top-level BDII problems. There clearly seems to
> be an error in your case.
>
> Michel
>
> --On vendredi 30 mars 2007 11:27 +0500 Adeel-ur-Rehman
> <[log in to unmask]> wrote:
>
>> Hi All,
>>
>> We are facing following critical problems at our site for a long time:
>>
>> 1) Most of the jobs running at our site fail while performing Replica
>> Management Tests. The error returned is:
>> LFC endpoint not found
>> LFC endpoint not found
>> lcg_cr: Invalid argument
>>
>> I found some help regarding this on the web, where it was suggested that
>> to resolve this error one must set the LFC_HOST variable (e.g., export
>> LFC_HOST=prod-lfc-shared-central.cern.ch), but we are not using any LFC
>> on our side.
>>
>> Any idea about this issue?
>>
>>
>>
>> 2) Related to our top level BDII:
>>
>> pcncp24.ncp.edu.pk: could not be queried
>> check for missing attributes in bdii:
>> GlueSEUniqueID: lxn1183.cern.ch
>> GlueSEName: CERN-PROD:disk
>> GlueSARoot: ops:ops
>>
>> The recommended query for testing it is:
>>
>> ldapsearch -xLLL -l 15 -h bdiihostname -p 2170 -b
>> 'GlueSEUniqueID=lxn1183.cern.ch,mds-vo-name=CERN-PROD,mds-vo-name=local,o=grid'
>> '(|(GlueSEUniqueID=lxn1183.cern.ch)(objectclass=GlueSA))'
>> GlueSEUniqueID GlueSEName GlueSARoot
>>
>>
>> Should I run the above query as-is? I attempted it several times,
>> but it returns: ldap_bind: Can't contact LDAP server
>>
>> If I put our actual BDII host name in place of bdiihostname, I get:
>>
>> No such object (32)
>> Matched DN: mds-vo-name=local,o=grid
>>
>>
>> Are we missing something in it?
>>
>>
>>
>> 3) Nowadays, almost all of our jobs run on the same Worker Node, even
>> though all of our 14 Worker Nodes have the same H/W specs. There are no
>> unusual disk-usage problems. Moreover, the "pbsnodes -a" command reports
>> the status of every other node as "free".
>>
>> Sometimes we find that the jobs running on that single Worker Node
>> consume almost 100% of its CPU.
>>
>> If we power off that WN, almost no jobs come into the queue for
>> execution; those that do reach the queue appear to stay in the wait
>> state.
>>
>> As far as we can tell, there are no differences between the
>> configuration files on that WN and those on any other WN at our site.
>>
>>
>>
>> Any solutions are welcome!
>>
>> Thanks in advance.
>>
>>
>>
>> Regards,
>>
>> Adeel
>>
>>
>>
>>
>>
>>
>>
>> Adeel-ur-Rehman
>> Scientific Officer,
>> Advanced Scientific Computing
>> National Centre for Physics,
>> Quaid-i-Azam University Campus,
>> Islamabad.
>> Email: [log in to unmask] <mailto:[log in to unmask]>
>> Tel: (+92-51) 2601018
>> Fax: (+92-51) 9205753
>
>
>
> *************************************************************
> * Michel Jouvin Email : [log in to unmask] *
> * LAL / CNRS Tel : +33 1 64468932 *
> * B.P. 34 Fax : +33 1 69079404 *
> * 91898 Orsay Cedex *
> * France *
> *************************************************************
>