Hi
Sorry it took so long to answer this! The
lcg-info-dynamic-scheduler should not be doing this. On the other
hand, the FQAN syntax you are using is one that has not been
tested. I tested the new-style VOMS syntax
/atlas/Role=production
and not the old style
/VO=atlas/GROUP=/atlas/ROLE=production
I can't figure out why it would matter, but maybe it's done something
terrible to the parsing logic.
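If you want to retry with the syntax that was tested, the rewrite could be sketched in Python as below. This is only a sketch: the mapping is inferred from the two example FQANs above and nothing else, to_new_style is a hypothetical helper that is not part of lcg-info-dynamic-scheduler, and the handling of an FQAN without a ROLE part is an assumption.

```python
import re

# Pattern inferred from the single old-style example in this mail:
#   /VO=<vo>/GROUP=<group path>/ROLE=<role>
# Treating the ROLE part as optional is an assumption.
_OLD_STYLE = re.compile(
    r"^/VO=(?P<vo>[^/]+)/GROUP=(?P<group>/[^=]+?)(?:/ROLE=(?P<role>[^/]+))?$"
)


def to_new_style(fqan: str) -> str:
    """Rewrite an old-style FQAN to the new-style form (hypothetical helper).

    Strings that do not look like the old style are returned unchanged.
    """
    match = _OLD_STYLE.match(fqan)
    if match is None:
        return fqan
    new_fqan = match.group("group")
    if match.group("role"):
        new_fqan += "/Role=" + match.group("role")
    return new_fqan
```

Applied to the old-style example above, to_new_style returns /atlas/Role=production; a string that is already new-style passes through untouched.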
I can try to diagnose the problem for you if you send me the following:
- the static ldif file for the CE
- the lcg-info-dynamic-scheduler.conf file you are using (both the
  "good" one and the one causing problems)
- a copy of the lrmsinfo-pbs command output
Performance of this version of the program should be *better* than the
old one, so there must be some weird interaction going on somewhere. It
might be the bizarre untested FQAN syntax, or it might be something
about how your site is different from others.
JT
PS. It would also be useful to see the results of
time /opt/lcg/libexec/lrmsinfo-pbs > t.o
Then make a modified version of the conf file that uses 'cat t.o' as
the lrms backend command, and run
time /opt/lcg/libexec/lcg-info-dynamic-scheduler -c myconf
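If it is easier to collect both timings from a script, the two steps above could be sketched in Python roughly as follows. time_command is a hypothetical helper, and it reports wall-clock time only, not the user/sys breakdown that the shell's time prints.

```python
import subprocess
import time


def time_command(argv, stdout_path=None):
    """Run argv and return (elapsed wall-clock seconds, exit code).

    If stdout_path is given, the command's standard output is saved there
    (like '> t.o' in the shell); otherwise it is discarded.
    """
    start = time.perf_counter()
    if stdout_path is None:
        result = subprocess.run(argv, stdout=subprocess.DEVNULL)
    else:
        with open(stdout_path, "wb") as out:
            result = subprocess.run(argv, stdout=out)
    return time.perf_counter() - start, result.returncode


# Intended use on the CE (paths as in the commands above; "myconf" is the
# modified conf file that runs 'cat t.o' as the lrms backend command):
#   time_command(["/opt/lcg/libexec/lrmsinfo-pbs"], "t.o")
#   time_command(["/opt/lcg/libexec/lcg-info-dynamic-scheduler", "-c", "myconf"])
```

Comparing the two elapsed times should show whether the time goes into the lrms backend or into the scheduler plugin itself.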
Antun Balaz wrote:
> Hi,
>
> It seems that I found the solution. There are three items I would like to
> point out:
>
> 1) Simply adding
>
> index objectclass eq
> index GlueVOViewLocalID eq
>
> to the ldbm database definitions section of /etc/init.d/bdii solved the problem
> on our BDII_site. I suppose that YAIM should add some indexing to the BDII (both
> site and top-level), since otherwise other sites will hit problems of this type
> as well. Earlier I had some indexing, which was overridden by the latest
> reconfiguration. Relevant link from the SEE-GRID Wiki, created by Valentin
> Vidic:
>
> http://wiki.egee-see.org/index.php/Fixing_BDII_response_time
>
> This is the recommended method for finding out which attributes the BDII
> should be indexed on.
>
> 2) Another problem I discovered is that the latest reconfiguration created
> static-file-site.ldif and static-file-Site.ldif in /opt/lcg/var/gip/ldif. In
> turn, this causes errors like this one in /opt/bdii/var/bdii.log:
>
> Error for dn: GlueSiteUniqueID=AEGIS01-PHY-SCL,mds-vo-name=AEGIS01-PHY-SCL,o=grid
> ==> slapadd: could not add entry
> dn="GlueSiteUniqueID=AEGIS01-PHY-SCL,mds-vo-name=AEGIS01-PHY-SCL,o=grid"
> (line=2791)
>
> These are gone if static-file-site.ldif is removed, as it should be. This is
> YAIM's problem, isn't it?
>
> 3) It is still unclear to me whether
> /opt/lcg/etc/lcg-info-dynamic-scheduler.conf should contain just the VOMS part
> for each VO, e.g. for ATLAS:
>
> atlassgm:/VO=atlas/GROUP=/atlas/ROLE=lcgadmin
> atlasprd:/VO=atlas/GROUP=/atlas/ROLE=production
>
> Or should the following also be added for each VO (the example is for the ATLAS VO):
>
> atlas:atlas
>
> Thanks, Antun
>
> -----
> Antun Balaz
> Research Assistant
> E-mail: [log in to unmask]
> Web: http://scl.phy.bg.ac.yu/
>
> Phone: +381 11 3160260, Ext. 152
> Fax: +381 11 3162190
>
> Scientific Computing Laboratory
> Institute of Physics, Belgrade, Serbia
> -----
>
> ---------- Original Message -----------
> From: "Maarten Litmaath, CERN" <[log in to unmask]>
> To: [log in to unmask]
> Sent: Sat, 21 Apr 2007 22:50:17 +0200
> Subject: Re: [LCG-ROLLOUT] Downtime due to BDII_site problems
>
>> On Sat, 21 Apr 2007, Antun Balaz wrote:
>>
>>> After we applied update 21 to gLite and reconfigured lcg-CE_torque at
>>> AEGIS01-PHY-SCL, we experience substantial problems with site BDII stability.
>>> We suspect that the problem is related to
>>> lcg-info-dynamic-scheduler-generic-2.1.0-1.noarch which was installed with the
>>> latest update. However, problems appeared after we reconfigured lcg-CE, not
>>> before this. GGUS ticket:
>>>
>>> https://gus.fzk.de/pages/ticket_details.php?ticket=21050
>>>
>>> Any help in resolving this would be appreciated. We experimented a little bit
>>> with changing various timeouts, and for some time it looked like the problem
>>> was solved, but later it reappeared. This is what we tried to change:
>>>
>>> in /opt/bdii/etc/bdii.conf:
>>> BDII_SEARCH_TIMEOUT=120 (from 30 to 120)
>>> BDII_BREATHE_TIME=60 (from 30 to 60)
>>>
>>> in /opt/lcg/etc/lcg-info-generic.conf:
>>> freshness = 60 (from 30 to 120)
>>> cache_ttl = 300
>>> response = 110 (from 30 to 120)
>>> timeout = 150
>> Did you have to increase those timeouts manually?
>> It should have been done by config_gip in glite-yaim-3.0.1-9:
>> do you have your own config_gip in the local subdirectory?
> ------- End of Original Message -------