> > in /opt/bdii/etc/bdii.conf:
> > BDII_SEARCH_TIMEOUT=120 (from 30 to 120)
> > BDII_BREATHE_TIME=60 (from 30 to 60)
> >
> > in /opt/lcg/etc/lcg-info-generic.conf:
> > freshness = 60 (from 30 to 120)
> > cache_ttl = 300
> > response = 110 (from 30 to 120)
> > timeout = 150
>
> Did you have to increase those timeouts manually?
> It should have been done by config_gip in glite-yaim-3.0.1-9:
> do you have your own config_gip in the local subdirectory?
Hi Maarten,
Yes, they were increased automatically, and I changed them later to other
values to see if this would help, but since it didn't, I put them back to the
YAIM-configured values.
The machine where this happens is a dual Xeon 2.8 GHz with hyperthreading
enabled and 2 GB of RAM. Swap is never used, so this is not a memory
issue.
Monitoring the processes with top gave interesting results: two slapd
processes owned by edginfo (started by globus-mds, the local info system on
port 2135) occupied 20%-25% of the overall CPU time, evenly distributed over
all 4 logical CPUs, so that each CPU was loaded at approx. the same percentage
by these processes. Other processes were using approx. 10% of CPU time.
What is interesting is that, when I stopped globus-mds, the two slapd processes
remained, and I had to kill them manually. After that, CPU usage dropped to
approx. 10%, evenly distributed over the CPUs, but there was an excessive
number of bdii-fwd processes: approx. 180 accumulated within about 5 minutes
of restarting bdii (with globus-mds still down).
After starting globus-mds again there was just one slapd process associated
with it. I restarted bdii to get rid of the accumulated bdii-fwd processes, but
they appeared again (up to 180, sometimes dropping to 100 or so). The CPU
usage is now no higher than 10%-20%, but the load is the same as before, about 8.
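For reference, the process counts above could be collected with something like the following. This is only a rough sketch: the helper name count_procs is made up for illustration, and it assumes a ps that accepts repeated -o options and command names without spaces.

```shell
# Hypothetical helper: read "user command" pairs on stdin and print
# "<count> <user> <command>" lines, highest count first.
count_procs() {
    awk '{ n[$1 " " $2]++ } END { for (k in n) print n[k], k }' | sort -rn
}

# Assumed usage on the affected node (prints the five largest groups):
#   ps -e -o user= -o comm= | count_procs | head -n 5
```

Running this every minute or so would show whether the number of bdii-fwd processes keeps growing or levels off.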
So, the problem seems to be related to the slow response from the local info
system and from the site BDII: the queries they have to answer accumulate, and
this slows them down even further. It takes around 120 seconds for me locally
to get an answer from the site BDII on port 2170, and around the same time for
an answer from the local info system on port 2135.
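The response times above can be measured roughly as follows. This is a sketch under assumptions: the helper name elapsed is made up, SITE-NAME is a placeholder, and the base DNs shown in the comments are the ones commonly used for a site BDII and a local GRIS, not values taken from this installation.

```shell
# Hypothetical helper: run a command, discard its output, and print the
# elapsed wall-clock time in whole seconds.
elapsed() {
    elapsed_start=$(date +%s)
    "$@" >/dev/null 2>&1
    elapsed_end=$(date +%s)
    echo $((elapsed_end - elapsed_start))
}

# Assumed usage (host, SITE-NAME and base DNs are placeholders):
#   elapsed ldapsearch -x -H ldap://localhost:2170 -b "mds-vo-name=SITE-NAME,o=grid"
#   elapsed ldapsearch -x -H ldap://localhost:2135 -b "mds-vo-name=local,o=grid"
```

Comparing the two numbers with globus-mds running and stopped would show which of the two services is the slow one.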
Any help will be appreciated.
Thanks, Antun