On Fri, 18 Nov 2005, Jiri Kosina wrote:
> I have been seeing something similar since last Saturday, as can be
> seen at
> http://goc.grid.sinica.edu.tw/gstat/praguelcg2/GIISQuery_Usage_cpu_.html
> It improved on Tuesday when I completely killed and restarted all the
> slapd daemons on the CE, but there were still outages (they seemed to
> occur regularly at 0, 6, 12 and 18 o'clock). During this night, without
> any intervention on my part, none of the outages at 0, 6 or 12 happened
> (this can also be seen from the graph at the URL above).
Uhm, according to
http://goc.grid.sinica.edu.tw/gstat/praguelcg2/GIISQuery_Usage_cpu_.html
it seems I have fixed it (or at least there has been no BDII outage for
~12 hours, which is a very good result compared to the frequency of the
previous outages).
Hm, now, the way I fixed it is quite strange and I actually don't yet
fully understand why it fixes the issue :) What I did was remove
both setpgrp() calls from the lcg-info-generic perl script. I initially
found out that the script hangs when it is executed from midnight
commander. I traced the hang to those two calls (I don't yet
understand how setpgrp() could block; I will look into the kernel
sources). After commenting them out, the script runs smoothly from
midnight commander and the BDII outages also _seem_ to have been fixed.
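
For reference, a minimal sketch of what setpgrp() does to a process. This
is plain Python, not the Perl code from lcg-info-generic, and the helper
name pgid_after_setpgrp is made up for illustration. It forks a child,
calls setpgrp() in it and reports the child's process group ID before and
after the call:

```python
import os

def pgid_after_setpgrp():
    """Fork a child, call setpgrp() in it, return (child_pid, pgid_before, pgid_after)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the process group before and after setpgrp().
        try:
            os.close(read_fd)
            before = os.getpgrp()
            os.setpgrp()              # become leader of a new process group
            after = os.getpgrp()
            os.write(write_fd, f"{os.getpid()} {before} {after}".encode())
        finally:
            os._exit(0)               # never fall through into the parent's code
    # Parent: collect the child's report and reap it.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_pid, before, after = (int(x) for x in data.split())
    return child_pid, before, after

if __name__ == "__main__":
    child_pid, before, after = pgid_after_setpgrp()
    print(f"child {child_pid}: pgid {before} -> {after}")
```

The child starts in its parent's process group and ends up in a new group
whose ID equals its own PID. The call itself is not expected to block. A
process that has left the terminal's foreground process group can, however,
be stopped by SIGTTIN or SIGTTOU when it touches the controlling terminal,
which looks like a hang from the outside. Whether that is what happens
under midnight commander is only an assumption here.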
--
Jiri Kosina
Institute of Physics, Academy of sciences of the Czech Republic