Jiri Kosina wrote:
> On Fri, 18 Nov 2005, Jiri Kosina wrote:
>
>> I have been seeing something similar since last Saturday, as can be
>> seen on
>> http://goc.grid.sinica.edu.tw/gstat/praguelcg2/GIISQuery_Usage_cpu_.html
>> It improved on Tuesday when I completely killed and restarted all the
>> slapd daemons on the CE, but there were still outages (they seemed to
>> recur regularly at 0, 6, 12 and 18 o'clock). During this night, without
>> any intervention on my part, none of the outages at 0, 6 or 12
>> happened (this can also be seen from the graph at the URL above).
>
>
> Uhm, according to
> http://goc.grid.sinica.edu.tw/gstat/praguelcg2/GIISQuery_Usage_cpu_.html
> it seems I have fixed it (or at least there has been no BDII outage for
> ~12 hours, which is a very good result compared to the frequency of the
> previous outages).
>
> Hm, now, the way I fixed it is quite strange and I actually don't
> yet fully understand why this fixes the issue :) What I did was
> remove both setpgrp() calls from the lcg-info-generic perl script. I
> initially found out that when the script is executed from Midnight
> Commander, it hangs. I debugged it down to those two calls hanging (I
> actually don't understand yet how setpgrp() might block, I will look
> into the kernel sources). After commenting them out, the script runs
> smoothly from Midnight Commander and the BDII outages also _seem_ to
> have been fixed.
Are you sure the setpgrp() calls themselves were hanging, or was it
rather some terminal I/O afterwards?
If the process group is not the same as the terminal's foreground
process group, and if you have done something like "stty tostop"
(check with "stty -a"), the background process will be stopped as soon
as it does I/O on the terminal.
Now, since lcg-info-generic is called by the MDS GRIS, this suggests
that /etc/init.d/globus-mds is not careful enough about dissociating
slapd from the terminal. You can open a bug for that.
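
To check the two conditions above from inside a process, something like
the following might help. It is only a rough sketch in Python rather than
Perl, and the helper name terminal_status is made up for illustration; it
is not part of lcg-info-generic or the MDS scripts. It assumes a POSIX
system where the termios module is available.

```python
import os
import termios


def terminal_status(fd=0):
    """Describe how this process relates to the terminal open on fd.

    Hypothetical helper, for illustration only. Returns None when fd is
    not a terminal (or not our controlling terminal); otherwise a dict
    with two booleans.
    """
    if not os.isatty(fd):
        return None
    try:
        # Process group currently in the foreground of that terminal.
        foreground_pgrp = os.tcgetpgrp(fd)
        # tcgetattr() returns a list; index 3 holds the local flags.
        local_flags = termios.tcgetattr(fd)[3]
    except (OSError, termios.error):
        return None
    return {
        # False means we are a background process for this terminal.
        "in_foreground": os.getpgrp() == foreground_pgrp,
        # True means "stty tostop" is in effect.
        "tostop": bool(local_flags & termios.TOSTOP),
    }
```

Calling terminal_status() before and after os.setpgrp() (which, like
Perl's setpgrp() with no arguments, moves the caller into a new process
group of its own) from an interactive shell should show in_foreground
going from True to False. If tostop is also True, the next write to the
terminal would be expected to stop the process, which could easily look
like the setpgrp() call itself hanging.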