On Wed, Nov 10, 2004 at 03:54:35PM +0300, Lev Shamardin wrote:
> Hi all,
>
> We have some strange problems with a GRIS running on a CE. It has started to
> die for no apparent reason, at random times. It happened 3 times during the
> last week. Nothing was changed in the site config during this period, but it
> stopped working. The log file says something like this:
>
> Mon Nov 8 11:48:05 MSK 2004 grid-info-soft-register [9297]: log: daemon PID=9385 terminated, exiting
> Tue Nov 9 12:26:37 MSK 2004 grid-info-soft-register [9294]: log: started daemon PID=9370 "/opt/globus/libexec/slapd"
> Tue Nov 9 12:26:37 MSK 2004 grid-info-soft-register [9294]: log: started slave PIDs 9399 9405
> Tue Nov 9 12:26:37 MSK 2004 grid-info-soft-register [9399]: log: slave running on 120 interval
> Tue Nov 9 12:26:37 MSK 2004 grid-info-soft-register [9405]: log: slave running on 120 interval
> Tue Nov 9 20:26:31 MSK 2004 grid-info-soft-register [9294]: log: daemon PID=9370 terminated, exiting
>
> Any ideas what may be the reason and how to fix it?
This might be caused by the "socket leak" in slapd.
If you keep an eye on it and periodically check how many fds it has
open, and after a while the count starts to climb into the hundreds, then
that's your problem.
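
One possible way to do that periodic check on Linux is to count the entries
under /proc/<pid>/fd. The sketch below is only an illustration: the function
names, the 60-second polling interval, and the threshold of 100 are
assumptions of mine, not part of Globus or slapd, and reading another
user's /proc/<pid>/fd normally requires root.

```python
import os
import sys
import time


def count_open_fds(pid, proc_root="/proc"):
    """Return how many file descriptors the process currently has open.

    Linux only: each open descriptor appears as one entry in
    <proc_root>/<pid>/fd. proc_root is a parameter only so the function
    can be exercised against a fake directory tree.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return len(os.listdir(fd_dir))


def looks_leaky(fd_count, threshold=100):
    """True once the count has passed the (assumed) threshold."""
    return fd_count > threshold


def watch(pid, interval=60):
    """Print a timestamped fd count every `interval` seconds until the process exits."""
    while True:
        try:
            n = count_open_fds(pid)
        except FileNotFoundError:
            print("process %d is gone" % pid)
            return
        flag = "  <-- suspicious" if looks_leaky(n) else ""
        print("%s pid=%d fds=%d%s" % (time.strftime("%c"), pid, n, flag))
        time.sleep(interval)


# Only start polling when a PID is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    watch(int(sys.argv[1]))
```

Run it with the slapd PID as the only argument (the daemon PID is the one
grid-info-soft-register writes to the log) and watch whether the count keeps
growing between restarts.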
This might possibly be fixed by setting
    idletimeout 600
in grid-info-slapd.conf; slapd will then time out idle connections.
(The time is in seconds.)
I haven't verified that this fixes the problem since we don't have it
right now.
--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: [log in to unmask] Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se