------------------------------------------------------------------------------------
Publication from : Oliver Keeble 9443 (CERN)
This mail has been sent using the broadcasting tool available at http://cic.in2p3.fr
------------------------------------------------------------------------------------
Problem:
The lbserver on the RBs has been hanging.
Details:
The lbserver writes insert statements to the file /tmp/rgma_statefile and sends a signal to the socket /tmp/rgma_statesock. The lcg-mon-job-status daemon listens on this socket and then reads the insert statements from the file and publishes this information into R-GMA.
There is a descriptor leak that shows up when the R-GMA server is down. This leak eventually causes the lcg-mon-job-status daemon to crash and block /tmp/rgma_statesock, which in turn causes the lbserver to hang.
Solution:
The lcg-mon-job-status daemon has been modified so that it no longer uses the socket /tmp/rgma_statesock. The descriptor leak is still present, so the lcg-mon-job-status daemon may eventually die if the R-GMA server is down, but it should no longer cause the lbserver to hang.
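Because the leak remains, it may help to watch how many descriptors the daemon holds while the R-GMA server is down. The following is only a rough sketch, not part of the fix: count_fds is a hypothetical helper name, it assumes a Linux /proc filesystem, and the daemon's PID has to be looked up separately (for example with ps).

```shell
#!/bin/sh
# count_fds PID: print the number of descriptors the process currently holds.
# Relies on the Linux /proc filesystem; count_fds is a hypothetical helper name.
# Prints 0 if the process does not exist or its fd directory cannot be read.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}
```

Calling count_fds with the daemon's PID at intervals should show a steadily growing number if the leak is being triggered.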
Steps:
Log onto the RB
apt-get update
apt-get dist-upgrade
This should install the new rpm:
lcg-mon-job-status-1.0.19-1
Remove the socket file if it is there.
rm -f /tmp/rgma_statesock
Restart the daemon.
/etc/rc.d/init.d/lcg-mon-job-status restart
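The steps above can also be collected into a small script to run as root on the RB. This is a minimal sketch, not an official tool: the function names and the DRY_RUN switch are assumptions added for illustration, and the rpm -q line is only there to show which version ended up installed (expected: lcg-mon-job-status-1.0.19-1).

```shell
#!/bin/sh
# Sketch of the upgrade steps from this announcement; run as root on the RB.
# DRY_RUN=1 (a hypothetical switch) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_lcg_mon_job_status() {
    # Fetch the package lists and pull in the new rpm.
    run apt-get update || return 1
    run apt-get dist-upgrade || return 1
    # Show the installed version of the package.
    run rpm -q lcg-mon-job-status || return 1
    # Remove the stale socket file if it is there.
    run rm -f /tmp/rgma_statesock || return 1
    # Restart the daemon so the new version is running.
    run /etc/rc.d/init.d/lcg-mon-job-status restart
}
```

Setting DRY_RUN=1 before calling upgrade_lcg_mon_job_status lists the commands without executing them, which is a cheap way to check the sequence first.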