Hi Cal,

A replacement script can be found at
http://hepunx.rl.ac.uk/egee/jra1-uk/LCG/edg-rgma-restart-all

We have passed a new set of RPMs to LCG, which we hope will be
released at the end of the month.

Regards,
Antony

> -----Original Message-----
> From: LHC Computer Grid - Rollout
> [mailto:[log in to unmask]] On Behalf Of Charles Loomis
> Sent: 14 March 2005 06:41
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] rgma going mad on 2.3.1
>
>
> Hi Jeff,
>
> This is indirectly caused by the rgma-servlet-monitor cron
> entry, which tries to restart the rgma daemons (and hence
> tomcat) every so often. Unfortunately, the tomcat4 shutdown
> script tries to exit gracefully and never succeeds in
> stopping the processes. This eventually exhausts the system
> memory, ....
>
> Eric Fede found this some time back and I believe
> submitted a bug report for it. (He can confirm this.) In
> the meantime, in /etc/init.d/tomcat4 you can add the hack
> below to the stop method to ensure that the shutdown actually happens:
>
> stop() {
>     echo -n "Stopping $TOMCAT_PROG: "
>
>     if [ -f /var/lock/subsys/tomcat4 ] ; then
>         if [ -x /etc/rc.d/init.d/functions ]; then
>             daemon --user $TOMCAT_USER $TOMCAT_SCRIPT stop
>         else
>             su - $TOMCAT_USER -c "$TOMCAT_SCRIPT stop"
>         fi
>         RETVAL=$?
>
>         # Hack to ensure that processes really die.
>         sleep 15
>         killall -u tomcat4 java
>         sleep 5
>
>         tc4run=1
>         until [ $tc4run = '0' ]
>         do
>             tc4run=`ps -aux | grep catalina | grep -v grep |
>                 grep $TOMCAT_USER -c`
>             sleep 1
>         done
>         rm -f /var/lock/subsys/tomcat4 /var/run/tomcat4.pid
>     fi
>
>     echo
>
>     [ $RETVAL = 0 ]
>
> }
>
> Perhaps this has been fixed in some official way, but I've
> failed to notice. If so, I'd appreciate a pointer to the
> official fix.
>
> Cheers.
>
> Cal
>
>
>
> Jeff Templon wrote:
> > Hi,
> >
> > we've seen our R-GMA service go nuts a few times since upgrading to
> > 2.3.1. I just now caught it in the act: load on the machine was about
> > 150.
> > Checking, there were an awful lot of 'ps' processes floating
> > around. Here is a snippet of the output of pstree:
> >
> > java
> > crond86*[crondshshedg-rgm+
> > 4*[crondshshedg-rgma+
> > 2*[crondshshedg-rgma+
> > 5*[crondshshedg-rgma+
> > 6*[crondshshedg-rgma+
> > 2*[crondshshedg-rgma+
> > 4*[crondshshedg-rgma+
> > crondshshedg-rgma-se+
> > crondshshedg-rgma-se+
> > 5*[crondshshedg-rgma+
> > crondshshedg-rgma-se+
> > 2*[crondshshedg-rgma+
> > crondshshedg-rgma-se+
> > crondshshedg-rgma-se+
> > crondshshedg-rgma-se+
> > crondshshedg-rgma-se+
> >
> >
> > looks like the culprits are:
> >
> > - edg-rgma-restart
> > - edg-rgma-service-status
> > - edg-rgma-servlet-status
> > - edg-rgma-servlet-monitor
> >
> > and there are hundreds of processes running that are all trying to
> > stop tomcat4 ...
> >
> > anybody else seeing this??
> >
> > JT
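
A note on the stop() hack quoted above: its "until" loop polls with no
upper bound, so if a process never exits, the stop call never returns,
and each cron-triggered restart could leave one more waiting process
behind. The sketch below shows one way to bound that wait: send TERM,
poll for a limited time, then fall back to KILL. The function name
stop_with_timeout and its arguments are hypothetical and are not part of
the tomcat4 init script. It also assumes the PID of the JVM is known,
for example from /var/run/tomcat4.pid; whether that file holds the JVM
PID on a given install is an assumption to check.

```shell
#!/bin/sh
# Hypothetical helper, not part of the tomcat4 init script.
# Ask a process to exit, wait a bounded time, then force it.
# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
stop_with_timeout() {
    pid=$1
    timeout=${2:-20}

    # Nothing to do if the process is already gone.
    kill -0 "$pid" 2>/dev/null || return 0

    # Request a graceful shutdown first.
    kill -TERM "$pid" 2>/dev/null

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Graceful shutdown did not finish in time: force it.
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Allow a moment for the signal to take effect, then report:
    # exit status 0 means the process is gone.
    sleep 1
    ! kill -0 "$pid" 2>/dev/null
}

# Possible use inside stop(), assuming the pid file holds the JVM PID:
#   stop_with_timeout "`cat /var/run/tomcat4.pid`" 20
```

Because the wait is bounded, a stuck shutdown costs at most the timeout
plus a second or two rather than blocking every later restart attempt.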