Hi Cal

A replacement script can be found at http://hepunx.rl.ac.uk/egee/jra1-uk/LCG/edg-rgma-restart-all
We have passed a new set of RPMs to LCG, which we hope will be released at the end of the month.

Regards
Antony

> -----Original Message-----
> From: LHC Computer Grid - Rollout 
> [mailto:[log in to unmask]] On Behalf Of Charles Loomis
> Sent: 14 March 2005 06:41
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] rgma going mad on 2.3.1
> 
> 
> Hi Jeff,
> 
> This is indirectly caused by the rgma-servlet-monitor cron 
> entry which tries to restart the rgma daemons (and hence 
> tomcat) every so often. Unfortunately, the tomcat4 shutdown 
> script tries to exit gracefully and never succeeds in 
> stopping the processes.  This eventually exhausts the system 
> memory, ....
> 
> Eric Fede found this some time back and I believe he 
> submitted a bug report for it.  (He can confirm this.)  In 
> the meantime, in /etc/init.d/tomcat4 you can add the hack 
> below to the stop method to ensure that the shutdown actually happens:
> 
> stop() {
>      echo -n "Stopping $TOMCAT_PROG: "
> 
>      if [ -f /var/lock/subsys/tomcat4 ] ; then
>        if [ -x /etc/rc.d/init.d/functions ]; then
>            daemon --user $TOMCAT_USER $TOMCAT_SCRIPT stop
>        else
>            su - $TOMCAT_USER -c "$TOMCAT_SCRIPT stop"
>        fi
>        RETVAL=$?
> 
>        # Hack to ensure that processes really die.
>        sleep 15
>        killall -u tomcat4 java
>        sleep 5
> 
>        tc4run=1
>        until [ $tc4run = '0' ]
>        do
>            tc4run=`ps -aux | grep catalina | grep -v grep | grep $TOMCAT_USER -c`
>            sleep 1
>        done
>        rm -f /var/lock/subsys/tomcat4 /var/run/tomcat4.pid
>      fi
> 
>      echo
> 
>      [ $RETVAL = 0 ]
> 
> }
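
The until loop in the hack above has no upper bound, so it can itself hang if a process never disappears. A minimal sketch of the same "ask nicely, then force" idea with a time limit follows. It assumes the PID of the process is known; the function name and its arguments are hypothetical and are not part of the tomcat4 init script.

```shell
#!/bin/sh
# Hypothetical helper, not part of the tomcat4 init script.
# Usage: terminate_with_timeout PID GRACE_SECONDS
# Returns 0 if the process went away after SIGTERM (or could not be
# signalled at all, typically because it was already gone), and 1 if
# SIGKILL had to be sent.
terminate_with_timeout() {
    pid=$1
    grace=$2
    waited=0

    # Ask politely first.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second, but never longer than the grace period.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            # Grace period used up: force the process to die.
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Called as, say, terminate_with_timeout "$pid" 15, this would take the place of the fixed sleeps and the open-ended loop, provided the init script can obtain the right PID.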
> 
> Perhaps this has been fixed in some official way, but I've 
> failed to notice.  If so, I'd appreciate a pointer to the 
> official fix.
> 
> Cheers.
> 
> Cal
> 
> 
> 
> Jeff Templon wrote:
> > Hi,
> >
> > we've seen our R-GMA service go nuts a few times since upgrading to 
> > 2.3.1.  I just now caught it in the act: load on the machine was about 
> > 150.  Checking, there were an awful lot of 'ps' processes floating 
> > around.  Here is a snippet of the output of pstree:
> >
> >       java
> >       crond-+-86*[crond---sh---sh---edg-rgm+
> >             |-4*[crond---sh---sh---edg-rgma+
> >             |-2*[crond---sh---sh---edg-rgma+
> >             |-5*[crond---sh---sh---edg-rgma+
> >             |-6*[crond---sh---sh---edg-rgma+
> >             |-2*[crond---sh---sh---edg-rgma+
> >             |-4*[crond---sh---sh---edg-rgma+
> >             |-crond---sh---sh---edg-rgma-se+
> >             |-crond---sh---sh---edg-rgma-se+
> >             |-5*[crond---sh---sh---edg-rgma+
> >             |-crond---sh---sh---edg-rgma-se+
> >             |-2*[crond---sh---sh---edg-rgma+
> >             |-crond---sh---sh---edg-rgma-se+
> >             |-crond---sh---sh---edg-rgma-se+
> >             |-crond---sh---sh---edg-rgma-se+
> >             `-crond---sh---sh---edg-rgma-se+
> >
> >
> > looks like the culprits are:
> >
> > - edg-rgma-restart
> > - edg-rgma-service-status
> > - edg-rgma-servlet-status
> > - edg-rgma-servlet-monitor
> >
> > and there are hundreds of processes running that are all trying to 
> > stop tomcat4 ...
> >
> > anybody else seeing this??
> >
> >                                 JT
> >
>
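
The pile-up Jeff describes, where each cron run starts another restart attempt while the earlier ones are still stuck, is the kind of problem a simple lock around the cron job can limit. The following is only a sketch of that general technique, not the fix shipped in the replacement script or the new RPMs. The function name, the lock path, and the exit code 75 are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper; the lock path and wrapped command are illustrative.
# Usage: run_exclusive LOCKDIR COMMAND [ARGS...]
# Returns 75 without running COMMAND if LOCKDIR already exists,
# otherwise returns COMMAND's own exit status.
run_exclusive() {
    lockdir=$1
    shift

    # mkdir either creates the directory or fails, atomically, so only
    # one caller at a time can get past this point.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi

    "$@"
    status=$?

    # Release the lock so the next scheduled run can proceed.
    rmdir "$lockdir"
    return $status
}
```

A cron entry would then call something like run_exclusive /var/lock/rgma-monitor.lock followed by the monitor command, so that a run which finds the lock taken exits at once instead of queueing up behind a hung one. One caveat: if the wrapped command is killed before rmdir runs, the lock directory is left behind and has to be removed by hand.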