Hi Graeme,

Is there anything interesting in /var/log/tomcat5/catalina.out?

When you see that the number of CLOSE_WAIT sockets has increased again,
can you do a thread dump and send it to us?

kill -3 <tomcat pid>

The output gets written to catalina.out.
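
If it helps, the check could be automated roughly along the lines below. This is only a sketch in Python under stated assumptions: the function names and the threshold are hypothetical, and it assumes the usual Linux netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). It counts the CLOSE_WAIT lines and sends SIGQUIT, which is the signal kill -3 sends on Linux, once the count passes the threshold.

```python
import os
import signal


def count_close_wait(netstat_output):
    """Count TCP lines whose State column is CLOSE_WAIT.

    Assumes the column order: Proto Recv-Q Send-Q Local Foreign State [...]
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "CLOSE_WAIT":
            count += 1
    return count


def dump_threads_if_needed(pid, threshold, netstat_output):
    """Send SIGQUIT to pid when the CLOSE_WAIT count exceeds threshold.

    Returns True if the signal was sent, False otherwise.
    The threshold is a hypothetical tuning value, not a recommended number.
    """
    if count_close_wait(netstat_output) > threshold:
        os.kill(pid, signal.SIGQUIT)  # same signal as `kill -3 <pid>` on Linux
        return True
    return False
```

To use it, capture the text printed by netstat -tn (for example with subprocess.run) and pass it in together with the Tomcat pid and whatever threshold suits the host. The thread dump still ends up in catalina.out.
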
Thanks
Antony
> -----Original Message-----
> From: [log in to unmask]
> [mailto:[log in to unmask]] On Behalf Of
> Fisher, SM (Steve)
> Sent: 24 May 2007 14:32
> To: [log in to unmask]
> Subject: [R-GMA-SUPPORT] RE: SAME tests run in job wrapper
>
> Can somebody respond to Graeme please
>
> Steve
>
> > -----Original Message-----
> > From: Testbed Support for GridPP member institutes
> > [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
> > Sent: 24 May 2007 12:18
> > To: [log in to unmask]
> > Subject: Re: SAME tests run in job wrapper
> >
> > On 18 May 2007, at 14:25, Alastair Duncan wrote:
> >
> > > On Friday 18 May 2007 10:02:21 Graeme Stewart wrote:
> > >> Hi Steve
> > >>
> > >> The new WAR file has improved the CLOSE_WAIT a bit, but it's not
> > >> entirely gone:
> > >>
> > >> svr019:~# netstat -t | grep rgma | wc -l
> > >> 168
> > >> svr019:~# netstat -t | grep rgma | grep CLOSE_WAIT | wc -l
> > >> 39
> > >
> > > Hi Graeme,
> > >
> > > Having sockets in a CLOSE_WAIT state is not necessarily bad as long
> > > as they change from this state. Does the number of sockets in the
> > > CLOSE_WAIT state fluctuate and not just go up?
> >
> > Oh dear, it's all gone badly wrong now:
> >
> > svr019:~# netstat -t | grep CLOSE_WAIT | wc -l
> > 1801
> >
> > All of them have 1 byte in the Recv-Q.
> >
> > And RGMA's total number of TCP connections is
> >
> > svr019:~# netstat -tp | grep java | wc -l
> > 2279
> >
> > It looks like this has been happening ~ 3 weeks. See
> >
> > http://svr031.gla.scotgrid.ac.uk/ganglia/?r=month&c=Grid+Servers&h=svr019.gla.scotgrid.ac.uk
> >
> > look at proc_total.
> >
> > Pushing the big red restart button now.
> >
> >
> > >
> > >>
> > >> And in addition, the problem was monitoring in the job wrapper
> > >> adding unnecessary wallclock to jobs. I don't see how this will be
> > >> dramatically improved, even if CLOSE_WAIT goes away entirely.
> > >
> > > As I understand the situation, the wallclock time was still high when
> > > the R-GMA publishing part of the job wrapper was disabled. So what
> > > else is done in the job wrapper that can cause this?
> >
> > Don't know. I think the current plan might be to test with nagios
> > instead, which is far more sane.
> >
> > Cheers
> >
> > Graeme
> >
> > --
> > Dr Graeme Stewart - http://wiki.gridpp.ac.uk/wiki/User:Graeme_stewart
> > ScotGrid - http://www.scotgrid.ac.uk/ http://scotgrid.blogspot.com/
> >
>
> _______________________________________________
> R-GMA-SUPPORT mailing list
> [log in to unmask]
> http://www.physics.gla.ac.uk/mailman/listinfo/r-gma-support
>