On Wed, 3 Aug 2005 [log in to unmask] wrote:

> On Tue, 2 Aug 2005, Ian Fisk wrote:
> 
> > [...]
> > globus-jo 31358 cms012    3u  IPv4 17739940  TCP cmslcgce.fnal.gov:globus-gatekeeper->egee-rb-03.cnaf.infn.it:21476 (CLOSE_WAIT)
> > [...]
> > globus-jo 31359 cms012    3u  IPv4 18401793  TCP cmslcgce.fnal.gov:globus-gatekeeper->egee-rb-03.cnaf.infn.it:21741 (CLOSE_WAIT)
> 
> Is it always egee-rb-03.cnaf.infn.it?
> 
> When did all this start?  (shown by "ps auxwww | grep ^cms012")

I just ran the latter command myself and found almost all processes
looking like this:

    /usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl \
        -m lcgcondor -f /tmp/gram_0if4LS -c cancel

Were the ones you killed also of that form?

This suggests a problem with canceling jobs: has your Condor system
recently had a problem that might be related?
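
To check how many of the cms012 processes are stuck in a cancel, something
like the following might help. It is only a sketch: the function name
count_jm_actions is made up for this message, and it assumes the action
always follows "-c" on the command line, as in the example above.

```shell
# Count globus-job-manager-script.pl processes by the action given to -c.
# Reads "ps"-style lines on stdin; prints "<count> <action>" per action.
# (count_jm_actions is a hypothetical helper name, not part of Globus.)
count_jm_actions() {
    grep 'globus-job-manager-script\.pl' |
        sed -n 's/.* -c \([^ ]*\).*/\1/p' |
        sort | uniq -c | awk '{print $1, $2}'
}

# Possible use on the gatekeeper, with the account from this thread:
#   ps auxwww | grep '^cms012' | count_jm_actions
```

If nearly everything is counted under "cancel", that would fit the picture
of cancel requests piling up rather than completing.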