On Wed, 3 Aug 2005 [log in to unmask] wrote:
> On Tue, 2 Aug 2005, Ian Fisk wrote:
>
> > [...]
> > globus-jo 31358 cms012 3u IPv4 17739940 TCP
> > cmslcgce.fnal.gov:globus-gatekeeper->egee-rb-03.cnaf.infn.it:21476
> > (CLOSE_WAIT)
> > [...]
> > globus-jo 31359 cms012 3u IPv4 18401793 TCP
> > cmslcgce.fnal.gov:globus-gatekeeper->egee-rb-03.cnaf.infn.it:21741
> > (CLOSE_WAIT)
>
> Is it always egee-rb-03.cnaf.infn.it?
>
> When did all this start? (shown by "ps auxwww | grep ^cms012")

I just ran the latter command myself and found almost all processes
looking like this:

    /usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl \
        -m lcgcondor -f /tmp/gram_0if4LS -c cancel

Were the ones you killed also of that form?

It suggests that there is a problem canceling jobs: did your Condor system
recently have a problem that might be related?
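
In case it helps to quantify this, here is a rough sketch for tallying the
job-manager script processes by the action passed after "-c". It assumes
each process appears on a single line of "ps auxwww" output and that the
action is the word following "-c", as in the command line above; the
function name count_jm_actions is made up for illustration.

```shell
# Tally globus-job-manager-script.pl processes by the action given after -c.
# Reads "ps auxwww"-style lines on stdin, one process per line (assumption).
# count_jm_actions is a hypothetical helper name, not part of Globus.
count_jm_actions() {
    awk '/globus-job-manager-script\.pl/ {
        action = "unknown"
        # The action is taken to be the word right after "-c".
        for (i = 1; i < NF; i++)
            if ($i == "-c") action = $(i + 1)
        count[action]++
    }
    END { for (a in count) print count[a], a }'
}

# Possible use on the gatekeeper host:
#   ps auxwww | grep ^cms012 | count_jm_actions
```

If nearly everything shows up under "cancel", that would be consistent with
cancel requests piling up rather than with ordinary submit or poll traffic.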