Hi Maarten,
Not all of them had "-c cancel". Many looked like
globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type
lcgcondor -rdn jobmanager-lcgcondor -machine-type unknown -publish-jobs
and half looked like
/usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl -m
lcgcondor -f /tmp/gram_mBjvlv -c poll
We didn't see a systematic Condor problem today that I know of.
The batch system seems healthy. We had successful submissions for
the LCG gateway, local users, and our OSG gateway.
-Ian
On Aug 2, 2005, at 5:42 PM, Maarten Litmaath, CERN wrote:
> On Wed, 3 Aug 2005 [log in to unmask] wrote:
>
>
>> On Tue, 2 Aug 2005, Ian Fisk wrote:
>>
>>
>>> [...]
>>> globus-jo 31358 cms012 3u IPv4 17739940 TCP
>>> cmslcgce.fnal.gov:globus-gatekeeper->egee-rb-03.cnaf.infn.it:21476
>>> (CLOSE_WAIT)
>>> [...]
>>> globus-jo 31359 cms012 3u IPv4 18401793 TCP
>>> cmslcgce.fnal.gov:globus-gatekeeper->egee-rb-03.cnaf.infn.it:21741
>>> (CLOSE_WAIT)
>>>
>>
>> Is it always egee-rb-03.cnaf.infn.it?
>>
>> When did all this start? (shown by "ps auxwww | grep ^cms012")
>>
>
> I just ran the latter command myself and found almost all processes
> looking like this:
>
> /usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl \
> -m lcgcondor -f /tmp/gram_0if4LS -c cancel
>
> Were the ones you killed also of that form?
>
> It suggests that there is a problem canceling jobs: did your Condor system
> recently have a problem that might be related?
>
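The breakdown of cms012 processes by action discussed in this thread could be tallied with a short script. What follows is only a sketch: the function name tally_job_manager_actions and the sample list are made up for illustration, and the command lines are the ones quoted above. It assumes the input is one command line per element, for example the command column taken from the ps output.

```python
from collections import Counter


def tally_job_manager_actions(command_lines):
    """Count job-manager command lines by the value that follows `-c`.

    Hypothetical helper: lines without a `-c` argument (such as the
    long-running globus-job-manager itself) are counted separately.
    """
    counts = Counter()
    for line in command_lines:
        args = line.split()
        if not any("globus-job-manager" in arg for arg in args):
            continue  # not a job-manager process, ignore it
        if "-c" in args and args.index("-c") + 1 < len(args):
            counts[args[args.index("-c") + 1]] += 1
        else:
            counts["(no -c action)"] += 1
    return counts


# Sample input copied from the command lines quoted in this thread.
sample = [
    "/usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl"
    " -m lcgcondor -f /tmp/gram_mBjvlv -c poll",
    "/usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl"
    " -m lcgcondor -f /tmp/gram_0if4LS -c cancel",
    "globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf"
    " -type lcgcondor -rdn jobmanager-lcgcondor -machine-type unknown"
    " -publish-jobs",
]

print(tally_job_manager_actions(sample))
```

A large "cancel" count relative to "poll" would point toward the cancel problem suggested above; a mix like the one Ian describes would not.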