Hi Maarten,
I left about 75 running. Memory usage on the system has dropped, as
has the load. I don't think this is related to the file system
problems, which we believe we solved with a network upgrade.
The earlier problem corrupted a lock file, and the lock files all
appear to be fine now. Also, in the earlier problem we would see the
number of processes increase continuously. In this case there were
quite suddenly 2300 processes and the number did not increase; it
just stayed static. So far the killed processes have stayed dead,
which is also different from the earlier problem.
-Ian
On Aug 2, 2005, at 1:17 PM, Maarten Litmaath wrote:
> Ian Fisk wrote:
>
>
>> We are observing a large number of processes on the FNAL CE.
>>
>
> You sure they are not related to the file server problems you had
> at the end of May?
>
>
>> Currently there are 2300 belonging to one UID. They are roughly
>> divided between
>>   globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf
>>     -type lcgcondor -rdn jobmanager-lcgcondor -machine-type unknown
>>     -publish-jobs
>> and
>>   /usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl
>>     -m lcgcondor -f /tmp/gram_mBjvlv -c poll
>> Roughly 1150 of each. I am not sufficiently familiar with what
>> these two scripts are supposed to be doing. The number of
>> processes does not appear to be growing (or shrinking). The
>> UID in question does not currently have any active jobs in the
>> batch system.
>>
>
> I suggest you kill almost all of them, leaving a few for us to look
> at.
> First kill 10 processes and check that the load does not suddenly
> increase a lot, then kill 50, 100, ...
>
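For reference, the staged kill suggested above could be sketched
roughly as follows in Python. This is only an illustration under
assumptions: the helper names (plan_batches, kill_in_stages), the
number of processes kept for inspection, the load threshold and the
wait time are all hypothetical and are not taken from this thread or
from any Globus tool. The list of PIDs would have to be collected
separately, for example from ps output for the UID in question.

```python
import os
import signal
import time


def plan_batches(pids, keep=5, sizes=(10, 50, 100)):
    """Split pids into growing batches, leaving the first `keep` untouched.

    Batch sizes follow `sizes`; once the sequence is exhausted, the last
    size is reused until all remaining pids are covered.
    """
    targets = list(pids)[keep:]
    batches = []
    start = 0
    step = 0
    while start < len(targets):
        size = sizes[min(step, len(sizes) - 1)]
        batches.append(targets[start:start + size])
        start += size
        step += 1
    return batches


def kill_in_stages(pids, keep=5, max_load_jump=5.0, settle_seconds=30):
    """Send SIGTERM batch by batch, stopping if the load jumps.

    max_load_jump and settle_seconds are assumed values, not
    recommendations. Returns True if every batch was processed.
    """
    for batch in plan_batches(pids, keep):
        load_before = os.getloadavg()[0]  # 1-minute load average
        for pid in batch:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # process already exited
        time.sleep(settle_seconds)
        if os.getloadavg()[0] - load_before > max_load_jump:
            print("Load increased sharply; stopping before the next batch.")
            return False
    return True
```

Calling plan_batches first, without killing anything, shows which
processes would go in each round and which would be left for
inspection.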