Ian Fisk wrote:
> We are observing a large number of processes on the FNAL CE.
Are you sure they are not related to the file server problems you had
at the end of May?
> Currently there are 2300 belonging to one UID. They are roughly
> divided between
>
> globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type
> lcgcondor -rdn jobmanager-lcgcondor -machine-type unknown -publish-jobs
>
> and
>
> /usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl -m
> lcgcondor -f /tmp/gram_mBjvlv -c poll
>
> Roughly 1150 of each. I am not sufficiently familiar with what these
> two scripts are supposed to be doing. The number of processes does
> not appear to be growing (or shrinking). The UID in question does
> not currently have any active jobs in the batch system.
I suggest you kill almost all of them, leaving a few for us to look at.
First kill 10 processes and check that the load does not suddenly increase
a lot, then kill 50, 100, ...
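
The staged approach could be scripted roughly as below. This is only a sketch under assumptions: the list of PIDs is gathered separately (for example with something like `pgrep -u <uid> -f globus-job-manager`), the names `plan_batches` and `kill_in_stages` are made up for illustration, and the values for `keep`, `max_rise` and `pause` are guesses to be tuned on the CE. It has to run as root or as the UID that owns the processes.

```python
import os
import signal
import time


def plan_batches(pids, keep=5, sizes=(10, 50, 100)):
    """Split pids into growing batches, leaving the last `keep` untouched.

    Once the sizes are exhausted, the final size is reused.
    """
    pids = list(pids)
    targets = pids[:-keep] if keep else pids
    batches, start, step = [], 0, 0
    while start < len(targets):
        size = sizes[min(step, len(sizes) - 1)]
        batches.append(targets[start:start + size])
        start += size
        step += 1
    return batches


def kill_in_stages(pids, max_rise=5.0, pause=30, **plan_args):
    """Send SIGTERM batch by batch; stop if the 1-minute load jumps."""
    baseline = os.getloadavg()[0]
    for batch in plan_batches(pids, **plan_args):
        for pid in batch:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # process already exited
        time.sleep(pause)  # give the load average time to react
        if os.getloadavg()[0] > baseline + max_rise:
            print("load rose by more than", max_rise, "- stopping")
            return False
    return True
```

With about 2300 PIDs and the defaults above, this would kill 10, then 50, then batches of 100, and leave the last 5 processes in the list alive for inspection.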