What does strace -p claim they are doing?
Ian Fisk wrote:
> Hi Maarten,
>
> I left about 75 running. Memory usage of the system has
> dropped, as has the load. I don't think it's related to file system
> problems, which we believe we have solved with a network upgrade.
> The problem we saw before corrupted a lock file, and all of the lock
> files appear to be fine. Also, in the previous problem we would see the
> number of processes increase continuously. In this case there were quite
> suddenly 2300 processes and the number didn't increase. It just stayed
> static. So far the killed processes have stayed dead, which is also
> different from the previous problem.
>
> -Ian
>
>
> On Aug 2, 2005, at 1:17 PM, Maarten Litmaath wrote:
>
>> Ian Fisk wrote:
>>
>>
>>> We are observing a large number of processes on the FNAL CE.
>>>
>>
>> You sure they are not related to the file server problems you had
>> at the end of May?
>>
>>
>>> Currently there are 2300 belonging to one UID. They are roughly
>>> divided between
>>> globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf
>>> -type lcgcondor -rdn jobmanager-lcgcondor -machine-type unknown
>>> -publish-jobs
>>> and
>>> /usr/local/bin/perl /opt/globus/libexec/globus-job-manager-script.pl
>>> -m lcgcondor -f /tmp/gram_mBjvlv -c poll
>>> Roughly 1150 of each. I am not sufficiently familiar with what
>>> these two scripts are supposed to be doing. The number of
>>> processes does not appear to be growing (or shrinking). The UID
>>> in question does not currently have any active jobs in the batch
>>> system.
>>>
>>
>> I suggest you kill almost all of them, leaving a few for us to look at.
>> First kill 10 processes and check that the load does not suddenly
>> increase a lot, then kill 50, 100, ...
>>
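The staged kill suggested above (10 processes, then 50, then 100, with a load check in between, leaving a few alive for inspection) could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the number of processes held back, the 30-second pause and the load tolerance are all hypothetical, and the PID list would have to be collected separately (for example from ps output for the UID in question).

```python
import os
import signal
import time


def staged_batches(pids, keep=5, sizes=(10, 50, 100)):
    """Split pids into growing batches, holding back the first `keep`.

    Batch sizes follow `sizes`; once exhausted, the last size is reused.
    All defaults are illustrative assumptions, not values from the thread.
    """
    targets = list(pids)[keep:]
    batches, start, step = [], 0, 0
    while start < len(targets):
        size = sizes[min(step, len(sizes) - 1)]
        batches.append(targets[start:start + size])
        start += size
        step += 1
    return batches


def kill_staged(pids, keep=5, pause=30, tolerance=1.5, dry_run=True):
    """Send SIGTERM batch by batch; stop if the 1-minute load jumps."""
    baseline = os.getloadavg()[0]
    for batch in staged_batches(pids, keep=keep):
        for pid in batch:
            if dry_run:
                print("would kill", pid)
                continue
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # process already gone
        if dry_run:
            continue
        time.sleep(pause)  # give the load average time to react
        if os.getloadavg()[0] > baseline * tolerance + 1.0:
            print("load rose sharply, stopping")
            return False
    return True
```

With dry_run left at True nothing is killed; the function only prints what it would do, which seems the safer default on a production CE.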