I have a strange problem. The FNAL site is being thrashed by
hundreds of copies of /tmp/grid_manager_monitor_agent that run on the
gateway, spawned by the fork queue. Each instance takes 14 MB of
memory, and before long all the system memory is used. They are all
from the same user, who submitted a lot of jobs a few days ago, but
killed them with edg-job-cancel. What is particularly strange is
that I killed 700 of them this afternoon. After 6 hours there were
more than 200 running again.
At the moment I have to monitor the gateway manually. Any thoughts on
the cause or the solution would be appreciated.
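
As a stopgap for the manual monitoring, something like the following Python sketch could count the agents per user. It is only an illustration: it assumes a Linux-style `ps` that accepts `-eo pid,user,args`, and the function names `count_agents` and `snapshot` are made up for this example. It only reports, it does not kill anything or explain why the processes come back.

```python
import subprocess
from collections import Counter

# Process name taken from the report above; everything else is an assumption.
AGENT_NAME = "grid_manager_monitor_agent"


def count_agents(ps_output, name=AGENT_NAME):
    """Count processes whose command line contains `name`, grouped by user.

    `ps_output` is expected to be the text printed by `ps -eo pid,user,args`,
    i.e. one header line followed by "PID USER COMMAND..." rows.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # pid, user, full command line
        if len(parts) == 3 and name in parts[2]:
            counts[parts[1]] += 1
    return counts


def snapshot():
    """Run ps once and return the per-user agent counts."""
    result = subprocess.run(
        ["ps", "-eo", "pid,user,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_agents(result.stdout)
```

Calling `snapshot()` periodically (for example from cron) and logging the counts would show how fast the agents reappear; multiplying a count by the 14 MB per instance seen here gives a rough memory estimate.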
Thanks, Ian