Hi Jason,
> This is another possibility; I haven't yet had time to profile the script.
> I will check this later. I thought the problem could be related to the many
> pending jobs submitted from cms001 on the same CE to the backend batch system:
> all the job manager plugins keep querying the job status, resulting
> in a severe load on the CE? [...]
At CERN, in /opt/globus/lib/perl/Globus/GRAM/Helper.pm, we changed:
-----------------------------------------------------------------
my $FINAL_DONE_LOAD_RANGE = 10;
-----------------------------------------------------------------
to:
-----------------------------------------------------------------
my $FINAL_DONE_LOAD_RANGE = 1000;
-----------------------------------------------------------------
Otherwise jobs will not be cleaned up when the load is high,
which could lead to an upward spiral. For example, the RB/WMS
or the user may start canceling jobs, which cannot be cleaned up
immediately, and new jobs may be sent instead, which also have to
wait... Meanwhile the list of jobs gets longer, so it takes more
and more time for the jobmanager to loop through them.
Maybe this happened for that user?
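To make the feedback loop concrete, here is a toy simulation in Perl. It is NOT the Helper.pm code: it assumes, hypothetically, that the constant acts as a load ceiling above which cleanup of finished jobs is deferred, and that the load grows with the number of jobs the jobmanager has to loop over. The sub name `simulate` and its arguments are made up for illustration.

```perl
use strict;
use warnings;

# Toy model (hypothetical, not the Helper.pm logic).
# Each polling cycle new finished jobs arrive; they are only cleaned up
# when the (assumed) load is at or below the threshold.
sub simulate {
    my (%args) = @_;
    my $threshold     = $args{threshold};      # stands in for $FINAL_DONE_LOAD_RANGE
    my $new_per_cycle = $args{new_per_cycle};  # finished jobs added per cycle
    my $cycles        = $args{cycles};
    my $base_load     = $args{base_load};      # load from everything else on the CE

    my $pending = 0;    # finished jobs still waiting to be cleaned up
    for my $cycle (1 .. $cycles) {
        $pending += $new_per_cycle;
        # Assumption: the longer the job list, the more work per loop, the higher the load.
        my $load = $base_load + $pending / 10;
        # Cleanup only happens when the load is under the ceiling.
        $pending = 0 if $load <= $threshold;
    }
    return $pending;
}

for my $threshold (10, 1000) {
    my $left = simulate(threshold => $threshold, new_per_cycle => 5,
                        cycles => 20, base_load => 15);
    print "threshold $threshold: $left jobs left uncleaned\n";
}
```

With the low ceiling and an already busy CE, nothing is ever cleaned up and the list only grows (which in turn keeps the load up); with the raised ceiling the list is emptied every cycle.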
We intend to put this change into the next release of that code.
You may want to apply it already.