Hi Maarten,
> At CERN in /opt/globus/lib/perl/Globus/GRAM/Helper.pm we changed:
>
> my $FINAL_DONE_LOAD_RANGE = 10;
> to:
> my $FINAL_DONE_LOAD_RANGE = 1000;
>
> Otherwise jobs will not be cleaned up when the load is high, which could
> lead to an upward spiral. For example, the RB/WMS or the user may start
> canceling jobs, which cannot be cleaned up immediately, and new jobs may
> be sent instead, which also have to wait... Meanwhile the list of jobs
> gets longer, so it takes more and more time for the jobmanager to loop
> through them.
>
> Maybe this happened for that user?
>
> We intend to put this change into the next release of that code. You may
> want to apply it already.
Thanks a lot. I also noticed that the lcgjm on cms001 holds more than 6.2k
globus-cache-export directories, which does not match the number of jobs
we found in the batch scheduler.
I have raised FINAL_DONE_LOAD_RANGE from the default to 1000 to help clean
up the job cache. Thanks for the trick: the load never dropped below 10,
which was the threshold the parameter applied before. Let's hope this
cleans up the job cache and reduces the load generated by the jobmanager.
Br,
J