On Tue, 25 Nov 2008, Dimitris Zilaskos wrote:
> Can someone comment on this:
>
> https://gus.fzk.de/ws/ticket_info.php?ticket=44034
>
> I have tried various recipes found in the archives and in google with
> not much success.

FYI, the problem appears to have been caused by setting the "max_queuable"
parameter for the biomed queue to a low, finite value.

The information system publishes MaxTotalJobs (== max_queued + max_running)
for each queue, but the RB and WMS do not take that attribute into account,
so they may keep sending jobs to a queue that is already full, where the
jobs fail immediately.
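
A minimal Python sketch of the kind of check the RB/WMS do not perform:
compare a queue's current job count against its published MaxTotalJobs
before matching. The QueueInfo class, its field names and the has_room/match
helpers are hypothetical illustrations, not the RB/WMS code or the
information-system schema; MaxTotalJobs is computed with the formula quoted
above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueInfo:
    """Hypothetical per-queue snapshot; field names are illustrative only."""
    name: str
    max_queued: int   # limit on waiting jobs, as named in the formula above
    max_running: int  # limit on running jobs
    total_jobs: int   # jobs currently waiting + running in the queue

    @property
    def max_total_jobs(self) -> int:
        # MaxTotalJobs == max_queued + max_running
        return self.max_queued + self.max_running


def has_room(q: QueueInfo) -> bool:
    """True if one more job would not exceed the published MaxTotalJobs."""
    return q.total_jobs < q.max_total_jobs


def match(queues: List[QueueInfo]) -> List[str]:
    """Keep only the queues that can still accept a job."""
    return [q.name for q in queues if has_room(q)]


queues = [
    QueueInfo("biomed", max_queued=10, max_running=50, total_jobs=60),  # full
    QueueInfo("other", max_queued=1000, max_running=100, total_jobs=20),
]
print(match(queues))  # ['other']
```

A broker filtering this way would skip the full queue instead of sending
jobs that are rejected on arrival.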

By default the WMS will try up to 10 shallow resubmissions, to different
CEs if the JDL allows it. An RB will try up to 3 deep resubmissions by
default, again if the JDL allows it.
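
To see why a queue that rejects jobs immediately attracts repeated
submissions, here is a rough Python sketch counting attempts per job under
the default limits quoted above. It assumes the worst case (every attempt is
rejected) and does not model the difference between shallow and deep
resubmission or which CE each retry goes to; the function names and the
1000-job figure are hypothetical.

```python
from typing import Callable

# Default limits quoted in the post
WMS_SHALLOW_RESUBMISSIONS = 10
RB_DEEP_RESUBMISSIONS = 3


def submission_attempts(max_resubmissions: int,
                        accepts: Callable[[int], bool]) -> int:
    """Count how many times one job is submitted.

    accepts(attempt) is a hypothetical callback saying whether the CE accepts
    the job on that attempt (0 = initial submission). Each rejection triggers
    one resubmission until max_resubmissions is exhausted.
    """
    attempts = 0
    for attempt in range(max_resubmissions + 1):
        attempts += 1
        if accepts(attempt):
            break
    return attempts


def worst_case_submissions(n_jobs: int, max_resubmissions: int) -> int:
    """Upper bound on submissions if every attempt for every job is rejected."""
    return n_jobs * (max_resubmissions + 1)


def always_rejected(attempt: int) -> bool:
    # Models a queue that is already at MaxTotalJobs on every attempt
    return False


print(submission_attempts(WMS_SHALLOW_RESUBMISSIONS, always_rejected))  # 11
print(submission_attempts(RB_DEEP_RESUBMISSIONS, always_rejected))      # 4
print(worst_case_submissions(1000, WMS_SHALLOW_RESUBMISSIONS))          # 11000
```

So with these defaults a job that keeps landing on a full queue can cause up
to 11 submissions via a WMS, or 4 via an RB, instead of one.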

Since the user was submitting large numbers of jobs through at least 14
RBs/WMSes in parallel (normally fine), and each immediate failure was
resubmitted, the CE got hammered pretty badly.

A WMS RFE has been opened about this issue:
https://savannah.cern.ch/bugs/index.php?44599