Hi Stephen
On 15 Jul 2014, at 16:16, Stephen Burke <[log in to unmask]> wrote:
> Stephen Jones [mailto:[log in to unmask]] said:
>> Minutes added to: https://indico.cern.ch/event/330837
>
> " Jeremy: Why not flush the BDII cache.
> Kashif: Manual restart does that."
>
> I'm not sure when or by whom the cache would be flushed ... the cache time is configurable, but the whole point of having a longer time was because people used to complain about things vanishing if there was a short downtime.
The question I asked was whether it was possible to somehow flush the cache in circumstances like this, where stale information was causing infrastructure errors (and with them extra work). I remember why the caching was introduced. It sounds like the Nagios tests could be improved to work around the issue.
> Cached objects are marked by having a status of "Unknown" so you can tell if you want to. Anyway it seems to me that tests should have some kind of timeout, the message broker could be unresponsive even if not officially down.
Sure.
>
> "Stephen Burke suggests adding a requirement like TotalJobs < MaxTotalJobs."
>
> No, I said that there *is* such a requirement - it's in the WMS itself so not easily optional, for better or worse (but I don't remember anyone complaining about it).
I’m not sure that was clear from the email, which said "Actually someone did do something several years ago, namely adding a requirement like TotalJobs < MaxTotalJobs. However it looks like Liverpool hasn't set a limit, it publishes the 999999999 default.”
So something needs to be set at the site AND on the WMS.
If there is no straightforward fix, then the user behind the large job submissions will likely get banned more widely.
Jeremy