If it's a site-wide outage, I only start draining the site once the
downtime has started. I can't do it before, as users (and the
dashboard: I'd get a ticket!) would rightly complain about not being
able to submit to a site that is allegedly up and running.
I find the ATLAS proposal hilarious, but each to their own. If you do
stop submissions pre-emptively, surely the parameter to use should be
the maximum queue length.
Given that we have 48-hour queues, our scheduled downtimes for an
invasive procedure are at least 2 days long by construction.
Cheers,
Daniela
On 21 November 2013 14:56, Alastair Dewhurst <[log in to unmask]> wrote:
> Hi
>
> You may be aware that ATLAS have a system known as the switcher, which automatically drains ATLAS jobs from sites before they go into a scheduled downtime. This has worked fairly well, although in a few cases of long site downtimes users were caught out and "important" physics work was delayed.
>
> Work has been in progress to improve the switcher, and one additional feature is to treat long downtimes (> 24 hours) differently. At the ATLAS meeting on Tuesday, the following presentation was given:
> https://indico.cern.ch/getFile.py/access?contribId=10&resId=1&materialId=slides&confId=283853
>
> In summary, for a scheduled downtime (on the SE) lasting more than 24 hours, the site would stop receiving analysis jobs 5 days beforehand and would not receive any new ATLAS jobs 24 hours beforehand. I feel this is excessive and could cause problems for users who would like to use their local site but are blocked because it is going into downtime the following week.
>
> I would be interested in hearing feedback from sites. In particular, I would like to know:
> - Whether sites pay any attention to how ATLAS drains their work beforehand.
> - How often you schedule a downtime longer than 24 hours, and how long these actually end up lasting.
> - How you plan your downtimes. Do you factor in a drain yourself? Are you cautious when declaring downtimes, knowing that it is easier to end early than to extend into an unscheduled downtime?
>
> The aim should be that the ATLAS system works with the way the majority of sites work, rather than sites having to work around the ATLAS system.
>
> Thank you.
>
> Alastair
--
Sent from the pit of despair
-----------------------------------------------------------
[log in to unmask]
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/