Hi
You may be aware that ATLAS have a system known as the switcher, which automatically drains ATLAS jobs from a site before it goes into a scheduled downtime. This has worked fairly well, although in a few cases of long site downtimes users were caught out and "important" physics work was delayed.
Work has been in progress to improve the switcher, and one additional feature is to treat long downtimes (> 24 hours) differently. At the ATLAS meeting on Tuesday the following presentation was given:
https://indico.cern.ch/getFile.py/access?contribId=10&resId=1&materialId=slides&confId=283853
A summary of the talk is that for a scheduled downtime (on the SE) lasting more than 24 hours, the site would stop receiving analysis jobs 5 days beforehand and would stop receiving any new ATLAS jobs 24 hours beforehand. I feel this is excessive and could cause problems for users who would like to use their local site but are blocked because their site is going into downtime next week.
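To make the proposed rule concrete, here is a minimal sketch of how I read it. This is not the switcher's actual code: the function name, the job type labels and the example dates are all made up for illustration, and the sketch simply applies no early drain to downtimes of 24 hours or less, since the talk summary above only covers the long case.

```python
from datetime import datetime, timedelta

# Thresholds taken from the summary of the talk.
LONG_DOWNTIME = timedelta(hours=24)    # downtimes longer than this get the new treatment
ANALYSIS_CUTOFF = timedelta(days=5)    # analysis jobs stop this long before the downtime
ALL_JOBS_CUTOFF = timedelta(hours=24)  # all new ATLAS jobs stop this long before

# Hypothetical labels; "production" stands for any non-analysis ATLAS job.
ALL_JOB_TYPES = {"analysis", "production"}


def accepted_job_types(now, downtime_start, downtime_end):
    """Return the job types a site would still receive at time `now`."""
    if downtime_end - downtime_start <= LONG_DOWNTIME:
        # Assumption: short downtimes are not modelled in this sketch.
        return set(ALL_JOB_TYPES)
    time_left = downtime_start - now
    if time_left <= ALL_JOBS_CUTOFF:
        return set()
    if time_left <= ANALYSIS_CUTOFF:
        return ALL_JOB_TYPES - {"analysis"}
    return set(ALL_JOB_TYPES)


# Made-up example: a 48 hour downtime on the SE.
start = datetime(2000, 1, 10, 9, 0)
end = start + timedelta(hours=48)
for lead in (timedelta(days=6), timedelta(days=3), timedelta(hours=12)):
    print(lead, sorted(accepted_job_types(start - lead, start, end)))
```

The middle case is the one that concerns me: three days before the downtime the site is still fully working, yet a local user can no longer send analysis jobs to it.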
I would be interested in hearing feedback from sites. In particular, I would like to know:
- Do sites pay any attention to how ATLAS drain their work beforehand?
- How often do you schedule a downtime longer than 24 hours, and how long do these actually end up lasting?
- How do you plan your downtimes? Do you factor in a drain yourself? Are you cautious when declaring downtimes, knowing that it is easier to end early than to extend into an unscheduled downtime?
The aim should be that the ATLAS system works with the way that (the majority of) sites work, rather than sites having to work around the ATLAS system.
Thank you.
Alastair