Hi Chris,

You can check:
http://panda.cern.ch/server/pandamon/query?dash=prod
http://panda.cern.ch/server/pandamon/query?dash=analysis

Click on the UK cloud.

The procedure for exclusion will be in place from tomorrow.

Cloud support will follow up on any problems.

Cheers
Elena
On 29 Feb 2012, at 14:23, Chris Brew wrote:

> Hi Alastair,
> 
> Is there an easy place I can see whether my site is on or offline for production and/or analysis?
> 
> Is there any way I could query it programmatically, e.g. with Nagios?
> 
> Thanks,
> Chris. 
> 
> On 28 Feb 2012, at 12:53, "Alastair Dewhurst" <[log in to unmask]> wrote:
> 
>> Hi
>> 
>> ATLAS are introducing an automatic test for their production queues.
>> 
>> There are 5 test jobs currently running:
>> - One Geant4 simulation job run under each of three different ATLAS software release versions (three jobs in total)
>> - One Reconstruction job
>> - One Event Generation job
>> 
>> Every 30 minutes 4 booleans are calculated:
>> P1: Last three jobs from any single test have failed
>> P2: Last two jobs from any single test and the last job from another test have failed
>> P3: Last jobs from three separate tests have all failed
>> P4: Last two jobs from all tests have succeeded
>> 
>> If (P1 || P2 || P3) site will be blacklisted
>> If (!P1 && !P2 && !P3 && P4) site will be unblacklisted
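
The rules above can be sketched in Python roughly as follows. This is only an illustration of the logic as described in this mail, not the actual ATLAS/HammerCloud code. The test names, the data layout (a list of booleans per test, oldest first, True meaning success) and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def last_failed(history: List[bool], n: int) -> bool:
    """True if the test has at least n results and the last n all failed."""
    return len(history) >= n and not any(history[-n:])


def last_succeeded(history: List[bool], n: int) -> bool:
    """True if the test has at least n results and the last n all succeeded."""
    return len(history) >= n and all(history[-n:])


def decide(results: Dict[str, List[bool]]) -> Optional[str]:
    """Return 'blacklist', 'unblacklist' or None (no change).

    `results` maps a (hypothetical) test name to its job outcomes,
    oldest first, with True meaning the job succeeded.
    """
    tests = list(results)
    # P1: last three jobs from any single test have failed
    p1 = any(last_failed(results[t], 3) for t in tests)
    # P2: last two jobs from one test and the last job from another test failed
    p2 = any(
        last_failed(results[a], 2) and last_failed(results[b], 1)
        for a in tests
        for b in tests
        if a != b
    )
    # P3: last job from (at least) three separate tests has failed
    p3 = sum(last_failed(results[t], 1) for t in tests) >= 3
    # P4: last two jobs from all tests have succeeded
    p4 = bool(tests) and all(last_succeeded(results[t], 2) for t in tests)

    if p1 or p2 or p3:
        return "blacklist"
    if p4:  # P1, P2 and P3 are already known to be false here
        return "unblacklist"
    return None
```

Note that a single recent failure is enough to stop a site being unblacklisted (P4 is false) without being enough to blacklist it, so in that case the state is left unchanged.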
>> 
>> Any blacklisted site will be set to 'test' mode which means normal jobs will not be submitted but test jobs will continue.
>> 
>> These test jobs have been submitted to sites since the end of last year.  The blacklisting has not been switched on yet but will be very shortly (the French cloud was switched on today).  To view your site's jobs:
>> http://panda.cern.ch/server/pandamon/query?job=*&type=&days=1&jobsetID=any&jobStatus=&site=&cplot=yes&plot=yes&processingType=gangarobot-pft&cplot=yes&cloud=UK
>> Then click on your site.  Or you can just modify the URL; Sheffield, for example, is:
>> http://panda.cern.ch/server/pandamon/query?job=*&type=&days=1&jobsetID=any&jobStatus=&site=&cplot=yes&plot=yes&processingType=gangarobot-pft&cplot=yes&cloud=UK&computingSite=UKI-NORTHGRID-SHEF-HEP
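
If you want to generate these URLs for several sites (for example from a script or a Nagios check), a small helper along these lines might do. It is a sketch that simply rebuilds the query string shown above; the function name and its arguments are made up for the example, and it drops the duplicated cplot=yes parameter on the assumption that the repeat is not needed.

```python
from typing import Optional
from urllib.parse import urlencode

PANDA_QUERY_URL = "http://panda.cern.ch/server/pandamon/query"


def pft_jobs_url(cloud: str, computing_site: Optional[str] = None, days: int = 1) -> str:
    """Build the panda monitor URL listing gangarobot-pft test jobs.

    Hypothetical helper: the parameters are copied from the URLs quoted
    in this mail, nothing more is known about the monitor's interface.
    """
    params = [
        ("job", "*"),
        ("type", ""),
        ("days", str(days)),
        ("jobsetID", "any"),
        ("jobStatus", ""),
        ("site", ""),
        ("cplot", "yes"),
        ("plot", "yes"),
        ("processingType", "gangarobot-pft"),
        ("cloud", cloud),
    ]
    if computing_site:
        params.append(("computingSite", computing_site))
    # safe="*" keeps the wildcard in job=* unescaped, as in the original URL
    return PANDA_QUERY_URL + "?" + urlencode(params, safe="*")
```

Calling pft_jobs_url("UK") gives the cloud-wide view, and pft_jobs_url("UK", "UKI-NORTHGRID-SHEF-HEP") gives the Sheffield view.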
>> 
>> If you are experienced with using the panda monitor, the job types you are looking for are: processingType=gangarobot-pft
>> 
>> To see if your site would be blacklisted you can check:
>> http://hammercloud.cern.ch/hc/app/atlas/robot/incidents/?site=UKI-NORTHGRID-SHEF-HEP&severity=&q=&hours=
>> 
>> The official place where ATLAS record queue changes is still here:
>> http://panda.cern.ch/server/pandamon/query?mode=site&site=UKI-NORTHGRID-SHEF-HEP
>> 
>> 
>> 
>> In addition to this, ATLAS are also developing a method to automatically blacklist a site when it declares a (scheduled) downtime in the GOCDB.  Currently, if you declare a downtime of your SE, ATLAS will automatically blacklist your space tokens, preventing transfers from there while you are down.  In development (and being tested by RAL) is a procedure that will also blacklist your site if you declare an outage on the CE.
>> 
>> Currently, if an outage is declared we have to rely on shifters to blacklist and then test and unblacklist sites.  What will happen is that, when you declare a downtime, the production queues for the site will be set offline (for ATLAS) 12 hours before it starts.  The ANALY queues will also be set offline 6 hours before.  This actually works quite well, as normally shorter-running analysis jobs will fill the site's farm and waste as little CPU as possible before a downtime.  When we tried this at RAL there were about 230 jobs still in the farm when our downtime started, and these were killed.  This was less than 10% of the ATLAS jobs running 12 hours before.
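
To work out when your queues would go offline for a given downtime, the timing above amounts to something like the following. This is a sketch only: the 12 and 6 hour offsets are taken from this mail, while the function and key names are invented for the example.

```python
from datetime import datetime, timedelta
from typing import Dict

# Lead times as described in the mail; they may change once the procedure is final.
PRODUCTION_LEAD = timedelta(hours=12)
ANALYSIS_LEAD = timedelta(hours=6)


def offline_times(downtime_start: datetime) -> Dict[str, datetime]:
    """Return when each queue type would be set offline before a downtime."""
    return {
        "production": downtime_start - PRODUCTION_LEAD,
        "analysis": downtime_start - ANALYSIS_LEAD,
    }
```

For a downtime starting at 18:00, production queues would go offline at 06:00 and the ANALY queues at 12:00 on the same day.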
>> 
>> Once the downtime is over, the site will be set to test and the automatic test jobs will set the site back online when you are passing jobs.  (This is already the case for analysis queues at sites).  Currently in discussion is a proposal to avoid this automatic procedure for downtimes that are under a certain length and any site feedback on what should happen would be welcome.
>> 
>> 
>> Hope people find this useful.
>> 
>> Alastair

__________________________________________________
Dr Elena Korolkova
Email: [log in to unmask]
Tel.:  +44 (0)114 2223553
Fax:   +44 (0)114 2223555
Department of Physics and Astronomy
University of Sheffield
Sheffield, S3 7RH, United Kingdom