Hello

A while ago ATLAS introduced automatic functional tests for both their production and analysis queues.  If a site started failing these tests, its queues would be set offline and an email sent to the ATLAS cloud support mail list.  Obviously not everyone wants to be on this ATLAS mail list, so I requested that emails could be sent out for specific sites.  This is now possible, so if you would like to be emailed when your site has been put offline, please send me the email address and the sites you want to be contacted about, e.g. [log in to unmask] for any RAL-LCG2 queue.  At the bottom of this message I have attached an example of the mail you will receive.  Just as a warning, this feature has only just been added, so there may be some teething problems.

I also remember that some site admins wondered if there was a scriptable way of checking whether their site was online.  This is also possible, but it is not (well) documented at the moment.  However, if you would like something along these lines and can give me some example use cases, I can write a script that does what you need.

A very basic example:
source /afs/cern.ch/atlas/offline/external/GRID/DA/panda-client/latest/etc/panda/panda_setup.sh
python
> from pandatools import Client
> status = Client.PandaSites['ANALY_RAL']['status']
> print status
Any information you can view in the panda monitor can also be retrieved this way.
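
As a rough sketch of how the lookup above could be turned into a check over several queues: this assumes that Client.PandaSites behaves like a dictionary keyed by queue name, with a 'status' entry per queue, as the example suggests.  The helper names (queue_status, not_online) and the queues ANALY_EXAMPLE and ANALY_MISSING are made up for illustration, and the stand-in dictionary replaces the real panda client so that the snippet runs anywhere.

```python
def queue_status(sites, queue):
    """Return the status string recorded for `queue`, or None if it is not listed."""
    entry = sites.get(queue)
    return entry.get("status") if entry else None


def not_online(sites, queues):
    """Return {queue: status} for every queue in `queues` whose status is not 'online'."""
    problems = {}
    for queue in queues:
        status = queue_status(sites, queue)
        if status != "online":
            problems[queue] = status
    return problems


# Stand-in data with the same shape as Client.PandaSites (made-up values).
# On a machine with the panda client set up, this could be replaced with:
#     from pandatools import Client
#     sites = Client.PandaSites
sites = {
    "ANALY_RAL": {"status": "online"},
    "ANALY_EXAMPLE": {"status": "test"},
}

# Report every queue of interest that is not online (unknown queues show None).
problems = not_online(sites, ["ANALY_RAL", "ANALY_EXAMPLE", "ANALY_MISSING"])
for queue in sorted(problems):
    print("%s: %s" % (queue, problems[queue]))
```

A check like this could be run periodically, with an alert raised whenever the returned dictionary is not empty.  Treating an unlisted queue as a problem is a design choice of this sketch, not something the panda client enforces.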

Of course, you can get most of the information you might need from checking the panda monitor:
http://panda.cern.ch/server/pandamon/query?dash=prod
http://panda.cern.ch/server/pandamon/query?dash=analysis
or the site status board:
http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview#currentView=Shifter+view&find[pSC][0][sS]=&find[pSC][0][bR]=false&find[pSC][1][sS]=&find[pSC][1][bR]=false&find[pSC][2][sS]=|cnUK&find[pSC][2][bR]=false&find[pSC][3][sS]=&find[pSC][3][bR]=false&find[pSC][4][sS]=&find[pSC][4][bR]=false&find[pSC][5][sS]=&find[pSC][5][bR]=false&find[pSC][6][sS]=&find[pSC][6][bR]=false&find[pSC][7][sS]=&find[pSC][7][bR]=false&find[pSC][8][sS]=&find[pSC][8][bR]=false&find[pSC][9][sS]=&find[pSC][9][bR]=false&find[pSC][10][sS]=&find[pSC][10][bR]=false&find[pSC][11][sS]=&find[pSC][11][bR]=false&find[pSC][12][sS]=&find[pSC][12][bR]=false&find[pSC][13][sS]=&find[pSC][13][bR]=false&find[pSC][14][sS]=&find[pSC][14][bR]=false&find[pSC][15][sS]=&find[pSC][15][bR]=false&find[sS]=and&find[bR]=true&highlight=false

Thanks

Alastair



Example email:
Dear [log in to unmask],

UKI-SCOTGRID-DURHAM has been automatically excluded from PanDA distributed production because it has failed the recent HC test jobs. You can see the exclusion policy at [1].

EXCLUSION REASON:
   BlackListing policy Last-Two-Plus-One True. See jobs [u'1464566650', u'1464561491', u'1464567438']

All recent test jobs can be viewed here:

http://panda.cern.ch/server/pandamon/query?job=*&site=UKI-SCOTGRID-DURHAM&type=ptest&hours=4&processingType=gangarobot-pft

The queue status of UKI-SCOTGRID-DURHAM is currently test.

Please coordinate the necessary fixes. HC will reset the queue online when the jobs are succeeding.
If you wish to disable this auto-exclusion service from changing UKI-SCOTGRID-DURHAM, then please follow the instructions at [1].

Cheers, ATLAS Distributed Analysis Operations

[1] https://twiki.cern.ch/twiki/bin/view/IT/HammerCloud#APPENDIX_2_ATLAS_Automatic_Site


Report generated on voatlas49 by /data/hc/apps/atlas/python/scripts/server/production_blacklist.py