Hi Alastair,

Yep, the test works from Glasgow's worker nodes. I'd also agree with Graeme's comments; we should be switched from FZK to RAL for our backup.

Cheers
Mike

On 17 June 2010 11:49, Alastair Dewhurst <[log in to unmask]> wrote:
> Hi All
>
> After a discussion in today's Thursday phone meeting we have decided the
> following:
>
> 1) If you have been passing the SAM tests and are happy with your current
> setup then no changes will be made to affect your site.
> 2) If you have been failing (getting a warning) on the SAM tests I will
> switch you over to having the RAL Tier 1 as your primary backup.
> 3) If you would prefer to have RAL as your primary backup, which will allow
> things to be more easily monitored from the Tier 1, then I will switch you
> over too.
>
> I would appreciate it if all sites, even those that don't want anything
> changed, ran the test, as it proves that direct access works (in case of
> emergency).
>
> Site       : Test (Who ran it)          : SAM  : Preference
> RAL PP     : Passed (Alastair Dewhurst) : ok   : Use Tier 1
> Liverpool  : Passed (Stephen Jones)     : ok   : unknown
> QMUL       : Passed (Chris Walker)      : ok   : unknown
> Cambridge  : Passed (Santanu Das)       : warn : Will be changed
> Sheffield  : Passed (Elena Korolkova)   : ok   : unknown
> RHUL       : Passed (Simon George)      : ok   : unknown
> UCL        : Passed (Ben Waugh)         : ok   : unknown
> Manchester : Not run                    : ok   : Stay the same
> Lancaster  : Not run                    : ok   : Stay the same
> Oxford     : Not run                    : warn : Will be changed
> Birmingham : Not run                    : warn : Will be changed
> Glasgow    : Not run                    : ok   : unknown (still uses FZK,
>              which Graeme Stewart said should be changed)
>
> I am still trying to sort out some new monitoring for the Tier 1 and I will
> send out a confirmation before submitting any request to change
> Tiersofatlas. If anyone has any additional suggestions regarding monitoring
> and chasing up failures, they are very welcome.
> As was said at the meeting, this is a setup that seems to work very well
> most of the time; it is really a question of how best to chase up the few
> problems when they occur without creating lots of work for ourselves.
>
> Thanks
>
> Alastair
>
>
> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>
>> This works for UCL (both HEP and Legion clusters).
>>
>> Cheers,
>> Ben
>>
>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>>
>>> Hi Santanu
>>>
>>> Thank you for spotting that; it should indeed be a capital F. I thought
>>> I had copied and pasted the commands directly, but maybe my mail client
>>> decided to do some formatting. That should fix most of the problems, as
>>> the Frontier server/squid should be accessible to all.
>>>
>>> If we were to make this change, it would not make RAL a single point of
>>> failure. In order for there to be a failure, both your own squid and RAL
>>> would have to fail. If RAL fails, your own squid should be set up to
>>> access PIC. The current situation means that if both your own squid and
>>> your backup squid fail, things will break. (If both RAL and PIC are down
>>> then you will also fail under both systems, but multiple Tier 1 failures
>>> should hopefully be rare!)
>>>
>>> Alastair
>>>
>>> So the new instructions are:
>>> Log into a WN
>>> > wget http://frontier.cern.ch/dist/fnget.py
>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>> > python fnget.py
>>> --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>> --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>> This should produce a big list of table names and not a Python error!
>>>
>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>>
>>>> Hi Alastair and all,
>>>>
>>>> I think there is a typo in the URL: it should be
>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier", *not*
>>>> "frontier" with a small f. Now it works with or without an http_proxy
>>>> setting.
>>>>
>>>> [root@farm002 tmp]# unset http_proxy
>>>> [root@farm002 tmp]# python fnget.py
>>>> --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>> --sql="SELECT count(*) FROM ALL_TABLES"
>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>> Decode results: True
>>>> Refresh cache: False
>>>>
>>>> Frontier Request:
>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>>
>>>> Query started: 06/16/10 11:44:15 BST
>>>> Query ended:   06/16/10 11:44:16 BST
>>>> Query time: 1.34605288506 [seconds]
>>>>
>>>> Query result:
>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>> <frontier version="3.22" xmlversion="1.0">
>>>>  <transaction payloads="1">
>>>>   <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>>    <data>eJxjY2Bg4HD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>>    <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>>   </payload>
>>>>  </transaction>
>>>> </frontier>
>>>>
>>>> Fields:
>>>>      COUNT(*)  NUMBER
>>>> Records:
>>>>      8833
>>>>
>>>> Cheers,
>>>> Santanu
>>>>
>>>>
>>>>> Hi
>>>>>
>>>>> The current state of the ATLAS Frontier service is not ideal. The SAM
>>>>> tests:
>>>>>
>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>>
>>>>> show several production sites getting a warning. This warning is
>>>>> normally caused by the backup squid not being configured correctly.
>>>>>
>>>>> To remind people: WNs should connect to the local squid (normally at
>>>>> the site), which connects to the Frontier server at RAL.
>>>>> If the local squid is down, the WN will try to connect to a backup
>>>>> squid, which is meant to be at a nearby site and which will then try to
>>>>> connect to the Frontier server. There is a similar backup process
>>>>> should the Frontier server at RAL fail: all the squids will then try to
>>>>> connect to the Frontier server at PIC.
>>>>>
>>>>> To ease this problem it has been suggested that the default backup for
>>>>> Tier 2 sites be the squid at RAL (the Tier 1, not the Tier 2!). The
>>>>> squid at the Tier 1 is part of the same installation as the Frontier
>>>>> server, so if the Frontier service goes down, so will the backup squid.
>>>>> This does reduce the resilience of the setup slightly, but I think it
>>>>> is worth it, given that it should make things significantly simpler to
>>>>> maintain. It also means I will have to get the SAM test modified
>>>>> slightly. If, however, there are sites that are happy with the current
>>>>> setup and with managing firewall access to their squid from other
>>>>> sites' worker nodes, then please feel free to respond.
>>>>>
>>>>> Before committing any change to Tiersofatlas I would like sites to run
>>>>> a test to make sure they can indeed successfully access the RAL squid.
>>>>>
>>>>> To do this:
>>>>> Log into a WN
>>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>> > python fnget.py
>>>>> > --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier
>>>>> > --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>> This should produce a big list of table names and not a Python error!
>>>>>
>>>>> Could sites please reply with the results of the test? Any other
>>>>> comments are also welcome.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Alastair
>>>>
>>
>> --
>> Dr Ben Waugh                   Tel. +44 (0)20 7679 7223
>> Dept of Physics and Astronomy  Internal: 37223
>> University College London
>> London WC1E 6BT
>
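For readers curious what fnget.py is doing in the transcript above: the request URL carries the SQL query zlib-compressed and base64-encoded in the p1 parameter, and the reply is a small XML document whose <quality> element reports the error flag and record count. The sketch below reproduces those two steps. It is a minimal illustration, not a replacement for the real fnget.py from frontier.cern.ch: the URL-safe character substitutions in encode_sql are an assumption inferred from the request shown in the transcript, not taken from the Frontier specification.

```python
import base64
import xml.etree.ElementTree as ET
import zlib

def encode_sql(sql):
    """Deflate and base64-encode an SQL query for the p1 URL parameter.

    Assumption: the URL-safe substitutions '+'->'.', '/'->'-', '='->'_'
    are inferred from the request printed in the transcript above and may
    differ in detail from what fnget.py actually emits.
    """
    b64 = base64.b64encode(zlib.compress(sql.encode("ascii"), 9)).decode("ascii")
    return b64.translate(str.maketrans("+/=", ".-_"))

def decode_p1(p1):
    """Invert encode_sql: undo the substitutions, base64-decode, inflate."""
    b64 = p1.translate(str.maketrans(".-_", "+/="))
    return zlib.decompress(base64.b64decode(b64)).decode("ascii")

def build_request_url(server, sql):
    """Build a frontier_request URL of the shape fnget.py prints."""
    return (server + "?type=frontier_request:1:DEFAULT"
            "&encoding=BLOBzip&p1=" + encode_sql(sql))

def parse_response(xml_text):
    """Extract the raw payload and quality attributes from a Frontier reply."""
    root = ET.fromstring(xml_text)
    payload = root.find("./transaction/payload")
    quality = payload.find("quality")
    return {
        "error": int(quality.get("error")),      # non-zero means the query failed
        "records": int(quality.get("records")),  # rows returned
        "data_b64": payload.findtext("data").strip(),  # BLOBzip payload, still encoded
    }
```

Decoding the p1 value from Santanu's transcript should recover his SELECT count(*) query, assuming the payload really is plain zlib plus base64 (the leading "eNo" is the standard zlib header in base64); a non-zero error attribute in the parsed reply corresponds to the failures the SAM test flags.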