Lancs fine.

On 17 June 2010 13:22, Lawrence Lowe <[log in to unmask]> wrote:
> Hi, test works on Birmingham WNs (both clusters).
>
> Lawrie Lowe
>
> Tel: 0121 414 4621  Fax: 0121 414 6709  Email: [log in to unmask]
>
> On Thu, 17 Jun 2010, Alessandra Forti wrote:
>
>> Hi Alastair,
>>
>> the test works on Manchester WNs.
>>
>> cheers
>> alessandra
>>
>> Alastair Dewhurst wrote:
>>>
>>> Hi All
>>>
>>> After a discussion in today's Thursday phone meeting we have decided
>>> the following:
>>>
>>> 1) If you have been passing the SAM tests and are happy with your
>>> current setup then no changes will be made that affect your site.
>>> 2) If you have been failing (getting a warning) on the SAM tests I
>>> will switch you over to having the RAL Tier 1 as your primary backup.
>>> 3) If you would prefer to have RAL as your primary backup, which will
>>> allow things to be more easily monitored from the Tier 1, then I will
>>> switch you over too.
>>>
>>> I would appreciate it if all sites, even those that don't want
>>> anything changed, ran the test, as it proves that direct access works
>>> (in case of emergency).
>>>
>>> Site       : Test (Who ran it)          : SAM  : Preference
>>> RAL PP     : Passed (Alastair Dewhurst) : ok   : Use Tier 1
>>> Liverpool  : Passed (Stephen Jones)     : ok   : unknown
>>> QMUL       : Passed (Chris Walker)      : ok   : unknown
>>> Cambridge  : Passed (Santanu Das)       : warn : Will be changed
>>> Sheffield  : Passed (Elena Korolkova)   : ok   : unknown
>>> RHUL       : Passed (Simon George)      : ok   : unknown
>>> UCL        : Passed (Ben Waugh)         : ok   : unknown
>>> Manchester : Not run                    : ok   : Stay the same
>>> Lancaster  : Not run                    : ok   : Stay the same
>>> Oxford     : Not run                    : warn : Will be changed
>>> Birmingham : Not run                    : warn : Will be changed
>>> Glasgow    : Not run                    : ok   : unknown, although it
>>> still uses FZK, which Graeme Stewart said should be changed.
>>>
>>> I am still trying to sort out some new monitoring for the Tier 1 and
>>> I will send out a confirmation before submitting any request to
>>> change Tiersofatlas. If anyone has any additional suggestions
>>> regarding monitoring and chasing up failures, that is very welcome.
>>> As was said at the meeting, this is a setup that seems to work very
>>> well most of the time; it is really a question of how best to chase
>>> up the few problems when they occur without creating lots of work for
>>> ourselves.
>>>
>>> Thanks
>>>
>>> Alastair
>>>
>>>
>>> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>>>
>>>> This works for UCL (both HEP and Legion clusters).
>>>>
>>>> Cheers,
>>>> Ben
>>>>
>>>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>>>>
>>>>> Hi Santanu
>>>>>
>>>>> Thank you for spotting that; it should indeed be a capital F. I
>>>>> thought I had copied and pasted the commands directly, but maybe my
>>>>> mail client decided to do some formatting. That should fix most of
>>>>> the problems, as the Frontier server/squid should be accessible to
>>>>> all.
>>>>>
>>>>> If we were to make this change, it would not make RAL a single
>>>>> point of failure. In order for there to be a failure, both your own
>>>>> squid and RAL would have to fail. If RAL fails, your own squid
>>>>> should be set up to access PIC. The current situation means that if
>>>>> your own squid and your backup squid fail, things will break. (If
>>>>> both RAL and PIC are down then you will also fail under both
>>>>> systems, but multiple T1 failures should hopefully be rare!)
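The failover order described in that last paragraph is easy to state in
code. The sketch below is an editorial illustration of the logic rather
than anything from the thread itself: the local squid hostname is a
placeholder, and only the RAL URL is taken from the messages here.

    # Try proxies in the order a WN would: local squid first, then the
    # proposed backup (the RAL Tier 1 squid). Both squids failing is the
    # "things will break" case; the RAL Frontier server itself failing
    # is covered separately by the squids falling back to PIC.
    import urllib.error
    import urllib.request

    FRONTIER = "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier"
    PROXIES = [
        "http://squid.mysite.ac.uk:3128",           # local site squid (placeholder)
        "http://lcgft-atlas.gridpp.rl.ac.uk:3128",  # backup: RAL Tier 1 squid
    ]

    def first_reachable_proxy(url=FRONTIER, proxies=PROXIES, timeout=10):
        """Return the first proxy through which the Frontier server answers."""
        for proxy in proxies:
            opener = urllib.request.build_opener(
                urllib.request.ProxyHandler({"http": proxy}))
            try:
                opener.open(url, timeout=timeout)
                return proxy
            except urllib.error.HTTPError:
                return proxy   # an HTTP error still proves the chain is up
            except OSError:
                continue       # squid down or unreachable: try the next one
        return None            # no working proxy: this WN would fail

A WN only loses Frontier access once every entry in that list is
unreachable, which is why co-hosting the backup squid with the Frontier
server at RAL costs only the small amount of resilience noted above.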
>>>>> So the new instructions are:
>>>>>
>>>>> Log into a WN
>>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>> > python fnget.py \
>>>>> >     --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier \
>>>>> >     --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>> This should provide a big list of table names and not a python
>>>>> error!
>>>>>
>>>>> Alastair
>>>>>
>>>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>>>>
>>>>>> Hi Alastair and all,
>>>>>>
>>>>>> I think there is a typo in the URL: it should be
>>>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier",
>>>>>> *not* "frontier" with a small f. Now it works with or without an
>>>>>> http_proxy setting.
>>>>>>
>>>>>> [root@farm002 tmp]# unset http_proxy
>>>>>> [root@farm002 tmp]# python fnget.py \
>>>>>>     --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier \
>>>>>>     --sql="SELECT count(*) FROM ALL_TABLES"
>>>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>>>> Decode results: True
>>>>>> Refresh cache: False
>>>>>>
>>>>>> Frontier Request:
>>>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>>>>
>>>>>> Query started: 06/16/10 11:44:15 BST
>>>>>> Query ended: 06/16/10 11:44:16 BST
>>>>>> Query time: 1.34605288506 [seconds]
>>>>>>
>>>>>> Query result:
>>>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>>>> <frontier version="3.22" xmlversion="1.0">
>>>>>>   <transaction payloads="1">
>>>>>>     <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>>>>       <data>eJxjY2BgYHD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>>>>       <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>>>>     </payload>
>>>>>>   </transaction>
>>>>>> </frontier>
>>>>>>
>>>>>> Fields:
>>>>>>      COUNT(*)     NUMBER
>>>>>>
>>>>>> Records:
>>>>>>      8833
>>>>>>
>>>>>> Cheers,
>>>>>> Santanu
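As an aside on Santanu's output: the BLOBzip encoding that fnget.py
reports is zlib data wrapped in base64, so the <data> blob above can be
unpacked by hand. A minimal sketch of that decode step (the blob is
copied verbatim from the output above; parsing the packed fields inside
is left to fnget.py):

    import base64
    import zlib

    blob = "eJxjY2BgYHD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW"
    raw = zlib.decompress(base64.b64decode(blob))
    print(raw)  # the 35-byte record (full_size="35") holding COUNT(*) = 8833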
>>>>>>> Hi
>>>>>>>
>>>>>>> The current state of the ATLAS Frontier service is not ideal.
>>>>>>> The SAM tests:
>>>>>>>
>>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>>>>
>>>>>>> show several production sites getting a warning. This warning is
>>>>>>> normally caused by the backup squid not being configured
>>>>>>> correctly.
>>>>>>>
>>>>>>> To remind people: WNs should connect to the local squid (normally
>>>>>>> at the site), which connects to the Frontier server at RAL. If
>>>>>>> the local squid is down then the WN will try to connect to a
>>>>>>> backup squid, which is meant to be at a nearby site and which
>>>>>>> will then try to connect to the Frontier server. There is a
>>>>>>> similar backup process should the Frontier server at RAL fail:
>>>>>>> all the squids will then try to connect to the Frontier server
>>>>>>> at PIC.
>>>>>>>
>>>>>>> To ease this problem it has been suggested that the default
>>>>>>> backup for Tier 2 sites be the squid at RAL (the Tier 1, not the
>>>>>>> Tier 2!). The squid at the Tier 1 is the same installation as the
>>>>>>> Frontier server, so if the Frontier service goes down so will the
>>>>>>> backup squid. This does reduce the resilience of the setup
>>>>>>> slightly, but I think it is worth it given that it should make
>>>>>>> things significantly simpler to maintain. It also means I will
>>>>>>> have to get the SAM test modified slightly. If, however, there
>>>>>>> are sites that are happy with the current setup and with managing
>>>>>>> firewall access to their squid from other sites' worker nodes,
>>>>>>> then please feel free to respond.
>>>>>>>
>>>>>>> Before committing any change to Tiersofatlas I would like sites
>>>>>>> to run a test to make sure they can indeed successfully access
>>>>>>> the RAL squid.
>>>>>>>
>>>>>>> To do this:
>>>>>>> Log into a WN
>>>>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>>>> > python fnget.py \
>>>>>>> >     --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier \
>>>>>>> >     --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>>>> This should provide a big list of table names and not a python
>>>>>>> error!
>>>>>>>
>>>>>>> Could sites please reply with the results of the test; any
>>>>>>> comments are also welcome.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Alastair
>>>>
>>>> --
>>>> Dr Ben Waugh                       Tel. +44 (0)20 7679 7223
>>>> Dept of Physics and Astronomy      Internal: 37223
>>>> University College London
>>>> London WC1E 6BT
>>
>> --
>> The most effective way to do it, is to do it. (Amelia Earhart)
>> Northgrid Tier2 Technical Coordinator
>> http://www.hep.manchester.ac.uk/computing/tier2
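Finally, for sites that want to re-run this check unattended (say, from
cron on a WN), the three manual steps above are easy to wrap. The script
below is a hypothetical sketch rather than an official tool: only the
fnget.py and RAL URLs come from the thread, and it assumes the WN's
python interpreter can run fnget.py.

    # Automate the test from the thread: fetch fnget.py, point http_proxy
    # at the RAL squid, run the query, and report pass/fail.
    import os
    import subprocess
    import urllib.request

    FNGET = "http://frontier.cern.ch/dist/fnget.py"
    SQUID = "http://lcgft-atlas.gridpp.rl.ac.uk:3128"
    FRONTIER = "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier"

    def run_test():
        urllib.request.urlretrieve(FNGET, "fnget.py")  # step 1: wget fnget.py
        env = dict(os.environ, http_proxy=SQUID)       # step 2: export http_proxy
        result = subprocess.run(                       # step 3: run the query
            ["python", "fnget.py", "--url=" + FNGET and "--url=" + FRONTIER,
             "--sql=SELECT TABLE_NAME FROM ALL_TABLES"],
            env=env, capture_output=True, text=True)
        # Success is "a big list of table names and not a python error";
        # fnget.py echoes the field name, so checking for it is a cheap test.
        if result.returncode == 0 and "TABLE_NAME" in result.stdout:
            print("PASS: direct access to the RAL squid works")
        else:
            print("FAIL:\n" + result.stderr)

    if __name__ == "__main__":
        run_test()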