Hi, test works on Birmingham WNs (both clusters).
Lawrie Lowe
Tel: 0121 414 4621 Fax: 0121 414 6709 Email: [log in to unmask]
On Thu, 17 Jun 2010, Alessandra Forti wrote:
> Hi Alastair,
>
> the test works on Manchester WNs.
>
> cheers
> alessandra
>
> Alastair Dewhurst wrote:
>> Hi All
>>
>> After a discussion in today's Thursday phone meeting we have decided the
>> following:
>>
>> 1) If you have been passing the SAM tests and are happy with your current
>> setup then no changes will be made to affect your site.
>> 2) If you have been failing (getting a warning) on the SAM tests I will
>> switch you over to having the RAL Tier 1 as your primary backup.
>> 3) If you would prefer to have RAL as your primary backup, which will allow
>> things to be more easily monitored from the Tier 1, then I will switch you
>> over too.
>>
>> I would appreciate it if all sites, even those that don't want anything
>> changed, ran the test, as it does prove that direct access works (in case
>> of emergency).
>>
>> Site : Test (Who ran it) : SAM : Preference
>> RAL PP : Passed (Alastair Dewhurst) : ok : Use Tier 1
>> Liverpool : Passed (Stephen Jones) : ok : unknown
>> QMUL : Passed (Chris Walker) : ok : unknown
>> Cambridge : Passed (Santanu Das) : warn : Will be changed
>> Sheffield : Passed (Elena Korolkova) : ok : unknown
>> RHUL : Passed (Simon George) : ok : unknown
>> UCL : Passed (Ben Waugh) : ok : unknown
>> Manchester: Not run : ok : Stay the same
>> Lancaster : Not run : ok : Stay the same
>> Oxford : Not run : warn : Will be changed
>> Birmingham: Not run : warn : Will be changed
>> Glasgow : Not run : ok : unknown, although still uses FZK which Graeme
>> Stewart said should be changed.
>>
>> I am still trying to sort out some new monitoring for the Tier 1 and I will
>> send out a confirmation before submitting any request to change
>> Tiersofatlas. If anyone has any additional suggestions regarding monitoring
>> and chasing up failures that is very welcome. As was said at the meeting,
>> this is a setup that seems to work very well most of the time, it is really
>> a question of how best to chase up the few problems when they occur without
>> creating lots of work for ourselves.
>>
>> Thanks
>>
>> Alastair
>>
>>
>> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>>
>>> This works for UCL (both HEP and Legion clusters).
>>>
>>> Cheers,
>>> Ben
>>>
>>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>>> Hi Santanu
>>>> Thank you for spotting that, it should indeed be a capital F. I thought I
>>>> had copied and pasted the commands directly but maybe my mail client
>>>> decided to do some formatting. That should fix most of the problems as
>>>> the Frontier server/squid should be accessible to all.
>>>> If we were to make this change, it would not make RAL a single point of
>>>> failure. In order for there to be a failure, both your own squid and RAL
>>>> would have to fail: if RAL fails, your own squid should be set up to access
>>>> PIC. The current situation means that if both your own squid and your backup
>>>> squid fail, things will break. (If both RAL and PIC are down then you will
>>>> also fail under both systems, but multiple T1 failures should hopefully be rare!)
>>>> Alastair
>>>> So the new instructions are:
>>>> Log into a WN
>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>> > python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>> This should provide a big list of table names and not a python error!
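For reference, a minimal sketch of what fnget.py does with the query before sending it (assuming only the Python standard library; the helper names here are hypothetical, not part of fnget.py). The SQL is zlib-compressed and base64-encoded into the `p1` parameter of the request URL, which is why the "Frontier Request" line in the test output ends in an opaque blob:

```python
import base64
import zlib

def encode_frontier_query(sql):
    """Compress an SQL string and base64-encode it, roughly the way the
    Frontier client packs it into the 'p1' URL parameter.
    (The real client additionally substitutes URL-unsafe base64 characters.)"""
    return base64.b64encode(zlib.compress(sql.encode("ascii"), 9)).decode("ascii")

def build_request_url(server, sql):
    # 'frontier_request:1:DEFAULT' is the query type; 'BLOBzip' asks the
    # server to return zlib-compressed result payloads.
    return (server + "?type=frontier_request:1:DEFAULT"
            + "&encoding=BLOBzip&p1=" + encode_frontier_query(sql))

url = build_request_url(
    "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier",
    "SELECT TABLE_NAME FROM ALL_TABLES")
print(url)
```

Decompressing the `p1` blob recovers the original SQL, which is also how the compressed `<data>` payload in the server's XML reply is unpacked.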
>>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>>> Hi Alastair and all,
>>>>>
>>>>> I think there is a typo in the URL: it should be
>>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier", *not*
>>>>> "frontier" with a small f. Now it works with or without an http_proxy
>>>>> setting.
>>>>>
>>>>> [root@farm002 tmp]# unset http_proxy
>>>>> [root@farm002 tmp]# python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT count(*) FROM ALL_TABLES"
>>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>>> Decode results: True
>>>>> Refresh cache: False
>>>>> Frontier Request:
>>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>>> Query started: 06/16/10 11:44:15 BST
>>>>> Query ended: 06/16/10 11:44:16 BST
>>>>> Query time: 1.34605288506 [seconds]
>>>>> Query result:
>>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>>> <frontier version="3.22" xmlversion="1.0">
>>>>>  <transaction payloads="1">
>>>>>   <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>>>    <data>eJxjY2Bg4HD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>>>    <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>>>   </payload>
>>>>>  </transaction>
>>>>> </frontier>
>>>>> Fields: COUNT(*) NUMBER
>>>>> Records: 8833
>>>>>
>>>>> Cheers,
>>>>> Santanu
>>>>>
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> The current state of the ATLAS frontier service is not ideal. The SAM
>>>>>> tests:
>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>>> show several production sites getting a warning. This warning is
>>>>>> normally caused by the backup squid not being configured correctly.
>>>>>>
>>>>>> To remind people: WNs should connect to the local squid (normally at
>>>>>> the site), which connects to the Frontier server at RAL. If the local
>>>>>> squid is down then the WN will try and connect to a backup squid, which
>>>>>> is meant to be at a nearby site and which will then try and connect to
>>>>>> the Frontier server. There is a similar backup process should the
>>>>>> Frontier server at RAL fail: all the squids will then try and connect
>>>>>> to the Frontier server at PIC.
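For illustration, this ordering is typically expressed in the Frontier client's FRONTIER_SERVER setting as an ordered list of (serverurl=...) and (proxyurl=...) entries. A hedged sketch: only the RAL URL below is taken from this thread; the PIC URL and the site/backup squid hostnames are hypothetical placeholders.

```shell
# Sketch of a Frontier client configuration. Proxies are tried in order
# (local site squid first, then the backup squid), and server URLs
# likewise (RAL first, then the PIC fallback).
# All hostnames except the RAL one are made-up placeholders.
export FRONTIER_SERVER="(serverurl=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS)(serverurl=http://frontier.example-pic.es:3128/frontierATLAS)(proxyurl=http://squid.example-site.ac.uk:3128)(proxyurl=http://squid.example-backup.ac.uk:3128)"
```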
>>>>>>
>>>>>> To ease this problem it has been suggested that the default backup for
>>>>>> Tier 2 sites be the squid at RAL (the Tier 1, not the Tier 2!). The
>>>>>> squid at the Tier 1 is part of the same installation as the Frontier
>>>>>> server, so if the Frontier service goes down so will the backup squid.
>>>>>> This does reduce the resilience of the setup slightly, but I think it
>>>>>> is worth it given it should make things significantly simpler to
>>>>>> maintain. It also means I will have to get the SAM test modified
>>>>>> slightly. If, however, there are sites that are happy with the current
>>>>>> setup and with managing firewall access to their squid from other
>>>>>> sites' worker nodes, then please feel free to respond.
>>>>>>
>>>>>> Before committing any change to Tiersofatlas I would like sites to run
>>>>>> a test to make sure they can indeed successfully access the RAL squid.
>>>>>>
>>>>>> To do this:
>>>>>> Log into a WN
>>>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>>> > python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>>> This should provide a big list of table names and not a python error!
>>>>>>
>>>>>> Could sites please reply with the results of the test; any comments
>>>>>> are also welcome.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Alastair
>>>>>
>>>
>>> --
>>> Dr Ben Waugh Tel. +44 (0)20 7679 7223
>>> Dept of Physics and Astronomy Internal: 37223
>>> University College London
>>> London WC1E 6BT
>
> --
> The most effective way to do it, is to do it. (Amelia Earhart)
> Northgrid Tier2 Technical Coordinator
> http://www.hep.manchester.ac.uk/computing/tier2
>