Lancs fine.

On 17 June 2010 13:22, Lawrence Lowe <[log in to unmask]> wrote:
> Hi, test works on Birmingham WNs (both clusters).
>
> Lawrie Lowe
>
> Tel: 0121 414 4621    Fax: 0121 414 6709    Email: [log in to unmask]
>
> On Thu, 17 Jun 2010, Alessandra Forti wrote:
>
>> Hi Alastair,
>>
>> the test works on Manchester WNs.
>>
>> cheers
>> alessandra
>>
>> Alastair Dewhurst wrote:
>>>
>>> Hi All
>>>
>>> After a discussion in today's Thursday phone meeting we have decided the
>>> following:
>>>
>>> 1) If you have been passing the SAM tests and are happy with your current
>>> setup then no changes will be made to affect your site.
>>> 2) If you have been failing (getting a warning) on the SAM tests I will
>>> switch you over to having the RAL Tier 1 as your primary backup.
>>> 3) If you would prefer to have RAL as your primary backup, which will
>>> allow things to be more easily monitored from the Tier 1, then I will
>>> switch you over too.
>>>
>>> I would appreciate it if all sites, even if they don't want anything
>>> changed, ran the test, as it proves that direct access works (in case
>>> of emergency).
>>>
>>> Site : Test (Who ran it) : SAM : Preference
>>> RAL PP : Passed (Alastair Dewhurst) : ok : Use Tier 1
>>> Liverpool : Passed (Stephen Jones) : ok : unknown
>>> QMUL : Passed (Chris Walker) : ok : unknown
>>> Cambridge : Passed (Santanu Das) : warn : Will be changed
>>> Sheffield : Passed (Elena Korolkova) : ok : unknown
>>> RHUL : Passed (Simon George) : ok : unknown
>>> UCL : Passed (Ben Waugh) : ok : unknown
>>> Manchester: Not run : ok : Stay the same
>>> Lancaster : Not run : ok : Stay the same
>>> Oxford : Not run : warn : Will be changed
>>> Birmingham: Not run : warn : Will be changed
>>> Glasgow : Not run : ok : unknown, although still uses FZK which Graeme
>>> Stewart said should be changed.
>>>
>>> I am still trying to sort out some new monitoring for the Tier 1 and I
>>> will send out a confirmation before submitting any request to change
>>> Tiersofatlas. If anyone has any additional suggestions regarding monitoring
>>> and chasing up failures, they are very welcome. As was said at the meeting,
>>> this is a setup that seems to work very well most of the time; it is really
>>> a question of how best to chase up the few problems when they occur without
>>> creating lots of work for ourselves.
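>>>
>>> One possible shape for such a check, run periodically (from cron, say) on a
>>> WN or UI, is the sketch below. It just wraps the fnget.py test from this
>>> thread; the log path and the final notification step are placeholders to be
>>> adapted to whatever a site actually uses.
>>>
>>> #!/bin/bash
>>> # Sketch of a periodic check of the RAL Frontier squid (illustrative only).
>>> # Fetch the test script if it is not already present.
>>> [ -f fnget.py ] || wget -q http://frontier.cern.ch/dist/fnget.py
>>> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>> if python fnget.py \
>>>     --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier \
>>>     --sql="SELECT TABLE_NAME FROM ALL_TABLES" > /tmp/fnget_check.log 2>&1
>>> then
>>>     echo "$(date): Frontier test via RAL squid OK"
>>> else
>>>     # Placeholder alert: replace with the site's preferred notification.
>>>     echo "$(date): Frontier test via RAL squid FAILED (see /tmp/fnget_check.log)"
>>> fi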
>>>
>>> Thanks
>>>
>>> Alastair
>>>
>>>
>>> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>>>
>>>> This works for UCL (both HEP and Legion clusters).
>>>>
>>>> Cheers,
>>>> Ben
>>>>
>>>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>>>>
>>>>> Hi Santanu
>>>>> Thank you for spotting that; it should indeed be a capital F. I thought
>>>>> I had copied and pasted the commands directly, but maybe my mail client
>>>>> decided to do some formatting. That should fix most of the problems, as
>>>>> the Frontier server/squid should be accessible to all.
>>>>> If we were to make this change, it would not make RAL a single point of
>>>>> failure. In order for there to be a failure, both your own squid and RAL
>>>>> would have to fail. If RAL fails, your own squid should be set up to
>>>>> access PIC. The current situation means that if you and your backup squid
>>>>> fail, things will break. (If both RAL and PIC are down then you will also
>>>>> fail under both systems, but multiple T1 failures should hopefully be rare!)
>>>>> Alastair
>>>>> So the new instructions are:
>>>>> Log into a WN
>>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>> > python fnget.py
>>>>> > --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>>> > --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>> This should provide a big list of table names and not a python error!
>>>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>>>>
>>>>>> Hi Alastair and all,
>>>>>>
>>>>>> I think there is a typo in the URL: it should be
>>>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier", *not*
>>>>>> "frontier" with a small f. Now it works with or without an http_proxy
>>>>>> setting.
>>>>>>
>>>>>> [root@farm002 tmp]# unset http_proxy
>>>>>> [root@farm002 tmp]# python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT count(*) FROM ALL_TABLES"
>>>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>>>> Decode results: True
>>>>>> Refresh cache: False
>>>>>>
>>>>>> Frontier Request:
>>>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>>>>
>>>>>> Query started: 06/16/10 11:44:15 BST
>>>>>> Query ended: 06/16/10 11:44:16 BST
>>>>>> Query time: 1.34605288506 [seconds]
>>>>>>
>>>>>> Query result:
>>>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>>>> <frontier version="3.22" xmlversion="1.0">
>>>>>>   <transaction payloads="1">
>>>>>>     <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>>>>       <data>eJxjY2Bg4HD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>>>>       <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>>>>     </payload>
>>>>>>   </transaction>
>>>>>> </frontier>
>>>>>>
>>>>>> Fields: COUNT(*) NUMBER
>>>>>> Records: 8833
>>>>>>
>>>>>> Cheers,
>>>>>> Santanu
>>>>>>
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> The current state of the ATLAS frontier service is not ideal. The SAM
>>>>>>> tests:
>>>>>>>
>>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>>>> show several production sites getting a warning. This warning is normally
>>>>>>> caused by the backup squid not being configured correctly.
>>>>>>>
>>>>>>> To remind people: WNs should connect to the local squid (normally at
>>>>>>> the site), which connects to the Frontier server at RAL. If the local
>>>>>>> squid is down then the WN will try and connect to a backup squid, which
>>>>>>> is meant to be at a nearby site and which will then try and connect to
>>>>>>> the Frontier server. There is a similar backup process for the server:
>>>>>>> should the Frontier server at RAL fail, all the squids will try and
>>>>>>> connect to the Frontier server at PIC.
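>>>>>>>
>>>>>>> For reference, that failover order is what the Frontier client
>>>>>>> configuration on the WN encodes; a rough sketch of the sort of setting
>>>>>>> involved is below. Only the RAL URL is taken from this thread; the PIC
>>>>>>> server and the two squid hostnames are placeholders.
>>>>>>>
>>>>>>> # Illustrative only: proxies are tried in order (local squid, then
>>>>>>> # backup squid), and behind them the Frontier servers in order (RAL,
>>>>>>> # then PIC).
>>>>>>> export FRONTIER_SERVER="(serverurl=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS)(serverurl=http://<pic-frontier-server>/frontierATLAS)(proxyurl=http://<local-squid>:3128)(proxyurl=http://<backup-squid-at-nearby-site>:3128)"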
>>>>>>>
>>>>>>> To ease this problem it has been suggested that the default backup
>>>>>>> for Tier 2 sites be the squid at RAL (the Tier 1, not the Tier 2!). The
>>>>>>> squid at the Tier 1 is the same installation as the Frontier server, so
>>>>>>> if the Frontier service goes down so will the backup squid. This does
>>>>>>> reduce the resilience of the setup slightly, but I think it is worth it
>>>>>>> given it should make things significantly simpler to maintain. It also
>>>>>>> means I will have to get the SAM test modified slightly. If, however,
>>>>>>> there are sites that are happy with the current setup and with managing
>>>>>>> firewall access to their squid from other sites' worker nodes, then
>>>>>>> please feel free to respond.
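>>>>>>>
>>>>>>> In terms of the sketch above, the proposal simply points the second
>>>>>>> proxy entry at the RAL Tier 1 squid rather than at another site's
>>>>>>> squid, i.e. something like (the local squid hostname is still a
>>>>>>> placeholder):
>>>>>>>
>>>>>>> (proxyurl=http://<local-squid>:3128)(proxyurl=http://lcgft-atlas.gridpp.rl.ac.uk:3128)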
>>>>>>>
>>>>>>> Before committing any change to Tiersofatlas I would like sites to
>>>>>>> run a test to make sure they can indeed successfully access the RAL squid.
>>>>>>>
>>>>>>> To do this:
>>>>>>> Log into a WN
>>>>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>>>> > python fnget.py
>>>>>>> > --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier
>>>>>>> > --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>>>> This should provide a big list of table names and not a python error!
>>>>>>>
>>>>>>> Could sites please reply with the results of the test; any other
>>>>>>> comments are also welcome.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Alastair
>>>>>>
>>>>
>>>> --
>>>> Dr Ben Waugh Tel. +44 (0)20 7679 7223
>>>> Dept of Physics and Astronomy Internal: 37223
>>>> University College London
>>>> London WC1E 6BT
>>
>> --
>> The most effective way to do it, is to do it. (Amelia Earhart)
>> Northgrid Tier2 Technical Coordinator
>> http://www.hep.manchester.ac.uk/computing/tier2
>>
>