Thanks Alastair. That works for me now.
I think we are happy staying with QMUL as backup at the moment. If it
ever becomes a problem then we'll reconsider of course (as I am sure
Chris will if the opposite is true!)
Alastair Dewhurst wrote:
> Hi Simon
>
> You are passing the SAM test. Both the Frontier servers at PIC and RAL
> should be open to all. Your test failed because you missed out
> /Frontier at the end of your query URL. It should be:
> python fnget.py --url=http://atlfrontier.pic.es:3128/pic-frontier/Frontier --sql="SELECT count(*) FROM ALL_TABLES"
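>
> As a footnote, here is a rough sketch of how fnget.py appears to build
> the full request URL from those arguments: the /Frontier servlet path,
> plus a p1 parameter that looks like zlib-compressed, base64-encoded
> SQL. The character substitutions below are my guess from the request
> URLs in this thread, so check fnget.py itself for the real encoding:
>
> import base64, zlib
>
> def frontier_request_url(servlet_url, sql):
>     # Sketch only: compress the SQL at level 9, base64 it, and swap
>     # URL-unsafe characters. fnget.py is the authoritative version.
>     p1 = base64.b64encode(zlib.compress(sql, 9))
>     p1 = p1.replace('+', '.').replace('/', '-').replace('=', '_')
>     return (servlet_url +
>             '?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=' + p1)
>
> print frontier_request_url(
>     'http://atlfrontier.pic.es:3128/pic-frontier/Frontier',
>     'SELECT count(*) FROM ALL_TABLES')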
>
> One thing I am still trying to get sorted out: if you have RAL as your
> backup squid, that squid is only configured to access the RAL Frontier
> server. If you try to use the RAL squid to access PIC it will fail.
> However, because the RAL squid and Frontier server are part of the
> same installation, if one is down so is the other, and the test adds
> nothing.
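>
> If anyone wants to see that for themselves from a WN, a quick sketch
> along the lines of fnget.py's urllib2 usage (the probe function and
> the ten-second timeout here are mine, not part of fnget.py):
>
> import socket, urllib2
>
> socket.setdefaulttimeout(10)  # urlopen has no timeout arg in python2.4
>
> def probe(squid, frontier):
>     # Route a bare request through the given squid and print what comes
>     # back; a connection error or a 403 from the squid means that
>     # squid/Frontier pairing is closed.
>     opener = urllib2.build_opener(urllib2.ProxyHandler({'http': squid}))
>     try:
>         opener.open(frontier)
>         print squid, '->', frontier, ': open'
>     except Exception, e:
>         print squid, '->', frontier, ':', e
>
> ral = 'http://lcgft-atlas.gridpp.rl.ac.uk:3128'
> probe(ral, ral + '/frontierATLAS/Frontier')                         # should work
> probe(ral, 'http://atlfrontier.pic.es:3128/pic-frontier/Frontier')  # should fail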
>
> Alastair
>
>
>
> On 17 Jun 2010, at 16:28, Simon George wrote:
>
>> Since I noticed in TiersOfATLASCache.py that the frontiers for RHUL
>> (and all others) are RAL and PIC, I thought I should try PIC. I find
>> that I cannot access it from my WNs.
>> RAL works fine, as reported previously, but PIC gives 'Connection
>> refused'; see the example below. Is this really OK?
>>
>> Simon
>>
>> [root@node001 ~]# unset http_proxy
>> [root@node001 ~]# python fnget.py --url=http://atlfrontier.pic.es:3128/pic-frontier --sql="SELECT count(*) FROM ALL_TABLES"
>> Using Frontier URL: http://atlfrontier.pic.es:3128/pic-frontier
>> Query: SELECT count(*) FROM ALL_TABLES
>> Decode results: True
>> Refresh cache: False
>>
>> Frontier Request:
>> http://atlfrontier.pic.es:3128/pic-frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>
>>
>> Query started: 06/17/10 16:26:36 BST
>> Traceback (most recent call last):
>>   File "fnget.py", line 231, in ?
>>     result = urllib2.urlopen(request).read()
>>   File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
>>     return _opener.open(url, data)
>>   File "/usr/lib64/python2.4/urllib2.py", line 364, in open
>>     response = meth(req, response)
>>   File "/usr/lib64/python2.4/urllib2.py", line 471, in http_response
>>     response = self.parent.error(
>>   File "/usr/lib64/python2.4/urllib2.py", line 396, in error
>>     result = self._call_chain(*args)
>>   File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
>>     result = func(*args)
>>   File "/usr/lib64/python2.4/urllib2.py", line 554, in http_error_302
>>     return self.parent.open(new)
>>   File "/usr/lib64/python2.4/urllib2.py", line 358, in open
>>     response = self._open(req, data)
>>   File "/usr/lib64/python2.4/urllib2.py", line 376, in _open
>>     '_open', req)
>>   File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
>>     result = func(*args)
>>   File "fnget.py", line 221, in http_open
>>     return self.do_open(TimeoutHTTPConnection, req)
>>   File "/usr/lib64/python2.4/urllib2.py", line 1006, in do_open
>>     raise URLError(err)
>> urllib2.URLError: <urlopen error (111, 'Connection refused')>
>>
>>
>>
>> Alastair Dewhurst wrote:
>>> Hi Pete
>>> Currently, as far as I am aware, TiersOfATLAS is the only place where
>>> it is recorded:
>>> http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/TiersOfATLASCache.py
>>> The relevant part is here:
>>> # UK
>>> 'RAL-LCG2':{'mysquid':'','frontiers':[fral,fpic],'squids':[]},
>>> 'UKI-SCOTGRID-ECDF':{'mysquid':'','frontiers':[fral,fpic],'squids':['UKI-SCOTGRID-GLASGOW','UKI-NORTHGRID-LANCS-HEP']},
>>> 'UKI-SCOTGRID-GLASGOW':{'mysquid':'http://nat005.gla.scotgrid.ac.uk:3128','frontiers':[fkit,fpic],'squids':['local','UKI-NORTHGRID-LANCS-HEP']},
>>> 'UKI-NORTHGRID-LANCS-HEP':{'mysquid':'http://fal-pygrid-45.lancs.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SCOTGRID-GLASGOW']},
>>> 'UKI-NORTHGRID-LIV-HEP':{'mysquid':'http://atlcache.ph.liv.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-SHEF-HEP']},
>>> 'UKI-NORTHGRID-SHEF-HEP':{'mysquid':'http://lcgsquid.shef.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-LIV-HEP']},
>>> 'UKI-NORTHGRID-MAN-HEP':{'mysquid':'http://squid-cache.tier2.hep.manchester.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-LANCS-HEP']},
>>> 'UKI-SOUTHGRID-BHAM-HEP':{'mysquid':'http://epgr01.ph.bham.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
>>> 'UKI-SOUTHGRID-CAM-HEP':{'mysquid':'http://ce.hep.phy.cam.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
>>> 'UKI-SOUTHGRID-OX-HEP':{'mysquid':'http://t2squid01.physics.ox.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-RALPP']},
>>> 'UKI-SOUTHGRID-RALPP':{'mysquid':'http://atlassquid.pp.rl.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
>>> 'UKI-LT2-QMUL':{'mysquid':'http://frontiercache.esc.qmul.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-RHUL']},
>>> 'UKI-LT2-RHUL':{'mysquid':'http://squid1.ppgrid1.rhul.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-QMUL']},
>>> #
>>> 'UKI-LT2-UCL-CENTRAL':{'mysquid':'','frontiers':[fral,fpic],'squids':['UKI-LT2-UCL-HEP','UKI-LT2-QMUL']},
>>> 'UKI-LT2-UCL-HEP':{'mysquid':'http://squid.hep.ucl.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-QMUL']},
>>> So according to that you are the backup for Cambridge, RALPP and
>>> Birmingham. If those sites would like to keep you as their backup,
>>> they need to tell you where their WNs are so you can make an
>>> exception in your firewall.
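>>> Since TiersOfATLAS is just a python dict, the pairings can be
>>> sanity-checked mechanically. A rough sketch (the cut-down sites dict
>>> below is hypothetical; load the real entries from the file) that
>>> flags backup relationships which are not reciprocal, since those are
>>> exactly where the firewall exceptions get forgotten:
>>>
>>> # Hypothetical subset of the 'squids' fields above, keyed by site.
>>> sites = {
>>>     'UKI-LT2-QMUL':          ['local', 'UKI-LT2-RHUL'],
>>>     'UKI-LT2-RHUL':          ['local', 'UKI-LT2-QMUL'],
>>>     'UKI-SOUTHGRID-CAM-HEP': ['local', 'UKI-SOUTHGRID-OX-HEP'],
>>>     'UKI-SOUTHGRID-OX-HEP':  ['local', 'UKI-SOUTHGRID-RALPP'],
>>> }
>>>
>>> for site, squids in sorted(sites.items()):
>>>     for backup in squids:
>>>         if backup == 'local':
>>>             continue
>>>         # A one-way pairing means the backup site has to open its
>>>         # firewall for WNs it gets nothing back from.
>>>         if site not in sites.get(backup, []):
>>>             print '%s uses %s, but not vice versa' % (site, backup)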
>>> The twiki page does list the Frontier backups but not the T2 backups.
>>> I could add them, but it wouldn't make this much easier to find:
>>> https://twiki.cern.ch/twiki/bin/view/Atlas/T2SquidDeployment
>>> I know it's all a horrible mess at the moment.
>>> Alastair
>>> On 17 Jun 2010, at 14:46, Peter Gronbech wrote:
>>>> Works from Oxford's WNs.
>>>>
>>>> Can you tell me where I can see a list of which sites Oxford is
>>>> supposed to be the backup for?
>>>>
>>>> Thanks Pete
>>>>
>>>> -----Original Message-----
>>>> From: Testbed Support for GridPP member institutes
>>>> [mailto:[log in to unmask]] On Behalf Of Alastair Dewhurst
>>>> Sent: 17 June 2010 11:50
>>>> To: [log in to unmask]
>>>> Subject: Re: ATLAS Tier 2 squid failover
>>>>
>>>> Hi All
>>>>
>>>> After a discussion in today's Thursday phone meeting we have decided
>>>> the following:
>>>>
>>>> 1) If you have been passing the SAM tests and are happy with your
>>>> current setup then no changes will be made that affect your site.
>>>> 2) If you have been failing (getting a warning) on the SAM tests I
>>>> will switch you over to having the RAL Tier 1 as your primary backup.
>>>> 3) If you would prefer to have RAL as your primary backup, which will
>>>> allow things to be monitored more easily from the Tier 1, then I will
>>>> switch you over too.
>>>>
>>>> I would appreciate it if all sites, even those that don't want
>>>> anything changed, ran the test, as it proves that direct access
>>>> works (in case of emergency).
>>>>
>>>> Site       : Test (Who ran it)          : SAM  : Preference
>>>> RAL PP     : Passed (Alastair Dewhurst) : ok   : Use Tier 1
>>>> Liverpool  : Passed (Stephen Jones)     : ok   : unknown
>>>> QMUL       : Passed (Chris Walker)      : ok   : unknown
>>>> Cambridge  : Passed (Santanu Das)       : warn : Will be changed
>>>> Sheffield  : Passed (Elena Korolkova)   : ok   : unknown
>>>> RHUL       : Passed (Simon George)      : ok   : unknown
>>>> UCL        : Passed (Ben Waugh)         : ok   : unknown
>>>> Manchester : Not run                    : ok   : Stay the same
>>>> Lancaster  : Not run                    : ok   : Stay the same
>>>> Oxford     : Not run                    : warn : Will be changed
>>>> Birmingham : Not run                    : warn : Will be changed
>>>> Glasgow    : Not run                    : ok   : unknown (still uses FZK, which Graeme Stewart said should be changed)
>>>>
>>>> I am still trying to sort out some new monitoring for the Tier 1,
>>>> and I will send out a confirmation before submitting any request to
>>>> change TiersOfATLAS. Any additional suggestions regarding monitoring
>>>> and chasing up failures are very welcome. As was said at the meeting,
>>>> this is a setup that seems to work very well most of the time; the
>>>> real question is how best to chase up the few problems when they
>>>> occur without creating lots of work for ourselves.
>>>>
>>>> Thanks
>>>>
>>>> Alastair
>>>>
>>>>
>>>> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>>>>
>>>>> This works for UCL (both HEP and Legion clusters).
>>>>>
>>>>> Cheers,
>>>>> Ben
>>>>>
>>>>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>>>>> Hi Santanu
>>>>>> Thank you for spotting that; it should indeed be a capital F. I
>>>>>> thought I had copied and pasted the commands directly, but maybe my
>>>>>> mail client decided to do some formatting. That should fix most of
>>>>>> the problems, as the Frontier server/squid should be accessible to
>>>>>> all.
>>>>>> If we were to make this change, it would not make RAL a single
>>>>>> point of failure. In order for there to be a failure, both your own
>>>>>> squid and RAL would have to fail; if RAL fails, your own squid
>>>>>> should be set up to access PIC. The current situation means that if
>>>>>> your squid and your backup squid fail, things will break. (If both
>>>>>> RAL and PIC are down then you will also fail under both systems,
>>>>>> but multiple T1 failures should hopefully be rare!)
>>>>>> Alastair
>>>>>> So the new instructions are:
>>>>>> Log into a WN
>>>>>>> wget http://frontier.cern.ch/dist/fnget.py
>>>>>>> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>>>> python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>>> This should provide a big list of table names and not a python error!
>>>>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>>>>> Hi Alastair and all,
>>>>>>>
>>>>>>> I think there is a typo in the URL: it should be
>>>>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier",
>>>>>>> *not* "frontier" with a small f. Now it works for me with or
>>>>>>> without an http_proxy setting.
>>>>>>>
>>>>>>> [root@farm002 tmp]# unset http_proxy
>>>>>>> [root@farm002 tmp]# python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT count(*) FROM ALL_TABLES"
>>>>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>>>>> Decode results: True
>>>>>>> Refresh cache: False
>>>>>>>
>>>>>>> Frontier Request:
>>>>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>>>>>
>>>>>>> Query started: 06/16/10 11:44:15 BST
>>>>>>> Query ended: 06/16/10 11:44:16 BST
>>>>>>> Query time: 1.34605288506 [seconds]
>>>>>>> Query result:
>>>>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>>>>> <frontier version="3.22" xmlversion="1.0">
>>>>>>> <transaction payloads="1">
>>>>>>> <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>>>>> <data>eJxjY2Bg4HD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>>>>> <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>>>>> </payload>
>>>>>>> </transaction>
>>>>>>> </frontier>
>>>>>>> Fields: COUNT(*) NUMBER
>>>>>>> Records: 8833
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Santanu
>>>>>>>
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> The current state of the ATLAS Frontier service is not ideal.
>>>>>>>> The SAM tests:
>>>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>>>>> show several production sites getting a warning. This warning is
>>>>>>>> normally caused by the backup squid not being configured
>>>>>>>> correctly.
>>>>>>>>
>>>>>>>> To remind people: WNs should connect to the local squid
>>>>>>>> (normally at the site), which connects to the Frontier server at
>>>>>>>> RAL. If the local squid is down, the WN will try to connect to a
>>>>>>>> backup squid, which is meant to be at a nearby site and which
>>>>>>>> will then try to connect to the Frontier server. There is a
>>>>>>>> similar backup process for the Frontier server itself: should the
>>>>>>>> server at RAL fail, all the squids will try to connect to the
>>>>>>>> Frontier server at PIC.
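>>>>>>>>
>>>>>>>> In other words the WN just works through an ordered list of
>>>>>>>> squid/Frontier combinations. A toy sketch of that order, with the
>>>>>>>> placeholder squid URLs below standing in for a site's real local
>>>>>>>> and backup squids:
>>>>>>>>
>>>>>>>> import socket, urllib2
>>>>>>>>
>>>>>>>> socket.setdefaulttimeout(10)
>>>>>>>>
>>>>>>>> # Placeholders: substitute your site's own local and backup squids.
>>>>>>>> SQUIDS = ['http://local-squid.example.ac.uk:3128',
>>>>>>>>           'http://backup-squid.example.ac.uk:3128']
>>>>>>>> FRONTIERS = ['http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier',
>>>>>>>>              'http://atlfrontier.pic.es:3128/pic-frontier/Frontier']
>>>>>>>>
>>>>>>>> def fetch(query_string):
>>>>>>>>     # Try every squid against RAL first, then every squid against
>>>>>>>>     # PIC, returning the first response we manage to get.
>>>>>>>>     for frontier in FRONTIERS:
>>>>>>>>         for squid in SQUIDS:
>>>>>>>>             opener = urllib2.build_opener(
>>>>>>>>                 urllib2.ProxyHandler({'http': squid}))
>>>>>>>>             try:
>>>>>>>>                 return opener.open(frontier + query_string).read()
>>>>>>>>             except Exception:
>>>>>>>>                 continue  # squid down or path closed; try the next
>>>>>>>>     raise RuntimeError('all squid/Frontier combinations failed')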
>>>>>>>>
>>>>>>>> To ease this problem it has been suggested that the default
>>>>>>>> backup for Tier 2 sites be the squid at RAL (the Tier 1, not the
>>>>>>>> Tier 2!). The squid at the Tier 1 is part of the same
>>>>>>>> installation as the Frontier server, so if the Frontier service
>>>>>>>> goes down, so will the backup squid. This does reduce the
>>>>>>>> resilience of the setup slightly, but I think it is worth it
>>>>>>>> given it should make things significantly simpler to maintain. It
>>>>>>>> also means I will have to get the SAM test modified slightly. If,
>>>>>>>> however, there are sites that are happy with the current setup
>>>>>>>> and with managing firewall access to their squid from other
>>>>>>>> sites' worker nodes, then please feel free to respond.
>>>>>>>>
>>>>>>>> Before committing any change to TiersOfATLAS I would like sites
>>>>>>>> to run a test to make sure they can indeed successfully access
>>>>>>>> the RAL squid.
>>>>>>>>
>>>>>>>> To do this:
>>>>>>>> Log into a WN
>>>>>>>>> wget http://frontier.cern.ch/dist/fnget.py
>>>>>>>>> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>>>>>> python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>>>>> This should provide a big list of table names and not a python
>>>>>>>> error!
>>>>>>>>
>>>>>>>> Could sites please reply with the results of the test; any other
>>>>>>>> comments are also welcome.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Alastair
>>>>>>>
>>>>>
>>>>> --
>>>>> Dr Ben Waugh                     Tel. +44 (0)20 7679 7223
>>>>> Dept of Physics and Astronomy    Internal: 37223
>>>>> University College London
>>>>> London WC1E 6BT