So UCL is not anyone's backup, while we and RHUL both use QMUL as a
backup. I'm happy to leave things as they are, but also don't mind
switching to RAL if Chris would prefer not to continue as a backup site.
Cheers,
Ben
On 17/06/10 15:50, Alastair Dewhurst wrote:
> Hi Pete
>
> Currently, as far as I am aware, TiersOfATLAS is the only place where
> this is recorded:
> http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/TiersOfATLASCache.py
>
>
> The relevant part is here:
> # UK
> 'RAL-LCG2':{'mysquid':'','frontiers':[fral,fpic],'squids':[]},
> 'UKI-SCOTGRID-ECDF':{'mysquid':'','frontiers':[fral,fpic],'squids':['UKI-SCOTGRID-GLASGOW','UKI-NORTHGRID-LANCS-HEP']},
> 'UKI-SCOTGRID-GLASGOW':{'mysquid':'http://nat005.gla.scotgrid.ac.uk:3128','frontiers':[fkit,fpic],'squids':['local','UKI-NORTHGRID-LANCS-HEP']},
> 'UKI-NORTHGRID-LANCS-HEP':{'mysquid':'http://fal-pygrid-45.lancs.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SCOTGRID-GLASGOW']},
> 'UKI-NORTHGRID-LIV-HEP':{'mysquid':'http://atlcache.ph.liv.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-SHEF-HEP']},
> 'UKI-NORTHGRID-SHEF-HEP':{'mysquid':'http://lcgsquid.shef.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-LIV-HEP']},
> 'UKI-NORTHGRID-MAN-HEP':{'mysquid':'http://squid-cache.tier2.hep.manchester.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-LANCS-HEP']},
> 'UKI-SOUTHGRID-BHAM-HEP':{'mysquid':'http://epgr01.ph.bham.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
> 'UKI-SOUTHGRID-CAM-HEP':{'mysquid':'http://ce.hep.phy.cam.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
> 'UKI-SOUTHGRID-OX-HEP':{'mysquid':'http://t2squid01.physics.ox.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-RALPP']},
> 'UKI-SOUTHGRID-RALPP':{'mysquid':'http://atlassquid.pp.rl.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
> 'UKI-LT2-QMUL':{'mysquid':'http://frontiercache.esc.qmul.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-RHUL']},
> 'UKI-LT2-RHUL':{'mysquid':'http://squid1.ppgrid1.rhul.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-QMUL']},
> #
> 'UKI-LT2-UCL-CENTRAL':{'mysquid':'','frontiers':[fral,fpic],'squids':['UKI-LT2-UCL-HEP','UKI-LT2-QMUL']},
> 'UKI-LT2-UCL-HEP':{'mysquid':'http://squid.hep.ucl.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-QMUL']},
>
>
> So according to that, you are the backup for Cambridge, RALPP and
> Birmingham. If those sites would like to keep you as their backup, they
> need to tell you which addresses their WNs use so that you can make an
> exception in your firewall.
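Pete's question (which sites is Oxford the backup for?) can be answered mechanically by inverting the 'squids' lists in the TiersOfATLAS entries above. The sketch below is illustrative only: it copies a subset of the UK entries, and `fral`/`fpic` are placeholder strings standing in for the real definitions in TiersOfATLASCache.py; the `entry` helper is hypothetical.

```python
# Minimal sketch: which sites list a given site as a backup squid?
# Assumption: entries follow the TiersOfATLAS layout quoted above; fral and
# fpic are placeholder strings, not the real values from the cache file.
fral, fpic = "frontier-RAL", "frontier-PIC"


def entry(mysquid, squids, frontiers=(fral, fpic)):
    """Build one TiersOfATLAS-style squid entry (hypothetical helper)."""
    return {"mysquid": mysquid, "frontiers": list(frontiers), "squids": squids}


# A subset of the UK entries, copied from the snippet above.
SQUID_CONFIG = {
    "UKI-SOUTHGRID-BHAM-HEP": entry("http://epgr01.ph.bham.ac.uk:3128",
                                    ["local", "UKI-SOUTHGRID-OX-HEP"]),
    "UKI-SOUTHGRID-CAM-HEP": entry("http://ce.hep.phy.cam.ac.uk:3128",
                                   ["local", "UKI-SOUTHGRID-OX-HEP"]),
    "UKI-SOUTHGRID-OX-HEP": entry("http://t2squid01.physics.ox.ac.uk:3128",
                                  ["local", "UKI-SOUTHGRID-RALPP"]),
    "UKI-SOUTHGRID-RALPP": entry("http://atlassquid.pp.rl.ac.uk:3128",
                                 ["local", "UKI-SOUTHGRID-OX-HEP"]),
    "UKI-LT2-QMUL": entry("http://frontiercache.esc.qmul.ac.uk:3128",
                          ["local", "UKI-LT2-RHUL"]),
    "UKI-LT2-RHUL": entry("http://squid1.ppgrid1.rhul.ac.uk:3128",
                          ["local", "UKI-LT2-QMUL"]),
    "UKI-LT2-UCL-CENTRAL": entry("", ["UKI-LT2-UCL-HEP", "UKI-LT2-QMUL"]),
    "UKI-LT2-UCL-HEP": entry("http://squid.hep.ucl.ac.uk:3128",
                             ["local", "UKI-LT2-QMUL"]),
}


def backup_for(site, config):
    """Return, sorted, the other sites whose 'squids' list names `site`."""
    return sorted(name for name, e in config.items()
                  if name != site and site in e["squids"])


print(backup_for("UKI-SOUTHGRID-OX-HEP", SQUID_CONFIG))
# ['UKI-SOUTHGRID-BHAM-HEP', 'UKI-SOUTHGRID-CAM-HEP', 'UKI-SOUTHGRID-RALPP']
```

Run against the full cache file instead of this subset, the same inversion would give every site's dependants in one pass.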
>
> The TWiki page does list the Frontier backups, but not the T2 squid
> backups. I could add them, but that wouldn't make them much easier to find.
> https://twiki.cern.ch/twiki/bin/view/Atlas/T2SquidDeployment
>
> I know it's all a horrible mess at the moment.
>
> Alastair
>
>
> On 17 Jun 2010, at 14:46, Peter Gronbech wrote:
>
>> Works from Oxford's WNs.
>>
>> Can you tell me where I can see a list of which sites Oxford is
>> supposed to be the backup for?
>>
>> Thanks Pete
>>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes
>> On Behalf Of Alastair Dewhurst
>> Sent: 17 June 2010 11:50
>> Subject: Re: ATLAS Tier 2 squid failover
>>
>> Hi All
>>
>> After a discussion at today's (Thursday's) phone meeting, we have
>> decided the following:
>>
>> 1) If you have been passing the SAM tests and are happy with your
>> current setup, then no changes will be made that affect your site.
>> 2) If you have been failing (getting a warning) on the SAM tests, I
>> will switch you over to having the RAL Tier 1 as your primary backup.
>> 3) If you would prefer to have RAL as your primary backup, which will
>> allow things to be monitored more easily from the Tier 1, then I will
>> switch you over too.
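The three rules amount to a small decision function. This is a hypothetical encoding for illustration: the function name is invented, and using "RAL-LCG2" (the Tier 1's TiersOfATLAS name) as the backup value is an assumption, not the syntax of the actual change request.

```python
# Hypothetical encoding of rules 1-3 above; the function and the
# "RAL-LCG2" backup name are illustrative assumptions, not ToA syntax.
RAL_TIER1 = "RAL-LCG2"


def new_backup(sam_status, prefers_tier1, current_backup):
    """Return the backup a site should end up with under rules 1-3."""
    if sam_status == "warn":   # rule 2: sites getting a SAM warning are switched
        return RAL_TIER1
    if prefers_tier1:          # rule 3: sites may opt in to the Tier 1
        return RAL_TIER1
    return current_backup      # rule 1: passing, happy sites are left alone


# Rows from the status table that follows, with backups from TiersOfATLAS.
print(new_backup("warn", False, "UKI-SOUTHGRID-OX-HEP"))   # Cambridge
print(new_backup("ok", True, "UKI-SOUTHGRID-OX-HEP"))      # RAL PP
print(new_backup("ok", False, "UKI-NORTHGRID-LANCS-HEP"))  # Manchester
```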
>>
>> I would appreciate it if all sites, even those that don't want anything
>> changed, ran the test, as it proves that direct access works
>> (in case of emergency).
>>
>> Site       : Test (Who ran it)          : SAM  : Preference
>> RAL PP     : Passed (Alastair Dewhurst) : ok   : Use Tier 1
>> Liverpool  : Passed (Stephen Jones)     : ok   : unknown
>> QMUL       : Passed (Chris Walker)      : ok   : unknown
>> Cambridge  : Passed (Santanu Das)       : warn : Will be changed
>> Sheffield  : Passed (Elena Korolkova)   : ok   : unknown
>> RHUL       : Passed (Simon George)      : ok   : unknown
>> UCL        : Passed (Ben Waugh)         : ok   : unknown
>> Manchester : Not run                    : ok   : Stay the same
>> Lancaster  : Not run                    : ok   : Stay the same
>> Oxford     : Not run                    : warn : Will be changed
>> Birmingham : Not run                    : warn : Will be changed
>> Glasgow    : Not run                    : ok   : unknown, although still uses FZK, which Graeme Stewart said should be changed.
>>
>> I am still trying to sort out some new monitoring for the Tier 1, and
>> I will send out a confirmation before submitting any request to
>> change TiersOfATLAS. If anyone has any additional suggestions
>> regarding monitoring and chasing up failures, they would be very welcome.
>> As was said at the meeting, this is a setup that seems to work very
>> well most of the time; it is really a question of how best to chase
>> up the few problems when they occur without creating lots of work for
>> ourselves.
>>
>> Thanks
>>
>> Alastair
>>
>>
>> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>>
>>> This works for UCL (both HEP and Legion clusters).
>>>
>>> Cheers,
>>> Ben
>>>
>>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>>> Hi Santanu
>>>> Thank you for spotting that; it should indeed be a capital F. I
>>>> thought I had copied and pasted the commands directly, but maybe my
>>>> mail client decided to do some formatting. That should fix most
>>>> of the problems, as the Frontier server/squid should be accessible
>>>> to all.
>>>> If we were to make this change, it would not make RAL a single
>>>> point of failure. For there to be a failure, both your own squid
>>>> and RAL would have to fail; if RAL fails, your own squid should be
>>>> set up to access PIC. With the current setup, if both your squid
>>>> and your backup squid fail, things will break. (If both RAL and PIC
>>>> are down, you will also fail under both systems, but multiple T1
>>>> failures should hopefully be rare!)
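The single-point-of-failure argument can be checked by enumerating every up/down combination of the four components involved. This is a toy availability model written for illustration, not the real Frontier client logic; it assumes, as stated elsewhere in the thread, that the RAL Tier 1 squid is the same installation as the RAL Frontier server, so it is modelled as up exactly when RAL is up. It ignores partial failures such as a backup squid that is running but firewalled.

```python
from itertools import product

# Toy availability model (an illustration, not the real Frontier client).
# "local" is the site's own squid, "backup" the neighbouring T2 squid, and
# "RAL"/"PIC" the Frontier servers. The RAL Tier 1 squid shares its
# installation with the RAL Frontier server, so proxy "RAL" is up iff RAL is.
COMPONENTS = ("local", "backup", "RAL", "PIC")
OLD = ("local", "backup")   # current scheme: neighbouring T2 squid as backup
NEW = ("local", "RAL")      # proposal: RAL Tier 1 squid as backup


def served(up, proxies):
    """True if some listed proxy is up and at least one Frontier server is up."""
    any_frontier = up["RAL"] or up["PIC"]
    return any(up[p] and any_frontier for p in proxies)


worse_under_new, worse_under_old = [], []
for states in product((True, False), repeat=len(COMPONENTS)):
    up = dict(zip(COMPONENTS, states))
    down = tuple(c for c in COMPONENTS if not up[c])
    old_ok, new_ok = served(up, OLD), served(up, NEW)
    if old_ok and not new_ok:
        worse_under_new.append(down)
    elif new_ok and not old_ok:
        worse_under_old.append(down)

print("breaks only under the proposal:", worse_under_new)
print("breaks only under the current scheme:", worse_under_old)
# breaks only under the proposal: [('local', 'RAL')]
# breaks only under the current scheme: [('local', 'backup'), ('local', 'backup', 'PIC')]
```

Under this model the proposal loses only when the local squid and RAL are down together while the neighbour is healthy, and it wins whenever the local and neighbouring squids are both unavailable; the RAL-and-PIC double failure breaks both schemes.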
>>>> Alastair
>>>> So the new instructions are:
>>>> Log into a WN
>>>>> wget http://frontier.cern.ch/dist/fnget.py
>>>>> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>> python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>> This should print a long list of table names, not a Python error!
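For sites that want to script the check, here is a sketch of a similar request made with the Python standard library instead of fnget.py. Assumptions, loudly: the request layout (`type=frontier_request:1:DEFAULT`, `encoding=BLOBzip`, `p1=...`) is copied from the fnget.py output quoted later in this thread, and the `p1` encoding (zlib at level 9, then base64) is inferred from that output rather than taken from fnget.py's source. The logged `p1` does begin with "eNo", consistent with base64 of a level-9 zlib stream (header bytes 0x78 0xDA). Percent-escaping `p1` is a further assumption; fnget.py may handle `+`, `/` and `=` differently. The helper names are hypothetical.

```python
import base64
import urllib.parse
import urllib.request
import zlib
from typing import Optional

# Sketch only: request layout copied from the fnget.py output in this thread;
# the p1 encoding (zlib level 9 + base64) and its percent-escaping are
# assumptions, not fnget.py's documented behaviour.
FRONTIER_URL = "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier"
RAL_SQUID = "http://lcgft-atlas.gridpp.rl.ac.uk:3128"


def encode_query(sql: str) -> str:
    """Compress and base64-encode an SQL string for the p1 parameter."""
    return base64.b64encode(zlib.compress(sql.encode("ascii"), 9)).decode("ascii")


def decode_query(p1: str) -> str:
    """Invert encode_query (handy for reading p1 values out of logs)."""
    return zlib.decompress(base64.b64decode(p1)).decode("ascii")


def request_url(sql: str, base: str = FRONTIER_URL) -> str:
    """Build a Frontier request URL in the layout fnget.py printed."""
    p1 = urllib.parse.quote(encode_query(sql), safe="")
    return f"{base}?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1={p1}"


def fetch(sql: str, proxy: Optional[str] = RAL_SQUID, timeout: float = 30.0) -> str:
    """Send the query through `proxy` (or directly if None); return the XML reply."""
    # An empty ProxyHandler mapping disables proxies, like `unset http_proxy`.
    handler = urllib.request.ProxyHandler({"http": proxy} if proxy else {})
    opener = urllib.request.build_opener(handler)
    with opener.open(request_url(sql), timeout=timeout) as resp:
        return resp.read().decode("ascii", errors="replace")
```

On a WN, `print(fetch("SELECT TABLE_NAME FROM ALL_TABLES")[:500])` would mirror the test above, and `fetch(..., proxy=None)` mirrors Santanu's run with `http_proxy` unset. Note that the URL path must use the capital F in `frontierATLAS/Frontier`.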
>>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>>> Hi Alastair and all,
>>>>>
>>>>> I think there is a typo in the URL: it should be
>>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier", *not*
>>>>> "frontier" with a small f. Now it works with or without an
>>>>> http_proxy setting.
>>>>>
>>>>> [root@farm002 tmp]# unset http_proxy
>>>>> [root@farm002 tmp]# python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT count(*) FROM ALL_TABLES"
>>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>>> Decode results: True
>>>>> Refresh cache: False
>>>>> Frontier Request:
>>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>>> Query started: 06/16/10 11:44:15 BST
>>>>> Query ended: 06/16/10 11:44:16 BST
>>>>> Query time: 1.34605288506 [seconds]
>>>>> Query result:
>>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>>> <frontier version="3.22" xmlversion="1.0">
>>>>> <transaction payloads="1">
>>>>> <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>>> <data>eJxjY2Bg4HD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>>> <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>>> </payload>
>>>>> </transaction>
>>>>> </frontier>
>>>>> Fields: COUNT(*) NUMBER
>>>>> Records: 8833
>>>>>
>>>>> Cheers,
>>>>> Santanu
>>>>>
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> The current state of the ATLAS Frontier service is not ideal.
>>>>>> The SAM tests:
>>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>>> show several production sites getting a warning. This warning
>>>>>> is normally caused by the backup squid not being configured
>>>>>> correctly.
>>>>>>
>>>>>> To remind people: WNs should connect to the local squid
>>>>>> (normally at the site), which connects to the Frontier server at
>>>>>> RAL. If the local squid is down, the WN will try to connect to a
>>>>>> backup squid, which is meant to be at a nearby site and which will
>>>>>> then try to connect to the Frontier server. There is a similar
>>>>>> backup process for the Frontier server itself: should the one at
>>>>>> RAL fail, all the squids will try to connect to the Frontier
>>>>>> server at PIC.
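The failover order just described can be read off a TiersOfATLAS entry. This sketch assumes that "local" means the site's own mysquid, that any other name means that site's mysquid, and that through each squid RAL is tried before PIC; the real Frontier client's retry order may differ. `fral`/`fpic` are placeholder strings and `failover_chain` is a hypothetical helper.

```python
# Sketch of the failover order just described, read off TiersOfATLAS-style
# entries. Assumptions: "local" means the site's own mysquid, any other name
# means that site's mysquid, and through each squid RAL is tried before PIC.
# fral/fpic are placeholder strings, not the real values from the cache file.
fral, fpic = "Frontier@RAL", "Frontier@PIC"

CONFIG = {
    "UKI-LT2-QMUL": {"mysquid": "http://frontiercache.esc.qmul.ac.uk:3128",
                     "frontiers": [fral, fpic],
                     "squids": ["local", "UKI-LT2-RHUL"]},
    "UKI-LT2-RHUL": {"mysquid": "http://squid1.ppgrid1.rhul.ac.uk:3128",
                     "frontiers": [fral, fpic],
                     "squids": ["local", "UKI-LT2-QMUL"]},
}


def failover_chain(site, config):
    """List (squid, Frontier server) pairs in the order a WN at `site` tries them."""
    entry = config[site]
    chain = []
    for name in entry["squids"]:
        squid = entry["mysquid"] if name == "local" else config[name]["mysquid"]
        if not squid:          # sites with no squid of their own have mysquid ''
            continue
        for frontier in entry["frontiers"]:
            chain.append((squid, frontier))
    return chain


for step, (squid, frontier) in enumerate(failover_chain("UKI-LT2-QMUL", CONFIG), 1):
    print(step, squid, "->", frontier)
```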
>>>>>>
>>>>>> To ease this problem, it has been suggested that the default
>>>>>> backup for Tier 2 sites should be the squid at RAL (the Tier 1,
>>>>>> not the Tier 2!). The squid at the Tier 1 is the same installation
>>>>>> as the Frontier server, so if the Frontier service goes down, so
>>>>>> will the backup squid. This does reduce the resilience of the
>>>>>> setup slightly, but I think it is worth it, as it should make
>>>>>> things significantly simpler to maintain. It also means I will
>>>>>> have to get the SAM test modified slightly. However, if there are
>>>>>> sites that are happy with the current setup and with managing
>>>>>> firewall access to their squid from other sites' worker nodes,
>>>>>> then please feel free to respond.
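Keeping the current setup means each backup squid must accept requests from the WNs of every site that lists it as a backup. The check a firewall (or squid's own `acl ... src` and `http_access allow` lines in squid.conf) has to make is sketched below. The site names come from TiersOfATLAS, but the address ranges are RFC 5737 documentation networks used as hypothetical stand-ins for real WN subnets.

```python
import ipaddress

# Sketch of the access check a backup squid (or its firewall) has to make.
# The site names are from TiersOfATLAS, but the address ranges are RFC 5737
# documentation networks used as HYPOTHETICAL stand-ins for real WN subnets.
ALLOWED_WN_RANGES = {
    "local": ["192.0.2.0/25"],
    "UKI-SOUTHGRID-BHAM-HEP": ["198.51.100.0/24"],
    "UKI-SOUTHGRID-CAM-HEP": ["203.0.113.0/26"],
}


def build_acl(ranges_by_site):
    """Flatten the per-site CIDR lists into ip_network objects."""
    return [ipaddress.ip_network(cidr)
            for cidrs in ranges_by_site.values() for cidr in cidrs]


def is_allowed(client_ip, acl):
    """True if the client address falls inside any allowed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in acl)


ACL = build_acl(ALLOWED_WN_RANGES)
print(is_allowed("198.51.100.17", ACL))   # hypothetical Birmingham WN: True
print(is_allowed("203.0.113.200", ACL))   # outside the CAM /26: False
```

Every range in such a list has to be kept up to date by hand as sites renumber or add WNs, which is the maintenance burden the proposal aims to remove.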
>>>>>>
>>>>>> Before committing any change to TiersOfATLAS, I would like sites
>>>>>> to run a test to make sure they can indeed successfully access
>>>>>> the RAL squid.
>>>>>>
>>>>>> To do this:
>>>>>> Log into a WN
>>>>>>> wget http://frontier.cern.ch/dist/fnget.py
>>>>>>> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>>>> python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>>> This should print a long list of table names, not a Python
>>>>>> error!
>>>>>>
>>>>>> Could sites please reply with the results of the test? Any
>>>>>> comments are also welcome.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Alastair
>>>>>
>>>
>>> --
>>> Dr Ben Waugh Tel. +44 (0)20 7679 7223
>>> Dept of Physics and Astronomy Internal: 37223
>>> University College London
>>> London WC1E 6BT
--
Dr Ben Waugh Tel. +44 (0)20 7679 7223
Dept of Physics and Astronomy Internal: 37223
University College London
London WC1E 6BT