Hi Pete
Currently, as far as I am aware, the only place where this is recorded is TiersOfATLAS:
http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/TiersOfATLASCache.py
The relevant part is here:
# UK
'RAL-LCG2':{'mysquid':'','frontiers':[fral,fpic],'squids':[]},
'UKI-SCOTGRID-ECDF':{'mysquid':'','frontiers':[fral,fpic],'squids':['UKI-SCOTGRID-GLASGOW','UKI-NORTHGRID-LANCS-HEP']},
'UKI-SCOTGRID-GLASGOW':{'mysquid':'http://nat005.gla.scotgrid.ac.uk:3128','frontiers':[fkit,fpic],'squids':['local','UKI-NORTHGRID-LANCS-HEP']},
'UKI-NORTHGRID-LANCS-HEP':{'mysquid':'http://fal-pygrid-45.lancs.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SCOTGRID-GLASGOW']},
'UKI-NORTHGRID-LIV-HEP':{'mysquid':'http://atlcache.ph.liv.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-SHEF-HEP']},
'UKI-NORTHGRID-SHEF-HEP':{'mysquid':'http://lcgsquid.shef.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-LIV-HEP']},
'UKI-NORTHGRID-MAN-HEP':{'mysquid':'http://squid-cache.tier2.hep.manchester.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-NORTHGRID-LANCS-HEP']},
'UKI-SOUTHGRID-BHAM-HEP':{'mysquid':'http://epgr01.ph.bham.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
'UKI-SOUTHGRID-CAM-HEP':{'mysquid':'http://ce.hep.phy.cam.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
'UKI-SOUTHGRID-OX-HEP':{'mysquid':'http://t2squid01.physics.ox.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-RALPP']},
'UKI-SOUTHGRID-RALPP':{'mysquid':'http://atlassquid.pp.rl.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-SOUTHGRID-OX-HEP']},
'UKI-LT2-QMUL':{'mysquid':'http://frontiercache.esc.qmul.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-RHUL']},
'UKI-LT2-RHUL':{'mysquid':'http://squid1.ppgrid1.rhul.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-QMUL']},
# 'UKI-LT2-UCL-CENTRAL':{'mysquid':'','frontiers':[fral,fpic],'squids':['UKI-LT2-UCL-HEP','UKI-LT2-QMUL']},
'UKI-LT2-UCL-HEP':{'mysquid':'http://squid.hep.ucl.ac.uk:3128','frontiers':[fral,fpic],'squids':['local','UKI-LT2-QMUL']},
So according to that, you are the backup for Cambridge, RALPP and
Birmingham. If those sites would like to keep you as their backup,
they need to tell you where their WNs are so you can make an
exception in your firewall.
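For what it's worth, the backup relationships can be pulled out of that structure mechanically. Here is a minimal sketch (`backed_up_by` is my own helper name, and the dict below just re-types the relevant Southgrid entries from the excerpt above, with the 'frontiers' field omitted — it is not the live TiersOfATLASCache.py):

```python
def backed_up_by(toa, site):
    """Return, sorted, the sites whose 'squids' list names `site` as a backup."""
    return sorted(name for name, cfg in toa.items()
                  if site in cfg.get('squids', []))

# Sample re-typed from the excerpt (Southgrid entries only):
toa = {
    'UKI-SOUTHGRID-BHAM-HEP': {'mysquid': 'http://epgr01.ph.bham.ac.uk:3128',
                               'squids': ['local', 'UKI-SOUTHGRID-OX-HEP']},
    'UKI-SOUTHGRID-CAM-HEP':  {'mysquid': 'http://ce.hep.phy.cam.ac.uk:3128',
                               'squids': ['local', 'UKI-SOUTHGRID-OX-HEP']},
    'UKI-SOUTHGRID-OX-HEP':   {'mysquid': 'http://t2squid01.physics.ox.ac.uk:3128',
                               'squids': ['local', 'UKI-SOUTHGRID-RALPP']},
    'UKI-SOUTHGRID-RALPP':    {'mysquid': 'http://atlassquid.pp.rl.ac.uk:3128',
                               'squids': ['local', 'UKI-SOUTHGRID-OX-HEP']},
}

print(backed_up_by(toa, 'UKI-SOUTHGRID-OX-HEP'))
# ['UKI-SOUTHGRID-BHAM-HEP', 'UKI-SOUTHGRID-CAM-HEP', 'UKI-SOUTHGRID-RALPP']
```

Run against the full cache file this would answer "who am I the backup for?" for any site.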
The twiki page does have the Frontier backups but not the T2 backups.
I could add them, but it wouldn't make this much easier to find:
https://twiki.cern.ch/twiki/bin/view/Atlas/T2SquidDeployment
I know it's all a horrible mess at the moment.
Alastair
On 17 Jun 2010, at 14:46, Peter Gronbech wrote:
> Works from Oxford's WNs.
>
> Can you tell me where I can see a list of which sites Oxford is
> supposed to be the backup for?
>
> Thanks Pete
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Alastair Dewhurst
> Sent: 17 June 2010 11:50
> To: [log in to unmask]
> Subject: Re: ATLAS Tier 2 squid failover
>
> Hi All
>
> After a discussion in today's Thursday phone meeting we have decided
> the following:
>
> 1) If you have been passing the SAM tests and are happy with your
> current setup, then no changes will be made to affect your site.
> 2) If you have been failing (getting a warning) on the SAM tests, I
> will switch you over to having the RAL Tier 1 as your primary backup.
> 3) If you would prefer to have RAL as your primary backup, which will
> allow things to be more easily monitored from the Tier 1, then I will
> switch you over too.
>
> I would appreciate it if all sites, even those that don't want
> anything changed, ran the test, as it proves that direct access
> works (in case of emergency).
>
> Site       : Test (Who ran it)          : SAM  : Preference
> RAL PP     : Passed (Alastair Dewhurst) : ok   : Use Tier 1
> Liverpool  : Passed (Stephen Jones)     : ok   : unknown
> QMUL       : Passed (Chris Walker)      : ok   : unknown
> Cambridge  : Passed (Santanu Das)       : warn : Will be changed
> Sheffield  : Passed (Elena Korolkova)   : ok   : unknown
> RHUL       : Passed (Simon George)      : ok   : unknown
> UCL        : Passed (Ben Waugh)         : ok   : unknown
> Manchester : Not run                    : ok   : Stay the same
> Lancaster  : Not run                    : ok   : Stay the same
> Oxford     : Not run                    : warn : Will be changed
> Birmingham : Not run                    : warn : Will be changed
> Glasgow    : Not run                    : ok   : unknown, although it
> still uses FZK, which Graeme Stewart said should be changed.
>
> I am still trying to sort out some new monitoring for the Tier 1, and
> I will send out a confirmation before submitting any request to
> change TiersOfATLAS. If anyone has any additional suggestions
> regarding monitoring and chasing up failures, they are very welcome.
> As was said at the meeting, this is a setup that seems to work very
> well most of the time; it is really a question of how best to chase
> up the few problems when they occur without creating lots of work for
> ourselves.
>
> Thanks
>
> Alastair
>
>
> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>
>> This works for UCL (both HEP and Legion clusters).
>>
>> Cheers,
>> Ben
>>
>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>> Hi Santanu
>>> Thank you for spotting that; it should indeed be a capital F. I
>>> thought I had copied and pasted the commands directly, but maybe my
>>> mail client decided to do some formatting. That should fix most
>>> of the problems, as the Frontier server/squid should be accessible
>>> to all.
>>> If we were to make this change, it would not make RAL a single
>>> point of failure. In order for there to be a failure, both your
>>> own squid and RAL would have to fail. If RAL fails, your own squid
>>> should be set up to access PIC. The current situation means that
>>> if you and your backup squid fail, things will break. (If both RAL
>>> and PIC are down then you will also fail under both systems, but
>>> multiple T1 failures should hopefully be rare!)
>>> Alastair
>>> So the new instructions are:
>>> Log into a WN
>>>> wget http://frontier.cern.ch/dist/fnget.py
>>>> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>> python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>> This should provide a big list of table names and not a python
>>> error!
>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>> Hi Alastair and all,
>>>>
>>>> I think there is a typo in the URL: it should be
>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier",
>>>> *not* "frontier" with a small f. Now it works with or without an
>>>> http_proxy setting.
>>>>
>>>> [root@farm002 tmp]# unset http_proxy
>>>> [root@farm002 tmp]# python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT count(*) FROM ALL_TABLES"
>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>> Decode results: True
>>>> Refresh cache: False
>>>> Frontier Request:
>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>> Query started: 06/16/10 11:44:15 BST
>>>> Query ended: 06/16/10 11:44:16 BST
>>>> Query time: 1.34605288506 [seconds]
>>>> Query result:
>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>> <frontier version="3.22" xmlversion="1.0">
>>>> <transaction payloads="1">
>>>> <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>> <data>eJxjY2Bg4HD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>> <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>> </payload>
>>>> </transaction>
>>>> </frontier>
>>>> Fields: COUNT(*) NUMBER
>>>> Records: 8833
>>>>
>>>> Cheers,
>>>> Santanu
>>>>
>>>>
>>>>> Hi
>>>>>
>>>>> The current state of the ATLAS frontier service is not ideal.
>>>>> The SAM tests:
>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>> show several production sites getting a warning. This warning
>>>>> is normally caused by the backup squid not being configured
>>>>> correctly.
>>>>>
>>>>> To remind people: WNs should connect to the local squid
>>>>> (normally at the site), which connects to the Frontier server at
>>>>> RAL. If the local squid is down, the WN will try to connect to a
>>>>> backup squid, which is meant to be at a nearby site and which
>>>>> will then try to connect to the Frontier server. There is a
>>>>> similar backup process for the server itself: should the Frontier
>>>>> server at RAL fail, all the squids will try to connect to the
>>>>> Frontier server at PIC.
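The fallback order described above can be sketched as follows. This is my own illustration of the logic, not actual WN client code; `pick_route` is a hypothetical name, and `probe` is an injected reachability check so nothing here touches the network:

```python
# Sketch of the described failover order: a WN prefers its local squid and
# falls back to the backup squid at a nearby site, while every squid prefers
# the RAL Frontier server and falls back to PIC. `probe(endpoint)` returns
# True if the endpoint is reachable; it is injected for testability.

def pick_route(local_squid, backup_squid, frontiers, probe):
    """Return the (squid, frontier) pair chosen per the fallback order."""
    if probe(local_squid):
        squid = local_squid
    elif probe(backup_squid):
        squid = backup_squid
    else:
        squid = None  # no squid reachable: this is where things break today
    frontier = next((f for f in frontiers if probe(f)), None)
    return squid, frontier

# Example: local squid down, backup squid and both Frontier servers up.
up = {'backup-squid', 'ral-frontier', 'pic-frontier'}
print(pick_route('local-squid', 'backup-squid',
                 ['ral-frontier', 'pic-frontier'], up.__contains__))
# ('backup-squid', 'ral-frontier')
```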
>>>>>
>>>>> To ease this problem it has been suggested that the default
>>>>> backup for Tier 2 sites be the squid at RAL (the Tier 1, not the
>>>>> Tier 2!). The squid at the Tier 1 is the same installation as
>>>>> the Frontier server, so if the Frontier service goes down, so
>>>>> will the backup squid. This does reduce the resilience of the
>>>>> setup slightly, but I think it is worth it given that it should
>>>>> make things significantly simpler to maintain. It also means I
>>>>> will have to get the SAM test modified slightly. If, however,
>>>>> there are sites that are happy with the current setup and with
>>>>> managing firewall access to their squid from other sites' worker
>>>>> nodes, then please feel free to respond.
>>>>>
>>>>> Before committing any change to Tiersofatlas I would like sites
>>>>> to run a test to make sure they can indeed successfully access
>>>>> the RAL squid.
>>>>>
>>>>> To do this:
>>>>> Log into a WN
>>>>>> wget http://frontier.cern.ch/dist/fnget.py
>>>>>> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>>> python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>> This should provide a big list of table names and not a python
>>>>> error!
>>>>>
>>>>> Could sites please reply with the results of the test; any
>>>>> comments are also welcome.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Alastair
>>>>
>>
>> --
>> Dr Ben Waugh Tel. +44 (0)20 7679
>> 7223
>> Dept of Physics and Astronomy Internal: 37223
>> University College London
>> London WC1E 6BT