Hi Alastair,
At L'pool, we get a 404.
Steve
Alastair Dewhurst wrote:
> Hi
>
> The Python error observed means that it can't connect, which in
> general indicates that I haven't set something quite right at RAL. I
> should stress I won't do anything officially until I am sure it will
> work. I will have a play around and check that everything is set up
> correctly.
>
> To go back to the discussion about why we should make this change, I
> will post an extract from a discussion I have been having with the
> experts. These, then, are the arguments for having everyone fail over
> to RAL:
>
>> ...
>>>> The solution of having all UK Tier 2s failing over to RAL Tier 1
>>>> seems like a sensible solution as it simplifies things for sites.
>>>> All the problems we have seen recently have been with configuration
>>>> issues with the many layers of failovers rather than with sites
>>>> actually failing. I was wondering what the opinions of the experts
>>>> were regarding this?
>>>
>>> I think we're all (or at least mostly) of the mindset that T2 sites
>>> should fail over to their local T1 site, and that T1s should therefore
>>> permit source traffic from anywhere. As you say, this simplifies the
>>> ACL configuration at T1 and reduces the chance that a T2 worker node
>>> will be rejected. Also, site access control can be done at the
>>> destination level, so that T1 resources are not open to attack or
>>> exploit. Oracle resources are already open to the world in a similar
>>> manner.
>>
>> I'm of the strong opinion that T2 squid proxies shouldn't fail over to
>> other T2s, because of the difficulty in administration of permissions
>> and because of the need to then over-engineer every T2 site to handle
>> the full load of other sites. CMS has only one server site, at CERN,
>> and all T1s, T2s, and T3s fail over to it. We make that one site
>> have lots of extra capacity and watch it carefully to ensure that
>> failures do not persist for long.
>
>
>
> Catalin and I at the Tier 1 are attempting to be familiar with at
> least most aspects of the Frontier service; we attend the meetings
> and have contact with the developers. We can monitor the service at
> the Tier 1 more effectively than at several Tier 2 sites. So far the
> ATLAS Frontier service in the UK has not really been managed very
> well, and it is only thanks to the Tier 2s doing a very good job of
> catching what ATLAS is saying (and possibly a bit of luck) that we
> have had relatively few problems. In future I will also try to make
> sure any requests for patches, upgrades etc. get sent to this mailing
> list.
>
> Alastair
>
>
>
> On 16 Jun 2010, at 09:42, Ben Waugh wrote:
>
>> So does the 404 error mean the configuration error is with the RAL
>> Squid rather than the local site? I don't mind which backup site we
>> use as long as it works!
>>
>> Cheers,
>> Ben
>>
>> On 16/06/10 09:18, Christopher J.Walker wrote:
>>> Alastair Dewhurst wrote:
>>>> Hi
>>>>
>>>> The current state of the ATLAS frontier service is not ideal. The SAM
>>>> tests:
>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>
>>>>
>>>> show several production sites getting a warning. This warning is
>>>> normally caused by the backup squid not being configured correctly.
>>>>
>>>
>>> QMUL (and RHUL) are warning because RHUL hasn't got around to
>>> configuring a squid after their shutdown.
>>>
>>> It's true that QMUL's squid wasn't working for them after I upgraded
>>> it at ATLAS's request. Apparently it was just a simple upgrade that
>>> would just work - though I do wish I'd been told it moved the config
>>> files...
>>>
>>> In fact, that reminds me: which mailing list should I have been on
>>> to be told about that request? A concern I have about installing
>>> this sort of one-off software is that it doesn't get the routine
>>> security updates that SL does.
>>>
>>>> To remind people: WNs should connect to the local squid (normally at
>>>> the site), which connects to the Frontier server at RAL. If the local
>>>> squid is down then the WN will try to connect to a backup squid,
>>>> which is meant to be at a nearby site and which will then try to
>>>> connect to the Frontier server. There is a similar backup process
>>>> should the Frontier server at RAL fail: all the squids will then try
>>>> to connect to the Frontier server at PIC.
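The failover chain described above is the sort of thing the Frontier client expresses as an ordered configuration string (proxies tried in the order listed, then the server URLs). A sketch only: apart from the RAL proxy named in this thread, every hostname below is a placeholder, not a confirmed value.

```
(serverurl=http://frontier-server.ral.example:8000/frontierATLAS/frontier)\
(serverurl=http://frontier-server.pic.example:8000/frontierATLAS/frontier)\
(proxyurl=http://squid.local-site.example:3128)\
(proxyurl=http://lcgft-atlas.gridpp.rl.ac.uk:3128)
```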
>>>>
>>>> To ease this problem it has been suggested that the default backup
>>>> for Tier 2 sites be the squid at RAL (the Tier 1, not the Tier 2!).
>>>> The squid at the Tier 1 is part of the same installation as the
>>>> Frontier server, so if the Frontier service goes down, so will the
>>>> backup squid. This does reduce the resilience of the setup slightly,
>>>> but I think it is worth it, given that it should make things
>>>> significantly simpler to maintain. It also means I will have to get
>>>> the SAM test modified slightly. If, however, there are sites that are
>>>> happy with the current setup and with managing firewall access to
>>>> their squid from other sites' worker nodes, then please feel free to
>>>> respond.
>>>>
>>>> Before committing any change to Tiersofatlas I would like sites to run
>>>> a test to make sure they can indeed successfully access the RAL squid.
>>>>
>>>> To do this:
>>>> Log into a WN
>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>> > python fnget.py
>>>> --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier
>>>> --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>> This should produce a big list of table names and not a Python error!
>>>>
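For reference, the long `p1=` parameter that fnget.py puts in the request URL is just the SQL query, compressed and made URL-safe. A minimal round-trip sketch, assuming zlib compression plus base64 with the character substitutions `+`→`.`, `/`→`-`, `=`→`_` (check your copy of fnget.py to confirm the exact scheme):

```python
# Sketch of the Frontier request encoding assumed above: zlib-compress
# the SQL, base64-encode it, then substitute the URL-unsafe characters.
import base64
import zlib

def encode_sql(sql):
    """Compress and URL-safe-encode an SQL string for the p1= parameter."""
    raw = base64.b64encode(zlib.compress(sql.encode("ascii"), 9)).decode("ascii")
    return raw.replace("+", ".").replace("/", "-").replace("=", "_")

def decode_sql(enc):
    """Invert encode_sql: undo the substitutions, then decompress."""
    raw = enc.replace(".", "+").replace("-", "/").replace("_", "=")
    return zlib.decompress(base64.b64decode(raw)).decode("ascii")

sql = "SELECT TABLE_NAME FROM ALL_TABLES"
p1 = encode_sql(sql)
print(p1)                      # compact, URL-safe query payload
assert decode_sql(p1) == sql   # the round trip recovers the original SQL
```

The round-trip check is the useful part: it confirms nothing is lost in the substitutions, independently of which base64 variant the real client uses.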
>>>
>>> You mean not like this...
>>>
>>> [walker@cn456 tmp]$ wget http://frontier.cern.ch/dist/fnget.py
>>> --2010-06-16 09:07:11-- http://frontier.cern.ch/dist/fnget.py
>>> Resolving frontier.cern.ch... 128.142.202.212
>>> Connecting to frontier.cern.ch|128.142.202.212|:80... connected.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 8406 (8.2K) [text/plain]
>>> Saving to: `fnget.py'
>>>
>>> 100%[======================================>] 8,406 --.-K/s in 0.02s
>>>
>>> 2010-06-16 09:07:11 (434 KB/s) - `fnget.py' saved [8406/8406]
>>>
>>> [walker@cn456 tmp]$ export
>>> http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>> [walker@cn456 tmp]$ python fnget.py
>>> --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier
>>> --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>> Using Frontier URL:
>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier
>>> Query: SELECT TABLE_NAME FROM ALL_TABLES
>>> Decode results: True
>>> Refresh cache: False
>>>
>>> Frontier Request:
>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RCHF08nGN93P0dVVwC-L3VXD08YkHiwUDAJs3CTA_
>>>
>>>
>>>
>>> Query started: 06/16/10 09:07:25 BST
>>> Traceback (most recent call last):
>>> File "fnget.py", line 231, in ?
>>> result = urllib2.urlopen(request).read()
>>> File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
>>> return _opener.open(url, data)
>>> File "/usr/lib64/python2.4/urllib2.py", line 364, in open
>>> response = meth(req, response)
>>> File "/usr/lib64/python2.4/urllib2.py", line 471, in http_response
>>> response = self.parent.error(
>>> File "/usr/lib64/python2.4/urllib2.py", line 402, in error
>>> return self._call_chain(*args)
>>> File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
>>> result = func(*args)
>>> File "/usr/lib64/python2.4/urllib2.py", line 480, in http_error_default
>>> raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
>>> urllib2.HTTPError: HTTP Error 404: Not Found
>>>
>>>> Could sites please reply with the results of the test; any comments
>>>> are also welcome.
>>>>
>>>>
>>
>> --Dr Ben Waugh Tel. +44 (0)20 7679 7223
>> Dept of Physics and Astronomy Internal: 37223
>> University College London
>> London WC1E 6BT
--
Steve Jones [log in to unmask]
System Administrator office: 220
High Energy Physics Division tel (int): 42334
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2334
University of Liverpool http://www.liv.ac.uk/physics/hep/