Hi Alastair,
The test works on Manchester WNs.
cheers
alessandra
Alastair Dewhurst wrote:
> Hi All
>
> After a discussion in today's Thursday phone meeting, we have decided the
> following:
>
> 1) If you have been passing the SAM tests and are happy with your
> current setup then no changes will be made to affect your site.
> 2) If you have been failing (getting a warning) on the SAM tests I
> will switch you over to having the RAL Tier 1 as your primary backup.
> 3) If you would prefer to have RAL as your primary backup, which will
> allow things to be monitored more easily from the Tier 1, then I will
> switch you over too.
>
> I would appreciate it if all sites, even those that don't want anything
> changed, ran the test, as it proves that direct access works
> (in case of emergency).
>
> Site : Test (Who ran it) : SAM : Preference
> RAL PP : Passed (Alastair Dewhurst) : ok : Use Tier 1
> Liverpool : Passed (Stephen Jones) : ok : unknown
> QMUL : Passed (Chris Walker) : ok : unknown
> Cambridge : Passed (Santanu Das) : warn : Will be changed
> Sheffield : Passed (Elena Korolkova) : ok : unknown
> RHUL : Passed (Simon George) : ok : unknown
> UCL : Passed (Ben Waugh) : ok : unknown
> Manchester: Not run : ok : Stay the same
> Lancaster : Not run : ok : Stay the same
> Oxford : Not run : warn : Will be changed
> Birmingham: Not run : warn : Will be changed
> Glasgow : Not run : ok : unknown, although it still uses FZK, which
> Graeme Stewart said should be changed.
>
> I am still trying to sort out some new monitoring for the Tier 1 and I
> will send out a confirmation before submitting any request to change
> Tiersofatlas. Any additional suggestions regarding monitoring and
> chasing up failures are very welcome. As was said at the meeting, this
> is a setup that seems to work very well most of the time; it is really
> a question of how best to chase up the few problems when they occur
> without creating lots of work for ourselves.
>
> Thanks
>
> Alastair
>
>
> On 17 Jun 2010, at 10:22, Ben Waugh wrote:
>
>> This works for UCL (both HEP and Legion clusters).
>>
>> Cheers,
>> Ben
>>
>> On 16/06/10 12:14, Alastair Dewhurst wrote:
>>> Hi Santanu
>>> Thank you for spotting that, it should indeed be a capital F. I
>>> thought I had copied and pasted the commands directly but maybe my
>>> mail client decided to do some formatting. That should fix most of
>>> the problems as the Frontier server/squid should be accessible to all.
>>> If we were to make this change, it would not make RAL a single point
>>> of failure. In order for there to be a failure, both your own squid
>>> and RAL would have to fail. If RAL fails, your own squid should be
>>> set up to access PIC. The current situation means that if both your
>>> own squid and your backup squid fail, things will break. (If both RAL
>>> and PIC are down then you will also fail under both systems, but
>>> multiple T1 failures should hopefully be rare!)
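The failure combinations discussed above can be checked mechanically. A minimal sketch (component names are illustrative, not from the thread) of which failures break Frontier access under the proposed RAL-backup scheme versus the current Tier-2-backup scheme:

```python
def access_ok(failed, backup_is_ral):
    """True if a WN can still reach a Frontier server.

    failed        -- set of down components
    backup_is_ral -- True for the proposed scheme (RAL squid as backup),
                     False for the current scheme (a Tier 2 backup squid)
    """
    # Path 1: local squid, then RAL Frontier server (PIC as server fallback).
    if "local_squid" not in failed:
        if "ral" not in failed or "pic" not in failed:
            return True
    # Path 2: backup squid, with the same server fallback.
    backup = "ral" if backup_is_ral else "t2_backup_squid"
    if backup not in failed:
        if "ral" not in failed or "pic" not in failed:
            return True
    return False

# No single failure breaks access under either scheme.
for comp in ["local_squid", "ral", "t2_backup_squid"]:
    assert access_ok({comp}, backup_is_ral=True)
    assert access_ok({comp}, backup_is_ral=False)

# The reduced resilience mentioned above: losing the local squid and RAL
# together breaks the RAL-backup scheme, while the current scheme survives
# via the Tier 2 backup squid falling back to PIC.
assert not access_ok({"local_squid", "ral"}, backup_is_ral=True)
assert access_ok({"local_squid", "ral"}, backup_is_ral=False)

# The current scheme's weak spot: local squid plus backup squid down.
assert not access_ok({"local_squid", "t2_backup_squid"}, backup_is_ral=False)
```

The trade-off argued for in the thread is visible here: the proposed scheme loses one two-failure scenario but removes the need for every Tier 2 to maintain firewall access to a neighbour's squid.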
>>> Alastair
>>> So the new instructions are:
>>> Log into a WN
>>> > wget http://frontier.cern.ch/dist/fnget.py
>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>> > python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>> This should provide a big list of table names and not a python error!
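As an aside, the long `p1=...` parameter that appears in the Frontier request URL (visible in Santanu's test output quoted in this thread) is just the SQL query, zlib-compressed and base64-encoded. A minimal round-trip sketch; the real client additionally substitutes base64 characters that are unsafe in URLs, which is glossed over here:

```python
import base64
import zlib

def encode_query(sql):
    """Compress an SQL string and base64-encode it, as in the p1 parameter."""
    return base64.b64encode(zlib.compress(sql.encode(), 9)).decode()

def decode_query(p1):
    """Invert encode_query: base64-decode and decompress back to SQL."""
    return zlib.decompress(base64.b64decode(p1)).decode()

sql = "SELECT TABLE_NAME FROM ALL_TABLES"
p1 = encode_query(sql)
assert decode_query(p1) == sql
```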
>>> On 16 Jun 2010, at 11:51, Santanu Das wrote:
>>>> Hi Alastair and all,
>>>>
>>>> I think there is a typo in the URL: it should be
>>>> "http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier",
>>>> *not* "frontier" with a small f. Now it works both with and without
>>>> an http_proxy setting.
>>>>
>>>> [root@farm002 tmp]# unset http_proxy
>>>> [root@farm002 tmp]# python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier --sql="SELECT count(*) FROM ALL_TABLES"
>>>> Using Frontier URL: http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier
>>>> Query: SELECT count(*) FROM ALL_TABLES
>>>> Decode results: True
>>>> Refresh cache: False
>>>> Frontier Request:
>>>> http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNoLdvVxdQ5RSM4vzSvR0NJUcAvy91Vw9PGJD3F08nENBgCQ9wjs
>>>> Query started: 06/16/10 11:44:15 BST
>>>> Query ended: 06/16/10 11:44:16 BST
>>>> Query time: 1.34605288506 [seconds]
>>>> Query result:
>>>> <?xml version="1.0" encoding="US-ASCII"?>
>>>> <!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
>>>> <frontier version="3.22" xmlversion="1.0">
>>>>   <transaction payloads="1">
>>>>     <payload type="frontier_request" version="1" encoding="BLOBzip">
>>>>       <data>eJxjY2Bg4HD2D/UL0dDSZANy2PxCfZ1cg9hBbBYLC2NjdgBW1ATW</data>
>>>>       <quality error="0" md5="3c31cc5665b2636e8feb209fafa558f6" records="1" full_size="35"/>
>>>>     </payload>
>>>>   </transaction>
>>>> </frontier>
>>>> Fields: COUNT(*) NUMBER
>>>> Records: 8833
>>>>
>>>> Cheers,
>>>> Santanu
>>>>
>>>>
>>>>> Hi
>>>>>
>>>>> The current state of the ATLAS frontier service is not ideal. The
>>>>> SAM tests:
>>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
>>>>>
>>>>> show several production sites getting a warning. This warning is
>>>>> normally caused by the backup squid not being configured correctly.
>>>>>
>>>>> To remind people: WNs should connect to the local squid (normally
>>>>> at the site), which connects to the Frontier server at RAL. If the
>>>>> local squid is down, the WN will try to connect to a backup squid,
>>>>> which is meant to be at a nearby site and which will then try to
>>>>> connect to the Frontier server. There is a similar backup process
>>>>> for the server itself: should the Frontier server at RAL fail, all
>>>>> the squids will try to connect to the Frontier server at PIC.
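The lookup order described above can be sketched as a small simulation (component names here are placeholders, not actual hostnames from the thread):

```python
def first_working(candidates, up):
    """Return the first reachable item in candidates, or None."""
    for c in candidates:
        if up.get(c, False):
            return c
    return None

def frontier_path(up):
    """Pick the (squid, server) pair a WN would end up using.

    up -- dict mapping component name to True (reachable) / False (down)
    """
    squid = first_working(["local_squid", "backup_squid"], up)
    server = first_working(["ral_frontier", "pic_frontier"], up)
    if squid and server:
        return (squid, server)
    return None  # no working path: queries fail

# Local squid down: traffic moves to the backup squid, still via RAL.
up = {"local_squid": False, "backup_squid": True,
      "ral_frontier": True, "pic_frontier": True}
assert frontier_path(up) == ("backup_squid", "ral_frontier")

# RAL Frontier server down: every squid falls back to PIC.
up = {"local_squid": True, "backup_squid": True,
      "ral_frontier": False, "pic_frontier": True}
assert frontier_path(up) == ("local_squid", "pic_frontier")
```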
>>>>>
>>>>> To ease this problem it has been suggested that the default backup
>>>>> for Tier 2 sites be the squid at RAL (the Tier 1, not the Tier 2!).
>>>>> The squid at the Tier 1 is part of the same installation as the
>>>>> Frontier server, so if the Frontier service goes down, so will the
>>>>> backup squid. This does reduce the resilience of the setup slightly,
>>>>> but I think it is worth it, given that it should make things
>>>>> significantly simpler to maintain. It also means I will have to get
>>>>> the SAM test modified slightly. If, however, there are sites that
>>>>> are happy with the current setup and with managing firewall access
>>>>> to their squid from other sites' worker nodes, then please feel
>>>>> free to respond.
>>>>>
>>>>> Before committing any change to Tiersofatlas I would like sites to
>>>>> run a test to make sure they can indeed successfully access the
>>>>> RAL squid.
>>>>>
>>>>> To do this:
>>>>> Log into a WN
>>>>> > wget http://frontier.cern.ch/dist/fnget.py
>>>>> > export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
>>>>> > python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
>>>>> This should provide a big list of table names and not a python error!
>>>>>
>>>>> Could sites please reply with the results of the test; any other
>>>>> comments are also welcome.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Alastair
>>>>
>>
>> --
>> Dr Ben Waugh Tel. +44 (0)20 7679 7223
>> Dept of Physics and Astronomy Internal: 37223
>> University College London
>> London WC1E 6BT
--
The most effective way to do it, is to do it. (Amelia Earhart)
Northgrid Tier2 Technical Coordinator
http://www.hep.manchester.ac.uk/computing/tier2