Hi
The current state of the ATLAS frontier service is not ideal. The
SAM tests:
https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-ATLAS-sft-Frontier-Squid&order=SiteName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
show several production sites getting a warning. This warning is
normally caused by the backup squid not being configured correctly.
To remind people: WNs should connect to the local squid (normally at
the site), which connects to the Frontier server at RAL. If the local
squid is down, the WN will try to connect to a backup squid, which is
meant to be at a nearby site and which will in turn try to connect
to the Frontier server. There is a similar backup process for the
server itself: should the Frontier server at RAL fail, all the squids
will try to connect to the Frontier server at PIC.
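For anyone who finds it easier to read as code, here is a minimal
sketch of the failover order described above. It is only an
illustration and not how the Frontier client is actually implemented;
the component labels and the pick_route helper are made up for the
example.

```python
from typing import Callable, Optional, Sequence, Tuple


def pick_route(
    proxies: Sequence[str],
    servers: Sequence[str],
    is_up: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return the first proxy that is up paired with the first server
    that is up, or None if either list has nothing available."""
    proxy = next((p for p in proxies if is_up(p)), None)
    server = next((s for s in servers if is_up(s)), None)
    if proxy is None or server is None:
        return None
    return proxy, server


# Hypothetical labels (not real hostnames), listed in order of preference.
PROXIES = ["local-squid", "backup-squid"]
SERVERS = ["frontier-RAL", "frontier-PIC"]


def route_when_down(down):
    """Route chosen when the components named in `down` are unavailable."""
    return pick_route(PROXIES, SERVERS, lambda host: host not in down)
```

With nothing down this picks the local squid and RAL; with the local
squid down it falls back to the backup squid; with RAL down it falls
back to PIC.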
To ease this problem it has been suggested that the default backup
for Tier 2 sites should be the squid at RAL (the Tier 1, not the
Tier 2!). The squid at the Tier 1 is the same installation as the
Frontier server, so if the Frontier service goes down so will the
backup squid. This does reduce the resilience of the setup slightly,
but I think this is worth it given that it should make things
significantly simpler to maintain. It also means I will have to get
the SAM test modified slightly. If, however, there are sites that are
happy with the current setup and with managing firewall access to
their squid from other sites' worker nodes, then please feel free to
respond.
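To make the resilience trade-off concrete, here is a small sketch
comparing the two setups when the local squid and the RAL
installation fail at the same time. The labels are hypothetical, and
the model assumes (as described above) that the RAL Tier 1 squid and
the RAL Frontier server go down together.

```python
def reachable(down, proxies, servers):
    """True if at least one proxy and at least one server are still up."""
    proxy_ok = any(p not in down for p in proxies)
    server_ok = any(s not in down for s in servers)
    return proxy_ok and server_ok


# Hypothetical labels. "RAL-install" stands for the single installation
# that hosts both the RAL Tier 1 squid and the RAL Frontier server.
CURRENT = (["local-squid", "nearby-site-squid"], ["RAL-install", "PIC-frontier"])
PROPOSED = (["local-squid", "RAL-install"], ["RAL-install", "PIC-frontier"])

# Double failure: the local squid and the RAL installation are both down.
double_failure = {"local-squid", "RAL-install"}
current_ok = reachable(double_failure, *CURRENT)    # nearby squid + PIC survive
proposed_ok = reachable(double_failure, *PROPOSED)  # no proxy left

# Single failure: only the local squid is down.
proposed_single_ok = reachable({"local-squid"}, *PROPOSED)
```

In this model the proposed setup only loses out in the double-failure
case; a single failure of the local squid is still covered.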
Before committing any change to Tiersofatlas I would like sites to
run a test to make sure they can indeed successfully access the RAL
squid.
To do this:
Log into a WN
> wget http://frontier.cern.ch/dist/fnget.py
> export http_proxy=http://lcgft-atlas.gridpp.rl.ac.uk:3128
> python fnget.py --url=http://lcgft-atlas.gridpp.rl.ac.uk:3128/frontierATLAS/frontier --sql="SELECT TABLE_NAME FROM ALL_TABLES"
This should provide a big list of table names and not a python error!
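If it is easier to script the check across several WNs, the same
steps could be wrapped roughly as follows. This is only a sketch: it
assumes fnget.py has already been downloaded into the current
directory and that "python" on the WN can run it. Treating a zero
exit status with no "Traceback" in the output as success is my own
guess at a reasonable check, not something fnget.py guarantees, so
please still look at the output.

```python
import os
import subprocess

SQUID = "http://lcgft-atlas.gridpp.rl.ac.uk:3128"
QUERY = "SELECT TABLE_NAME FROM ALL_TABLES"


def fnget_command(squid=SQUID, query=QUERY, script="fnget.py"):
    """Build the same fnget.py command line as in the manual test."""
    url = squid + "/frontierATLAS/frontier"
    return ["python", script, "--url=" + url, "--sql=" + query]


def run_test(squid=SQUID):
    """Run fnget.py with http_proxy pointing at the squid.

    Returns (looks_ok, output). looks_ok is a heuristic only: zero exit
    status and no Python traceback in the combined output.
    """
    env = dict(os.environ, http_proxy=squid)
    result = subprocess.run(
        fnget_command(squid), env=env, capture_output=True, text=True
    )
    output = result.stdout + result.stderr
    looks_ok = result.returncode == 0 and "Traceback" not in output
    return looks_ok, output
```

Calling run_test() on a WN and printing the returned output should
show the list of table names if the RAL squid is reachable.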
Could sites please reply with the results of the test? Any comments
are also welcome.
Thanks
Alastair