Hi,
At the Tier1 we have been seeing intermittent failures of the VO SUM tests against our SRMs. Last night there were a lot of these, and we can now see a pattern. The failures occur for the Atlas, CMS and LHCb tests (Alice don't test our SRM). They are correlated and have affected all (I think) UK Tier2 sites as well as us. Looking more widely, I see that the other Tier1s are not affected, but a good number (though not all) of the Tier2s around the world were. I do not have a detailed picture of which sites were hit, but the US was not affected.
The failures are not seen from the regional (Oxford) Nagios as far as I can see, and an internal test we were running to try to trap this problem did not fail either.
I don't know what explains the pattern. At the RAL Tier1 we do not send control traffic (such as these SRM tests) over the OPN link - maybe other Tier1s do, which might be why they are unaffected. I'll raise this at the WLCG daily operations phone meeting today unless someone else does.
You can see the correlated failures across the UK very clearly for the Atlas tests here:
http://dashb-atlas-sum.cern.ch/dashboard/request.py/historicalsmryview-sum#view=siteavl&time[]=last24&granularity[]=default&profile=ATLAS_CRITICAL&group=ATLAS_Cloud_UK&site[]=RAL-LCG2&site[]=UKI-LT2-Brunel&site[]=UKI-LT2-IC-HEP&site[]=UKI-LT2-QMUL&site[]=UKI-LT2-RHUL&site[]=UKI-LT2-UCL-HEP&site[]=UKI-NORTHGRID-LANCS-HEP&site[]=UKI-NORTHGRID-LIV-HEP&site[]=UKI-NORTHGRID-MAN-HEP&site[]=UKI-NORTHGRID-SHEF-HEP&site[]=UKI-SCOTGRID-DURHAM&site[]=UKI-SCOTGRID-ECDF&site[]=UKI-SCOTGRID-GLASGOW&site[]=UKI-SOUTHGRID-BHAM-HEP&site[]=UKI-SOUTHGRID-CAM-HEP&site[]=UKI-SOUTHGRID-OX-HEP&site[]=UKI-SOUTHGRID-RALPP&type=quality
Regards
Gareth