hi all

we've been observing random SAM failures since yesterday.
for some reason some of our site BDII attributes don't make
it into the central (top-level) BDII anymore.

we had 3 failures and 7 OKs during the night, when the config
definitely was not changed:
  https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=SE&vo=ops&nodename=storage01.lcg.cscs.ch


the SAM test scripts complain that they cannot find a Glue schema
attribute, even though it is published correctly on the site BDII.
for instance, an excerpt from the failing "cr" test at 5am::

  + lcg-cr -v --vo ops file:/home/samops/.same/SE/testFile.txt -l lfn:SE-lcg-cr-storage01.lcg.cscs.ch-1207720006 -d storage01.lcg.cscs.ch
  Using grid catalog type: lfc
  Using grid catalog : prod-lfc-shared-central.cern.ch
  Using LFN : /grid/ops/SAM/SE-lcg-cr-storage01.lcg.cscs.ch-1207720006
  sam-bdii.cern.ch:2170: No GlueSEName found for storage01.lcg.cscs.ch

still, the attribute is there on the site BDII (ce01) and on the resource BDII of storage01::

  [root@ce01 ~]# ldapsearch -x -H ldap://ce01.lcg.cscs.ch:2170/ -b mds-vo-name=CSCS-LCG2,o=grid | grep GlueSEName
  GlueSEName: [log in to unmask]:SRM
  [root@ce01 ~]# ldapsearch -x -H ldap://storage01.lcg.cscs.ch:2170/ -b mds-vo-name=resource,o=grid | grep GlueSEName
  GlueSEName: [log in to unmask]:SRM


any clues? maybe just one of the many top-level sam-bdii machines at CERN is misbehaving?
btw, CMS observed the same behavior yesterday: some of their tags
did not make it into the top-level BDII although they were published correctly
in our site BDII, and then of course jobs failed because no resources were found.
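
one way to test the "single misbehaving machine" guess would be to ask every address behind the sam-bdii alias directly. a minimal sketch, assuming the alias resolves to several hosts, that each of them answers LDAP on port 2170 with base o=grid, and that the SE entry can be matched on GlueSEUniqueID (the function names are made up for illustration):

```shell
#!/bin/sh
# Sketch: ask every address behind a top-level BDII alias for one SE's GlueSEName.
# Assumptions (not confirmed by the report above): the alias resolves to several
# hosts, each answers LDAP on port 2170, the top-level base is "o=grid", and the
# SE entry is matched by GlueSEUniqueID.

BDII_ALIAS="${BDII_ALIAS:-sam-bdii.cern.ch}"
SE_HOST="${SE_HOST:-storage01.lcg.cscs.ch}"

# Read LDIF on stdin; succeed if at least one GlueSEName attribute is present.
has_glue_se_name() {
    grep -q '^GlueSEName:'
}

# Query one BDII address for the SE entry and report OK or MISSING.
check_bdii() {
    addr="$1"
    if ldapsearch -x -LLL -H "ldap://${addr}:2170/" -b o=grid \
        "(GlueSEUniqueID=${SE_HOST})" GlueSEName 2>/dev/null | has_glue_se_name
    then
        echo "${addr}: OK"
    else
        echo "${addr}: MISSING GlueSEName for ${SE_HOST}"
    fi
}

# Walk all IPv4 addresses the alias resolves to and check each one.
check_all() {
    for addr in $(getent ahostsv4 "${BDII_ALIAS}" | awk '{print $1}' | sort -u); do
        check_bdii "${addr}"
    done
}
```

after sourcing the file, running `check_all` prints one line per address; if only some addresses report MISSING, that would point at individual top-level machines rather than at the site publishing.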

peter 


-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| Dr. Peter Kunszt
| Head of Distributed High Throughput Computing Unit
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|    /\ \      Swiss National Supercomputing Centre CSCS
|    \/ /      Via Cantonale - Galleria 2
| /\ \  /\ \   6928 Manno
| \/ /  \/ /   Switzerland
|    /\ \
|    \/ /      Tel. +41 91 610 8222  Fax. +41 91 610 8282
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++