Good day,
For the past couple of days, several (many?) UKI ARC-CEs have been CRITICAL
in gridppnagios with this error:
org.nordugrid.ARC-CE-IGTF-/ops/Role=lcgadmin
IGTF-1.65, -1 days old, all present. - SHA Fingerprint failed for
ca-policy-lcg. - SHA Fingerprint failed for ca-policy-egi-core.
Others show a slightly different message:
IGTF-1.65, 1 day old, all present. - SHA Fingerprint failed for
ca-policy-egi-core.
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/status.cgi?hostgroup=node-ARC-CE&style=overview
1. Is the failure really on the ARC-CE, or on a WN?
2. In Bristol's case, the 1.65-1 certs were installed at Jun 30 06:07:07 on the CE,
and earlier (3-5am) on the WN.
3. The test started failing at 11:55 on 30 June on the Bristol ARC-CE, several hours
AFTER the certificate update. Is the test about the certificate version at all?
4. In at least one interesting case (ce1.dur.scotgrid.ac.uk; I've not
checked that many), it was GREEN until 29 Jun 12:17, RED at 29 Jun 13:59, then
GREEN again briefly, then RED again!
So did something change on the ARC-CE in that brief time? Or did the test land
on a different, working WN, if the failure does indicate a WN problem? The
latter is probably more likely.
https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/trends.cgi?t1=1435736634&t2=1435823034&host=ce1.dur.scotgrid.ac.uk&service=org.nordugrid.ARC-CE-IGTF-%2Fops%2FRole%3Dlcgadmin&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringnotrunning=yes&includesoftstates=no&initialassumedhoststate=0&initialassumedservicestate=0&backtrack=4&timeperiod=thisweek&zoom=4
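For anyone reading the trends link above, here is a minimal sketch that pulls out the host, the service and the t1/t2 window. It assumes t1 and t2 are Unix epoch seconds and prints them in UTC. The URL in the snippet is shortened to the parameters of interest, and `trends_window` is just an illustrative helper name. I have not checked whether trends.cgi honours t1/t2 when `timeperiod=thisweek` is also set.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Shortened copy of the trends link (only the parameters of interest are kept).
TRENDS_URL = (
    "https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/trends.cgi"
    "?t1=1435736634&t2=1435823034&host=ce1.dur.scotgrid.ac.uk"
    "&service=org.nordugrid.ARC-CE-IGTF-%2Fops%2FRole%3Dlcgadmin"
    "&timeperiod=thisweek"
)


def trends_window(url):
    """Return (host, service, start, end) parsed from a trends.cgi URL.

    Assumption: t1 and t2 are Unix epoch seconds; they are converted to UTC.
    """
    query = parse_qs(urlparse(url).query)  # parse_qs also undoes the %-encoding
    start = datetime.fromtimestamp(int(query["t1"][0]), tz=timezone.utc)
    end = datetime.fromtimestamp(int(query["t2"][0]), tz=timezone.utc)
    return query["host"][0], query["service"][0], start, end


host, service, start, end = trends_window(TRENDS_URL)
print(host, service)
print(start.isoformat(), "->", end.isoformat())
```

Read this way, t1/t2 span exactly 24 hours starting on 1 July 2015 (UTC), so the year-less dates quoted above are presumably 2015 as well.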
wl / somewhat mystified