Happy New Year,
Earlier this afternoon we clocked up some NAGIOS errors in
org.sam.WN-Rep (among others). I saw similar errors at the 5% level for
about a week before Christmas (roughly 10-18 December) but they then
stopped and all was OK over the holiday period until this lunchtime. I
can see no problems on the site: the SE looks happy, the issue happens
on random WNs, and I've made no changes since the start of December
(other than to perform a rolling upgrade on the WNs). Also, when I look
at NAGIOS I see what look like similar issues at other
sites - but they don't get penalised in the NAGIOS Availability or
Reliability statistics.
Given that we've been just as busy over the holiday period as before
or after it (so the problem is unlikely to be load-related), I'm not
convinced that the problem is at Cambridge - in which case, why are we
the only ones to get punished? On the other hand, if someone has a good
idea where to look to identify the problem, I'd be grateful.
John