We tried:
Reinstall RB - no.
Reboot RB - no.
Various scripts that prod things - no.
Reboot BDII - no.
Restarting various RB services one-at-a-time - no.
Neglect - yes!
Indeed, it appears that neglect has its part to play here. Our RB is
currently working fine having been left to its own devices. It mysteriously
started working at about 02:18 this morning and has continued to function
since.
The problem seemed to be an unknown issue between the Workload Manager and
the Job Controller - jobs would not be handed off to the Job Controller,
which seemed to be spending its time running in circles whinging about
various things/jobs/events it considered `bad'.
Having left it to its own devices, this situation appears to have cleared
itself as the last of the things/jobs/events it considered `bad' was flushed
(by some timeout?). Several of the other RB processes are still whining
about a variety of things including `error recovering event store:
/var/tmp/dg20logd_.NNNNNNN: ... error getting events jobid'
but we appear to be in business.
Therefore, another tool to add to the toolbox: ignore the problem and see
if it goes away.
Have a happy
#include <seasonal_celebration_of_your_choice.h>
Martin.