I vaguely recall hearing that someone had automated the detection of
"black hole" nodes, i.e. worker nodes that have a problem so that jobs
that start on them immediately fail and end. The node therefore sucks in
all queued jobs pretty quickly. I haven't been able to find it with
Google or the tbsupport archive. Anyone out there know what I am looking
for?
Thanks,
Simon