Hi *
I know many of us are having periodic Maui freakouts, where for some
reason Maui hangs for a while or just dies; this is more obvious now that
it shows up in the published ERT.
This has been really irritating for me the last few days, as I have been
trying to get fair sharing in shape on our cluster, and of course a
catatonic Maui doesn't really help things.
I did some googling, perhaps more aggressively than usual, and came up
with the following link:
http://www.clusterresources.com/pipermail/mauiusers/2005-August/001669.html
The error they describe there:
ERROR: cannot get node info: NULL
is often what I see right before Maui hangs for 15 minutes. They claim
that the problem is a disagreement between Maui and Torque about certain
timing issues, which is plausible given what I see.
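If you want to check whether your own hangs line up with this error, a rough sketch like the one below can help. It assumes Python 3 and takes the log path on the command line. The default path in it is only a guess, so point it at whatever log file your Maui installation actually writes. It matches the error text as a plain substring, since it makes no assumptions about the rest of the log line format.

```python
import sys
from typing import Iterable, List, Tuple

# The error string quoted above; matched as a plain substring because the
# surrounding log format (timestamps, function names) is not assumed here.
ERROR_TEXT = "cannot get node info: NULL"

# Hypothetical default path (an assumption); pass your real Maui log instead.
DEFAULT_LOG = "/usr/local/maui/log/maui.log"


def find_node_info_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line without trailing newline) per match."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if ERROR_TEXT in line:
            hits.append((lineno, line.rstrip("\n")))
    return hits


def main() -> None:
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG
    try:
        # errors="replace" keeps the scan going if the log has odd bytes.
        with open(path, errors="replace") as log:
            hits = find_node_info_errors(log)
    except OSError as exc:
        print(f"could not read {path}: {exc}", file=sys.stderr)
        return
    for lineno, line in hits:
        print(f"{lineno}: {line}")
    print(f"{len(hits)} occurrence(s) of '{ERROR_TEXT}' in {path}")


if __name__ == "__main__":
    main()
```

Run it with your log path as the only argument, then compare the matching lines against the times you noticed Maui stop responding.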
I set the following two things in maui.cfg:
RMCFG[0] TIMEOUT=90
NODESYNCTIME 0:00:30
and although it's too early to be conclusive, I haven't seen a Maui lockup
since making those changes and restarting Maui. I'll keep you posted, and
I'm interested to hear whether this solves others' problems as well!!
J "no present like the time" T