Mmmhh, I don't think that is the problem. If a badly written parsing
routine can disrupt maui, maybe we should change scheduler.
I haven't had time to look at it, but I have the feeling it is connected
to the number of requests users send to the pbs/maui server. I'm dumping
my IS every minute, and the problem is not detected by gstat because it
has a larger interval, but 4444 values appear all the time, sometimes
for long periods and sometimes for only 1 or 2 minutes.
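
For anyone who wants to check their own one-minute dumps, here is a minimal sketch of how the 4444 episodes could be counted. It assumes each dump has already been reduced to a (timestamp, waiting jobs) pair; the function name find_sentinel_episodes and the sample values are hypothetical, for illustration only, and not taken from any real IS output.

```python
from datetime import datetime, timedelta

# Value published for waiting jobs when maui is not responding (per this thread).
SENTINEL = 4444


def find_sentinel_episodes(samples, sentinel=SENTINEL):
    """Group consecutive samples equal to `sentinel` into episodes.

    `samples` is an iterable of (timestamp, waiting_jobs) pairs in time order.
    Returns a list of (first_timestamp, last_timestamp, sample_count) tuples.
    """
    episodes = []
    start = last = None
    count = 0
    for ts, waiting in samples:
        if waiting == sentinel:
            if start is None:
                # A new run of sentinel values begins here.
                start = ts
                count = 0
            last = ts
            count += 1
        elif start is not None:
            # The run ended on the previous sample.
            episodes.append((start, last, count))
            start = None
    if start is not None:
        # The data ended while still inside a run.
        episodes.append((start, last, count))
    return episodes


if __name__ == "__main__":
    # Made-up samples, one per minute, purely to show the output format.
    t0 = datetime(2006, 4, 28, 10, 0)
    values = [3, 4444, 4444, 5, 7, 4444, 2]
    samples = [(t0 + timedelta(minutes=i), v) for i, v in enumerate(values)]
    for start, end, n in find_sentinel_episodes(samples):
        print(f"{start:%H:%M} to {end:%H:%M}: {n} sample(s)")
```

With a sampling interval shorter than the gstat one, this kind of grouping shows both the long episodes and the 1 or 2 minute ones that a coarser monitor would miss.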
cheers
alessandra
Peter Love wrote:
> I made these changes too, looks like there is no improvement. However,
> the gip changes may be something to look at (Mona's thread on Rollout).
> I'll wait until the new script matures from alpha to beta :-)
>
> Peter
>
> Alessandra Forti ([log in to unmask]) wrote:
>
>> Hello,
>>
>> this might help those who are randomly publishing 4444 peaks of waiting
>> jobs because maui is not responding. I've just modified my maui cfg so I
>> hope it'll work.
>>
>> cheers
>> alessandra
>>
>> --
>> *******************************************
>> * Dr Alessandra Forti *
>> * Technical Coordinator - NorthGrid Tier2 *
>> * http://www.hep.man.ac.uk/u/aforti *
>> *******************************************
>>
>>
>
>
>> Date: Thu, 27 Apr 2006 17:42:09 +0200
>> From: Jeff Templon <[log in to unmask]>
>> Reply-To: LHC Computer Grid - Rollout <[log in to unmask]>
>> To: [log in to unmask]
>> Subject: [LCG-ROLLOUT] on Maui freakouts
>> User-Agent: Thunderbird 1.5.0.2 (Macintosh/20060308)
>>
>> Hi *
>>
>> I know many of us are having periodic Maui freakouts, where for some
>> reason Maui hangs for a while or just dies; this phenomenon is made clear
>> now since it is reflected in the published ERT.
>>
>> This has been really irritating for me the last few days, as I have been
>> trying to get fair sharing in shape on our cluster, and of course a
>> catatonic Maui doesn't really help things.
>>
>> I did some googling, perhaps more aggressively than usual, and came up
>> with the following link:
>>
>> http://www.clusterresources.com/pipermail/mauiusers/2005-August/001669.html
>>
>> the error they talk about here:
>>
>> ERROR: cannot get node info: NULL
>>
>> is often what I see right before Maui hangs for 15 minutes. They claim
>> that the problem is a disagreement between Maui and Torque about certain
>> timing issues, which is plausible given what I see.
>>
>> I set the following two things in maui.cfg:
>>
>> RMCFG[0] TIMEOUT=90
>> NODESYNCTIME 0:00:30
>>
>> and it's too early to be conclusive, but I haven't seen the Maui lockup
>> since doing so and restarting maui. Will keep you posted, interested
>> whether this solves others' problems as well!!
>>
>> J "no present like the time" T
>>
--
*******************************************
* Dr Alessandra Forti *
* Technical Coordinator - NorthGrid Tier2 *
* http://www.hep.man.ac.uk/u/aforti *
*******************************************