On 06/01/2012 01:42 PM, Matt Doidge wrote:
> Hello again,
>
>> I think this is more or less what was happening, the multicore queue
>> jobs get on top, run out of space so can't be started and then maui gets
>> all lazy with the scheduling of the "stuck" jobs. After increasing my
>> RESERVATIONDEPTH and setting MAXIJOBS=1 for the multicore queue
>> scheduling seems to be working as expected for the first time. Whether
>> this is due to these changes or to some other factor I can't say (maybe my
>> tears soaking into my keyboard invoked mercy from the dark gods of cluster
>> computing?). There are still many improvements I want to try (like
>> Stuart's suggestion of partitioning my nodes), but at least now I have a
>> baseline that works!
>
> Well I kind of spoke too soon yesterday - after filling the queues
> just long enough for me to get my hopes up maui then started playing
> silly buggers again. It's behaving better, keeping things 80%-90% full
> (with peaks and troughs in free job slot utilisation), but that's
> still too much waste.
>
> Once more unto maui.cfg I go (I hope to implement some of Stuart's
> priority and weighting suggestions), any further advice would of
> course be appreciated!
>
> Have a good weekend all!
> Matt
Matt,
If it's what I think it is (see my earlier message), then sadly we found
no fix! Rob knows more, and can say next week.
Maybe all you can do is run a Perl script hourly that checks whether
you have (say) 10 empty slots and a long queue of jobs, and if so loops
over the queued jobs issuing qrun commands to set them off!
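A rough sketch of that check, written in Python rather than Perl and meant to run hourly from cron. It assumes one slot per job (single-core jobs only; multicore jobs would need their slot count taken into account), and the thresholds are illustrative guesses. How you obtain the free-slot count and the list of queued job IDs (e.g. by parsing pbsnodes or qstat output) is left out and is an assumption of the sketch. It defaults to a dry run so you can see what it would do before letting it call qrun, which starts a job without waiting for the scheduler.

```python
import subprocess
from typing import List, Sequence

# Illustrative thresholds (assumptions); tune to your site.
MIN_FREE_SLOTS = 10   # only act when at least this many slots sit idle
MIN_QUEUED_JOBS = 1   # ...and at least this many jobs are waiting


def jobs_to_force(free_slots: int, queued_job_ids: Sequence[str],
                  min_free: int = MIN_FREE_SLOTS,
                  min_queued: int = MIN_QUEUED_JOBS) -> List[str]:
    """Pick queued job IDs to start by hand, at most one per free slot.

    Assumes every job needs exactly one slot.
    """
    if free_slots < min_free or len(queued_job_ids) < min_queued:
        return []
    return list(queued_job_ids[:free_slots])


def force_start(job_ids: Sequence[str], dry_run: bool = True) -> None:
    """Issue `qrun <jobid>` for each job, or just print it when dry_run is set."""
    for job_id in job_ids:
        cmd = ["qrun", job_id]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=False: one failing qrun should not stop the rest of the loop.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Hypothetical inputs; replace with values parsed from your batch system.
    force_start(jobs_to_force(12, ["101.server", "102.server"]))
```

Limiting the list to one job per free slot keeps the script from asking qrun to start more work than there is room for; anything left over simply waits for the next hourly run or for maui to catch up.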
I know it's rubbish, but anything to keep the show on the road. We
always intended to patch maui but the requirement went away at some
point, so we never did. If we have to run whole-node-jobs, we should
remove this block by hook or by crook.
Steve
--
Steve Jones
System Administrator office: 220
High Energy Physics Division tel (int): 42334
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2334
University of Liverpool http://www.liv.ac.uk/physics/hep/