PS: Rob Fay, who is off today, knows a lot more than I do about this issue.
I'll talk to him when he's back.

Steve

On 05/31/2012 10:31 AM, Stephen Jones wrote:
> Matt,
>
> this is just a hunch. When you have "runnable" single jobs, are there
> unrunnable whole-node-jobs in the queue in front of them?
>
> Reason for asking: Maui pops from the job queue only until it hits the
> first unrunnable (for whatever reason) job. So it never looks deeper
> into the queue beyond the first unrunnable job - there may be
> runnable jobs in the queue but Maui would never reach them. Instead,
> it applies some tetris-style "backfilling" algorithm (which is
> broken).
>
> Anyway, the problem "may" be down to this scenario (it's an idea,
> anyway). Say the queue is sorted as follows: W1,W2,W3,S1,S2,S3 (i.e.
> three whole-node-jobs, three single-jobs) and let us say his cluster
> has two whole-worker-nodes (WN1,WN2) and two single-worker-nodes
> (SN1,SN2). On a scheduler cycle, W1 and W2 would be dispatched to WN1
> and WN2, leaving the queue as W3,S1,S2,S3. Maui cannot schedule the
> next job (W3) as no node can take it. Maui does not look deeper into
> the job queue, as stated above. So, even though S1, S2 and S3 "could"
> be scheduled, they are not scheduled. Instead, some broken
> "backfilling" algorithm is invoked that is supposed to "gap fill" the
> other jobs. Like I said, it's broken in some way, so I am reliably
> told - I don't know how, but it leaves queued jobs just sitting there
> even when slots exist to run them.
>
> Summary: you'll only get the single-jobs to run when there are no
> unrunnable whole-node-jobs in front of them. To test, kill the
> unrunnable whole-node-jobs in front of the queued single-jobs - you
> will then see the single-jobs start.
>
> Please let me know if this is the issue. I don't know any fix, yet.
> I've been looking for a good excuse to fix this issue, by rolling our
> own Maui.
>
> Steve
>

-- 
Steve Jones
System Administrator            office: 220
High Energy Physics Division    tel (int): 42334
Oliver Lodge Laboratory         tel (ext): +44 (0)151 794 2334
University of Liverpool         http://www.liv.ac.uk/physics/hep/
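
For illustration only, the W1,W2,W3,S1,S2,S3 scenario from the quoted message can be sketched as a toy simulation. This is not Maui's code and does not use any Maui or Torque interface: the function name dispatch_until_blocked, the "whole"/"single" job kinds and the one-job-per-node model are all assumptions made up for the sketch, and the backfill pass is left out entirely. It only models the "stop at the first unrunnable job" behaviour described above.

```python
from collections import deque


def dispatch_until_blocked(queue, free_nodes):
    """Toy model (hypothetical, not Maui code): start jobs from the head of
    the queue and stop at the first job that cannot run.

    queue      -- list of (job_name, kind), kind is "whole" or "single"
    free_nodes -- dict mapping kind to a list of free node names
                  (assumption: each node takes exactly one job)
    Returns (started, still_queued).
    """
    pending = deque(queue)
    # Copy the node lists so the caller's data is not modified.
    free = {kind: list(nodes) for kind, nodes in free_nodes.items()}
    started = []
    while pending:
        name, kind = pending[0]
        if not free.get(kind):
            # Head of the queue is unrunnable: stop here and never
            # look at the jobs queued behind it.
            break
        node = free[kind].pop(0)
        started.append((name, node))
        pending.popleft()
    return started, [name for name, _ in pending]


queue = [("W1", "whole"), ("W2", "whole"), ("W3", "whole"),
         ("S1", "single"), ("S2", "single"), ("S3", "single")]
nodes = {"whole": ["WN1", "WN2"], "single": ["SN1", "SN2"]}

started, waiting = dispatch_until_blocked(queue, nodes)
print(started)   # [('W1', 'WN1'), ('W2', 'WN2')]
print(waiting)   # ['W3', 'S1', 'S2', 'S3']

# The suggested test: remove the blocked whole-node job and run a cycle again.
queue_without_w3 = [job for job in queue if job[0] != "W3"]
started2, waiting2 = dispatch_until_blocked(queue_without_w3, nodes)
print(started2)  # [('W1', 'WN1'), ('W2', 'WN2'), ('S1', 'SN1'), ('S2', 'SN2')]
print(waiting2)  # ['S3']
```

In the first run S1 and S2 stay queued although SN1 and SN2 are free, because W3 sits in front of them. In the second run, with W3 removed, they start; S3 still waits only because the model has run out of single nodes.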