PS: Rob Fay, who is off today, knows a lot more than I do about this
issue. I'll talk to him when he's back.
Steve
On 05/31/2012 10:31 AM, Stephen Jones wrote:
> Matt,
>
> this is just a hunch. When you have "runnable" single jobs, are there
> unrunnable whole-node-jobs in the queue in front of them?
>
> Reason for asking: Maui pops from the job queue only until it hits the
> first unrunnable (for whatever reason) job. So it never looks deeper
> into the queue beyond the first unrunnable job - there may be
> runnable jobs in the queue but Maui would never reach them. Instead,
> it applies some tetris-style "backfilling" algorithm (which is
> broken).
>
> Anyway, the problem "may" be down to this scenario (it's an idea,
> anyway). Say the queue is sorted as follows: W1,W2,W3,S1,S2,S3 (i.e.
> three whole-node jobs, three single jobs) and let us say his cluster
> has two whole-worker-nodes (WN1,WN2) and two single-worker-nodes
> (SN1,SN2). On a scheduler cycle, W1 and W2 would be dispatched to WN1
> and WN2, leaving the queue as W3,S1,S2,S3. Maui cannot schedule the
> next job (W3) as no node can take it. Maui does not look deeper into
> the job queue as stated above. So, even though S1, S2 and S3 "could"
> be scheduled, they are not scheduled. Instead, some broken
> "backfilling" algorithm is invoked, that is supposed to "gap fill" the
> other jobs. Like I said, it's broken in some way, so I am reliably
> told - I don't know how, but it leaves queued jobs just sitting there
> even when slots exist to run them.
>
> Summary: you'll only get the single-jobs to run when there are no
> unrunnable whole-node-jobs in front of them. To test, kill the
> unrunnable whole-node-jobs in front of the queued single-jobs - you
> will then see the single-jobs start.
>
> Please let me know if this is the issue. I don't know of any fix yet.
> I've been looking for a good excuse to fix this issue by rolling our
> own Maui.
>
> Steve
>
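
For what it's worth, the scenario in the quoted message can be sketched as a
toy simulation. This is only a model of the behaviour described above, not
Maui's actual code: the function name dispatch_cycle, the job tuples and the
slot counts are all made up for illustration. It assumes each whole-node
worker takes one whole-node job, that the two single-job workers offer two
slots each (four in total), and that the broken backfill pass starts nothing.

```python
from collections import deque


def dispatch_cycle(queue, free_slots):
    """One scheduler pass: start jobs from the head of the queue and
    stop at the first job that cannot run (hypothetical model)."""
    started = []
    while queue:
        name, kind = queue[0]
        if free_slots.get(kind, 0) == 0:
            # Head job is blocked, so nothing behind it is examined.
            break
        free_slots[kind] -= 1
        queue.popleft()
        started.append(name)
    return started


# Queue order from the example: three whole-node jobs, then three single jobs.
queue = deque([
    ("W1", "whole"), ("W2", "whole"), ("W3", "whole"),
    ("S1", "single"), ("S2", "single"), ("S3", "single"),
])

# Assumed capacity: WN1 and WN2 take one whole-node job each;
# SN1 and SN2 are assumed to offer two single-job slots each.
free_slots = {"whole": 2, "single": 4}

# First cycle: W1 and W2 start, W3 blocks the head of the queue.
first = dispatch_cycle(queue, free_slots)
stuck = [name for name, _ in queue]

# The suggested test: remove the unrunnable whole-node job at the head.
removed = queue.popleft()

# Next cycle: the single jobs are now reachable and start.
second = dispatch_cycle(queue, free_slots)

print("first cycle started:", first)
print("left waiting:", stuck)
print("after removing", removed[0], "started:", second)
```

In this model the single jobs sit behind W3 with free single slots available,
and only start once W3 is removed, which is the symptom to look for on the
real cluster.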
--
Steve Jones
System Administrator office: 220
High Energy Physics Division tel (int): 42334
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2334
University of Liverpool http://www.liv.ac.uk/physics/hep/