Hi,
On Tue, 2005-02-15 at 18:43, Burke, S (Stephen) wrote:
> The disclaimers on that page suggest that in general it won't give a
> useful answer. Also it says that it calculates the earliest possible
> start time, which isn't really what you want. I regularly argue that you
> should err on the side of pessimism - better for a site to repel jobs it
> could run than attract jobs it can't.
The new method strikes a compromise -- it's neither optimistic nor
pessimistic ;-) It simply looks at the current state and predicts that
your job will fare similarly to the jobs observed now.
I guess there is one optimistic point: if you ask for an ERT for a VO
for which no jobs are currently waiting in the queue, and there are free
CPUs, it "sees no reason to expect" that your job won't run immediately
;-)
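A minimal sketch of that heuristic, in Python. Everything here is a hypothetical illustration, not the actual info provider: the QueueSnapshot type and estimate_response_time function are made up, and reading "fare similarly to what's observed now" as "the mean wait of this VO's recently started jobs" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class QueueSnapshot:
    """Hypothetical view of the batch system's current state."""
    free_cpus: int
    waiting_jobs: dict[str, int]          # VO name -> number of queued jobs
    recent_waits: dict[str, list[float]]  # VO name -> waits (s) of recently started jobs


def estimate_response_time(snap: QueueSnapshot, vo: str) -> float | None:
    """Hypothetical ERT (seconds) based only on the current queue state."""
    # The "optimistic point": nothing is queued for this VO and CPUs are idle,
    # so there is no reason to expect the job won't start immediately.
    if snap.waiting_jobs.get(vo, 0) == 0 and snap.free_cpus > 0:
        return 0.0
    waits = snap.recent_waits.get(vo, [])
    if not waits:
        return None  # nothing observed to extrapolate from
    # Assumption: the new job fares like this VO's jobs starting right now.
    return mean(waits)
```

For example, an idle site with no atlas jobs queued reports 0.0 for atlas, while a full site whose recent atlas jobs waited 100 s and 200 s reports 150.0.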
> Another point is that the values published across sites need to be
> reasonably compatible, so methods for different batch systems which give
> substantially different answers aren't a good idea. A simple example of
> that is that early in EDG the BQS info provider at Lyon always published
> an ETT of at least 120 seconds because it took that long to start any
> job, whereas PBS published zero (and still does) for empty queues. The
This will have to shake out as we gain experience. If we ever reach the
high-priority use case (run my job NOW) then we will need to report
rather accurately. Someone who wants their job started within 15
seconds (yes, I know, better not use the RB) will not be happy landing
at NIKHEF, where the scheduler cycle time is two minutes. It seems that
fast response and large capacity are at odds: the more jobs and WNs
you have, the longer it takes to run a scheduling cycle, at least for
a nontrivial scheduler.
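To put rough numbers on the cycle-time point: assuming a job lands at a uniformly random moment in the scheduler cycle and is started on the next pass (ignoring every other delay), its wait is uniform between zero and one full cycle. A minimal Python sketch; the function names are hypothetical:

```python
def start_within_probability(deadline_s: float, cycle_s: float) -> float:
    """P(job is picked up within deadline_s), assuming uniform arrival within
    the scheduler cycle and a start on the next scheduling pass."""
    if cycle_s <= 0:
        return 1.0  # no cycle delay at all
    return min(max(deadline_s, 0.0) / cycle_s, 1.0)


def mean_cycle_wait(cycle_s: float) -> float:
    """Average wait for the next pass under the same uniform-arrival assumption."""
    return cycle_s / 2
```

With a two-minute cycle, start_within_probability(15, 120) gives 0.125 and mean_cycle_wait(120) gives 60 seconds, so under these assumptions only about one job in eight would meet a 15-second target.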
JT