Hi
Please could all discussion of this topic be taken off the rollout list
and conducted in Savannah, where it belongs:
https://savannah.cern.ch/bugs/index.php?func=detailitem&item_id=6213
The ETT needs to be consistent across all information providers. If there
is an error in a specific dynamic plug-in, please open a new bug. If a
general change needs to be made to all dynamic plug-ins, it needs to be
documented in bug number 6213, and a new bug should be opened specifically
to implement it across all dynamic plug-ins.
Thanks
Laurence
Rod Walker wrote:
> Hi,
> Bells and whistles are all very well but can I bring this discussion
> back to the original point. My CE has a queue for the atlas VO, no
> tricky policies and free cpus but is publishing a large ERT.
>
> $MaxTime=(($TotalJobs * $WallTime) - $UsedTime) / $TCPU;
> is rubbish. What should it be replaced by and how?
> How about an optional update to /opt/lcg/libexec/lcg-info-dynamic-pbs
> if ($QueuedJobs == 0){
> $MaxTime=0
> }
> or && $FreeCPU >0 to be extra safe.
>
> Sites which don't update may get fewer jobs.
>
> Cheers,
> Rod.
>
> Jeff Templon wrote:
>
> >Hi,
> >
> >On Tue, 2005-02-15 at 18:43, Burke, S (Stephen) wrote:
> >
> >
> >
> >>The disclaimers on that page suggest that in general it won't give a
> >>useful answer. Also it says that it calculates the earliest possible
> >>start time, which isn't really what you want. I regularly argue that you
> >>should err on the side of pessimism - better for a site to repel jobs it
> >>could run than attract jobs it can't.
> >>
> >>
> >
> >The new method takes a compromise -- it's neither optimistic nor
> >pessimistic ;-) It simply looks at the current state, and predicts that
> >your job will fare similarly to what's observed now.
> >
> >I guess there is one optimistic point: if you ask for an ERT for a VO
> >for which no jobs are currently waiting in the queue, and there are free
> >CPUs, it "sees no reason to expect" that your job won't run immediately
> >;-)
> >
> >
> >
> >>Another point is that the values published across sites need to be
> >>reasonably compatible, so methods for different batch systems which give
> >>substantially different answers aren't a good idea. A simple example of
> >>that is that early in EDG the BQS info provider at Lyon always published
> >>an ETT of at least 120 seconds because it took that long to start any
> >>job, whereas PBS published zero (and still does) for empty queues. The
> >>
> >>
> >
> >This will have to shake out as we gain experience. If we ever reach the
> >high-priority use case (run my job NOW) then we will need to report
> >rather accurately. Someone who wants their job started within 15
> >seconds (yes, I know, better not use the RB) will not be happy landing
> >at NIKHEF, where the scheduler cycle time is two minutes. It seems like
> >fast-response and large capacity are orthogonal; the more jobs and WNs
> >you have, the more time it takes to run a schedule cycle. At least for
> >a nontrivial scheduler.
> >
> > JT
> >
> >
> >
> >
>
> --
> Rod Walker +1 6042913051
>
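
For reference, the guard Rod proposes for the $MaxTime formula can be sketched as below. This is only an illustration in Python, not the actual code in lcg-info-dynamic-pbs: the function name and the argument names are hypothetical stand-ins for the Perl variables quoted above ($TotalJobs, $QueuedJobs, $WallTime, $UsedTime, $TCPU, $FreeCPU), and it assumes all times are in the same unit.

```python
def max_time(total_jobs, queued_jobs, wall_time, used_time, total_cpus, free_cpus):
    """Sketch of the quoted $MaxTime estimate with the proposed guard.

    All names are hypothetical stand-ins for the Perl variables in the
    quoted snippet; wall_time and used_time are assumed to share a unit.
    """
    # Proposed guard: nothing is queued and at least one CPU is free,
    # so there is no reason to publish a non-zero estimate.
    if queued_jobs == 0 and free_cpus > 0:
        return 0

    # Assumption: a site with no CPUs cannot produce a meaningful estimate.
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")

    # Original formula as quoted:
    # (($TotalJobs * $WallTime) - $UsedTime) / $TCPU
    return (total_jobs * wall_time - used_time) / total_cpus


if __name__ == "__main__":
    # Empty queue with free CPUs: the guard applies.
    print(max_time(10, 0, 3600, 6000, 20, 5))
    # Jobs are queued: the original formula is used unchanged.
    print(max_time(10, 4, 3600, 6000, 20, 0))
```

With the guard in place, a queue with running jobs but nothing waiting and at least one free CPU publishes zero; in every other case the sketch falls through to the formula exactly as quoted.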