On Fri, Feb 11, 2005 at 09:21:37AM -0800 or thereabouts, Rod Walker wrote:
> Only the scheduler knows which job will start next so only the scheduler
> can give the estimated response time. Maui provides
> showstart [JOB]
> http://www.clusterresources.com/products/maui/docs/commands/showstart.shtml
The reasons for not doing this were covered in the operations workshop,
namely that the method must be portable and therefore batch-system
independent. I always thought the old algorithm did the job, certainly
better than what is there now; it might not have been accurate, but it was
good enough. All you want to prevent is jobs piling up when there are free
resources elsewhere.
>
> This gives the best guess at the start time for the job. The only possible
> improvement on this could come from Jeff's method, but this is a 2nd order
> effect. Every other scheme of counting free cpus, queued jobs and total cpus
> is bound to fail when using an educated scheduler with a site policy.
>
> So the ERT is the showstart time of the last job in the queue. This
> assumes all jobs in a queue are equal (FIFO). If there are no queueing jobs
> then ERT=0.
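A minimal sketch of this rule in Python, assuming the estimated start times
of the queued jobs (epoch seconds, in queue order) have already been
obtained from showstart. Parsing the showstart output is not shown, and the
function name ert_from_showstart is hypothetical.

```python
from typing import Sequence


def ert_from_showstart(now: float, start_times: Sequence[float]) -> float:
    """Estimated response time (seconds) for a FIFO queue.

    start_times: estimated start times (epoch seconds) of the queued jobs,
    in queue order. Assumed to come from showstart; parsing not shown.
    """
    if not start_times:
        # No queued jobs: ERT=0, a new job should start straight away.
        return 0.0
    # FIFO assumption: a new job cannot start before the last queued job.
    last_start = start_times[-1]
    # Clamp at zero in case the scheduler's estimate is already in the past.
    return max(0.0, last_start - now)
```

For example, ert_from_showstart(1000.0, [1200.0, 4600.0]) gives 3600.0,
i.e. one hour until the last queued job is expected to start.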
>
> Cheers,
> Rod.
>
> On Fri, 11 Feb 2005, Steve Traylen wrote:
>
> > On Fri, Feb 11, 2005 at 07:33:35AM -0800 or thereabouts, Rod Walker wrote:
> > > Hi,
> > > How does this give anything useful at all? This is the estimated response
> > > time for pbs:
> > > $MaxTime=(($TotalJobs * $WallTime) - $UsedTime) / $TCPU;
> > > where TCPU is the smaller of total number of cpus or the max running jobs.
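The quoted formula as a small Python function, for reference. The function
and parameter names are hypothetical, and total_jobs is assumed to count
the running plus queued jobs in the queue, with times in hours.

```python
def pbs_max_time(total_jobs: int, wall_time: float, used_time: float,
                 total_cpus: int, max_running: int) -> float:
    """ERT from the PBS formula quoted above (hours).

    total_jobs: jobs in the queue (assumed: running plus queued)
    wall_time:  the queue's maximum wall time
    used_time:  wall time already consumed by those jobs
    """
    # TCPU is the smaller of the total CPU count and the max running jobs.
    tcpu = min(total_cpus, max_running)
    return ((total_jobs * wall_time) - used_time) / tcpu
```

Note that nothing in the formula looks at the number of queued jobs on its
own, so it can publish a non-zero ERT for a queue where nothing is waiting.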
> >
> > The old one used to have
> >
> > if ( $queuedJobs == 0 ) {
> >     $ETT = 0;
> > }
> >
> > which was good. It seems clear to me that if you are not queuing a job
> > for someone, then chances are you will run a job from them immediately.
> >
> > The only rule that really matters is that ETT must go up for a queue
> > as soon as you start to queue jobs in that queue. Until a job is queued,
> > ETT should not go up at all. I would say these two things should
> > be true for any calculation that is used.
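These two rules can be phrased as a property that any candidate estimator
should pass. The sketch below brute-forces that check in Python; the
estimator signature (queued, running, total_cpus) and both example
estimators are hypothetical, chosen only to illustrate the rules.

```python
from typing import Callable

# Hypothetical estimator signature: (queued, running, total_cpus) -> ERT.
Estimator = Callable[[int, int, int], float]


def satisfies_rules(estimate: Estimator, total_cpus: int) -> bool:
    """Check both rules for every running-job count up to total_cpus
    (a brute-force sketch, not a proof)."""
    for running in range(total_cpus + 1):
        # Rule: until a job is queued, the estimate must not go up.
        if estimate(0, running, total_cpus) != 0:
            return False
        # Rule: as soon as a job is queued, the estimate must go up.
        if estimate(1, running, total_cpus) <= 0:
            return False
    return True


def zero_unless_queued(queued: int, running: int, total_cpus: int) -> float:
    # Hypothetical: zero with nothing queued, else queued jobs per CPU.
    if queued == 0:
        return 0.0
    return queued / total_cpus


def running_load(queued: int, running: int, total_cpus: int) -> float:
    # Hypothetical: counts running jobs too, so it rises with nothing queued.
    return (queued + running) / total_cpus
```

Here satisfies_rules(zero_unless_queued, 18) is True, while
satisfies_rules(running_load, 18) is False, because a single running job
already pushes the estimate above zero.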
> >
> > See
> > https://savannah.cern.ch/bugs/?func=detailitem&item_id=6213
> >
> > where this is being commented on.
> >
> > Steve
> >
> > >
> > > Our cluster has 18 cpus and 2 jobs in the atlas queue, with a combined
> > > used wall time of around 72 hours. The max walltime is 72 hrs. So
> > > $MaxTime=((2*72)-72)/18 = 4 hours
> > > Give or take, this is what lcgce01.triumf.ca is publishing, but it clearly
> > > should be zero as there are 16 free cpus.
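The arithmetic above, reproduced in Python with the figures from the
message. The assumption is that the max-running-jobs limit is not the
binding one, so TCPU equals the 18 CPUs.

```python
# Figures from the message: 18 CPUs, 2 running atlas jobs, 72 h max
# wall time, about 72 h of wall time already used, nothing queued.
total_cpus = 18
running_jobs = 2
queued_jobs = 0
wall_time_h = 72.0
used_time_h = 72.0

# TCPU = min(total CPUs, max running jobs); assumed here to be 18.
tcpu = total_cpus
total_jobs = running_jobs + queued_jobs

max_time_h = ((total_jobs * wall_time_h) - used_time_h) / tcpu
free_cpus = total_cpus - running_jobs

print(f"published ERT: {max_time_h:.1f} h, free CPUs: {free_cpus}")
```

This prints "published ERT: 4.0 h, free CPUs: 16", which is the mismatch
described: a 4 hour estimate for a queue with nothing waiting and 16 CPUs
idle.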
> > > I know it is difficult to get a consistent estimation of ERT for both full
> > > and empty grids, but is this really the best estimate?
> > >
> > > I know the logic, if freecpus>0 then ERT=0, does not work due to site
> > > policies. How about setting ERT=0 if there are no jobs queued for that
> > > CE (queue)? This will only be wrong when the site is exactly full, which
> > > almost never happens. Otherwise do your stuff with used times.
> > >
> > > Or maybe, if no jobs queued and freecpus>0 ...
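A minimal Python sketch of the proposal: publish zero when nothing is
queued for the CE (queue), optionally also requiring free CPUs, and
otherwise fall back to the used-time formula quoted earlier in the thread.
The function name and the require_free_cpus switch are hypothetical.

```python
def proposed_ert(queued: int, free_cpus: int, total_jobs: int,
                 wall_time: float, used_time: float, tcpu: int,
                 require_free_cpus: bool = False) -> float:
    """ERT per the proposal: zero when nothing is queued for the CE
    (queue), otherwise the existing used-time estimate.

    require_free_cpus selects the stricter variant that also requires
    free_cpus > 0 before publishing zero.
    """
    nothing_queued = queued == 0
    if nothing_queued and (free_cpus > 0 or not require_free_cpus):
        return 0.0
    # Fallback: the used-time formula quoted earlier in the thread.
    return ((total_jobs * wall_time) - used_time) / tcpu
```

With the cluster figures above, proposed_ert(0, 16, 2, 72.0, 72.0, 18)
returns 0.0 instead of 4 hours. The stricter variant only differs when the
site is exactly full, the case described as almost never happening.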
> > > Ideally there is some way to extract the estimate from Maui, as this is
> > > the only truth. MOAB has a 'showstart' command for example.
> > >
> > > Cheers,
> > > Rod.
> > >
> > >
> > >
> > > --
> > > Rod Walker +1 6042913051
> >
> > --
> > Steve Traylen
> > http://www.gridpp.ac.uk/
> >
>
> --
> Rod Walker +1 6042913051
--
Steve Traylen
http://www.gridpp.ac.uk/