Hi Olivier
> If we are discussing cputime/elapsed time here, then the RB data cannot be
> used, because the RB does not report the cputime. We can extract the
> failure rate and the wasted elapsed time, but not the job efficiency as
> defined above.
Yes - I had in mind two efficiency figures. The one from the RB would be
based on the successful vs unsuccessful CPU time (as defined in your RB
data) while Matt's analysis effectively looks at useful vs idle job
time. Matt's script works with the APEL data so we could provide
something centrally but I would like a few sites to check it first. The
Tier-1 implementation uses data directly from a PBS database (which is
checked against APEL and mostly found to agree).
It is the RB figure that will let us pinpoint those VOs that keep firing
off jobs into inappropriate queues, wasting CPU time.
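
As a rough illustration of the second figure (cputime/elapsed time per VO,
per site), here is a minimal sketch. The record layout, site names, VO names
and numbers are all made up for the example; this is not the APEL schema and
not Matt's script.

```python
from collections import defaultdict

# Hypothetical accounting records: (site, vo, cpu_seconds, wall_seconds).
# These fields and values are assumptions for illustration only.
records = [
    ("SITE-A", "vo1", 3000, 3600),
    ("SITE-A", "vo1", 600, 3600),
    ("SITE-A", "vo2", 3500, 3600),
    ("SITE-B", "vo1", 0, 1800),
]


def efficiency_by_site_vo(rows):
    """Return {(site, vo): total CPU time / total elapsed time}."""
    totals = defaultdict(lambda: [0, 0])  # (site, vo) -> [cpu, wall]
    for site, vo, cpu, wall in rows:
        totals[(site, vo)][0] += cpu
        totals[(site, vo)][1] += wall
    # Report None rather than dividing by zero when no elapsed time is recorded.
    return {
        key: (cpu / wall if wall else None)
        for key, (cpu, wall) in totals.items()
    }


eff = efficiency_by_site_vo(records)
for (site, vo), value in sorted(eff.items()):
    print(site, vo, "n/a" if value is None else f"{value:.2f}")
```

Summing CPU and elapsed time before dividing weights each job by how long it
held a slot, so one long idle job counts for more than many short ones.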
Jeremy
>
> Why couldn't we have this type of view (cputime/elapsed time) per VO,
> per site on the accounting page directly? I think it is a matter of
> presentation, since the data is there in the accounting database.
>
> Cheers, Olivier.
>
> This will certainly be
> > required when there is contention over resources and jobs take longer
> > queuing. We also need to start looking at job efficiencies at sites.
> > Matt Hodges of the T1 put something together for this and we were
> > planning to package that code for all sites to use - I'll check
> > progress. The results for the T1 are here:
> > http://www.gridpp.rl.ac.uk/stats/eff/RAL/All/archive/summary.html.
> >
> > Jeremy
> >
> >
> >> -----Original Message-----
> >> From: Testbed Support for GridPP member institutes [mailto:TB-
> >> [log in to unmask]] On Behalf Of Duncan Rand
> >> Sent: 10 October 2006 09:26
> >> To: [log in to unmask]
> >> Subject: Re: pushing jobs towards longer queues
> >>
> >> Is a job with no time requirements a long job or a short job?
> >>
> >> Also in my experience users don't pay much attention to jobs which fail
> >> - they just get automatically resubmitted elsewhere until they
> >> eventually run to completion, often after having travelled around
> >> several sites. There seems to be little incentive to improve the job
> >> success rates for users, VOs or sites.
> >>
> >> Duncan
> >>
> >> On Tue, 2006-10-10 at 09:16 +0100, Burke, S (Stephen) wrote:
> >>> Testbed Support for GridPP member institutes
> >>>> [mailto:[log in to unmask]] On Behalf Of Duncan Rand said:
> >>>> We have short, long and infinite queues. Jobs without any wall clock
> >>>> time or CPU time requirements are going into the short queue
> >>>> by default.
> >>>> Does anyone know how I can push them into the infinite queue?
> >>> Why do you want to push them into the infinite queue? Jobs which are
> >>> really short should go in the short queue, and as long as you kill jobs
> >>> which exceed the time limit users will probably get the message that
> >>> they should put a time requirement on longer jobs!
> >>>
> >>> Stephen
> >> --
> >> Duncan Rand, School of Engineering and Design,
> >> Brunel University, Uxbridge, UK
> >> Email: [log in to unmask] Tel. +44 1895 266804
>
>
> --
> - O. van der Aa - Imperial College London -
> - LT2 Technical Coordinator -
> - tel: +442075947810, +442071005426 -
> - SIP: [log in to unmask] -
> - fax: +442078238830 -
> - http://surl.se/agtu -