Peter
It sounds to me as though you should cap the LHC VO jobs to keep back CPUs
for the pheno users. As I understand it, Durham only runs production for
ATLAS and LHCb, so neither of those VOs is likely to complain much. You
might also like to explain to your users that, as a consequence of this,
you'll probably earn less money in the accounting period, and that the
cluster will therefore probably end up smaller in the future than it
might otherwise have been.
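As a rough, untested sketch of what I mean by a cap: Maui's per-group
throttling policies can limit how many processors a group occupies at
once. The group names and numbers below are made up for illustration
(I'm assuming the VOs map to local groups called atlas and lhcb), so
adjust them to your own setup.

```
# maui.cfg fragment: illustrative values only, not from any real site.
# Group names "atlas" and "lhcb" are assumptions about the local mapping.
# MAXPROC limits the processors a group's running jobs may use in total,
# which leaves the remaining CPUs available for pheno users.
GROUPCFG[atlas]  MAXPROC=300
GROUPCFG[lhcb]   MAXPROC=100
```

The obvious cost is that the held-back CPUs sit idle whenever pheno has
nothing queued.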
I don't understand how you currently stop the scheduler from letting them
'use unused CPUs even if their "fairshare" is used up' - I wasn't aware
that it was possible to make the fairshare hard in Maui.
cheers
Duncan
On 18/04/2011 17:51, Peter Grandi wrote:
> Another chapter in the eternal discussions about batch scheduling...
>
> Our site runs mostly CPU-intensive LHC VO "production" jobs and
> 'pheno' VO "MC" jobs. The LHC VO jobs are mostly part of
> automated mass flows, while the 'pheno' VO jobs are mostly from
> individual users (but essentially all 'pheno' work is strongly
> LHC related of course) and have a very bursty pattern
> (occasional runs of something like 2,000-7,000 jobs).
>
> The problem is that users tend to like their jobs to have short
> latency, and they dislike seeing unused CPUs; but if I set the
> scheduler to let them use unused CPUs even if their "fairshare"
> is used up, when LHC VO jobs come in they have to wait for the
> user jobs to end. Vice versa, if I let the flood of LHC VO jobs
> take over, the latency of user jobs is affected.
>
> Obviously to have really minimal latency for everybody there
> should always be some spare capacity, but that's quite
> expensive, and I dislike that too.
>
> I have been thinking of allowing groups/users to go above their
> fairshare but only for short-duration jobs (but so many 'pheno'
> jobs are long duration that it may not be worthwhile), to
> minimize the latency impact of overallocations.
>
> What kind of policy is compatible with LHC/WLCG/GridPP goals?
> Because I suspect that T2 latencies of a few/several days do not
> matter that much.
>
> What kind of fairshares are other sites running?
>
> Any sample MAUI configs to have a look at?