Note that this is to be expected, but will drive the "new" estimated
response time scheme batty ...
Why?
The old ERT was sufficient in the early days when all queues were FIFO.
It was wrong but essentially binary ... either "zero" or "very big"
which was more or less enough.
The old ERT became insufficient when people started using Maui etc. to
assign different priorities to different VOs, so that a single number
was no longer sufficient: one could have CMS jobs waiting in the queue
while ATLAS had a number of free slots.
The new ERT reports a single number for each VO. This is OK, until
people take the game one step further and start to arrange different
priorities *within* a VO. So NIKHEF might report 3420 seconds ERT for
ATLAS, and this might be relevant for Jeff "just pretending to be ATLAS"
Templon but not for Rodney "ATLAS GridZilla" Walker because his jobs as
production dude will waltz straight to the front of the queue.
This was Cal Loomis' scare scenario two or three years ago. His
solution was to provide a web service to which one could present a JDL
and a user DN, and in return would get an ERT number.
I get the impression we're still a bit far off from being able to do
things this way. It will be useful to keep this sort of consideration
in mind while designing the temporary (cough cough) solution (cough)
that we'll need until we *do* get that far or until some enterprising
grid person comes up with an even more brilliant idea.
J "back to VO boxes" T
Steve Traylen wrote:
> On Thu, Oct 06, 2005 at 09:39:17PM -0700 or thereabouts, Rod Walker wrote:
>
>>Hi,
>>In preparation for upcoming production I'm exercising the system at full
>>scale. This time around there is a lot more unofficial production
>>activity, and I'm wondering how Atlas can enforce the intra-VO fair-share.
>>For example, we might want 90% of the Atlas resources to be for official
>>production. In this case that means the user mapped from
>>/C=CA/O=Grid/OU=westgrid.ca/CN=Rodney Walker
>>should get 90% of the Atlas share.
>>
>>Currently there is no way to express Atlas policy and no way to enforce
>>it, and I'm looking for a short-term solution. My feeling is that this can
>>only be enforced at the site level, e.g. in the Atlas group part of
>>the Maui config file.
>>The long-promised VOMS will be able to say "this is a production user", but
>>what next? The 'what next' will be the same whether we have VOMS or you
>>just believe me that Rodney Walker is the only production user (for this
>>test).
>>
>>So I'm thinking of looking up the user mappings and adjusting the
>>fair-share of atlasXXX periodically, perhaps looking up Atlas policy on
>>some web page. The mechanism is not so tough, but I think we'll need it
>>for upcoming productions. It's probably sufficient if a few big sites do
>>something, so I'm not proposing on-the-fly middleware development for all.
>>Ideas?
>>
>>Right now I've 1200 jobs running with 1800 queued, where the number queued
>>per site is vaguely proportional to the number of CPUs. So if your site has
>>Atlas 10.0.1, LCG 2.6.0, RAM >= 600MB and no jobs, then let me know. FZK
>>has only 1 running job and 80 queued, which looks fishy.
>
>
> If you have a web page along the lines of:
>
> [priorities]
> production=40
> analysis=30
> default=20
>
> [production]
> /C=CA/O=Grid/OU=westgrid.ca/CN=Rodney Walker
>
> [analysis]
> /O=Grid/O=NorduGrid/OU=uio.no/CN=Aleksandr Konstantinov
> /O=dutchgrid/O=users/O=nikhef/CN=Gustavo Ordonez
>
> then I expect we can do something in a maui world anyway.
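Reading that page probably wants a small hand-rolled parser rather than Python's configparser, since the DN lines themselves contain '=' characters and would be mis-split as key=value pairs. A minimal sketch (fetching the page itself is left out; the function name and sample text are illustrative, taken from the example above):

```python
def parse_policy(text):
    """Parse the proposed policy page: a [priorities] section of
    class=weight pairs, plus one section per class listing bare DNs."""
    weights, members = {}, {}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('[') and line.endswith(']'):
            section = line[1:-1]
            continue
        if section == 'priorities':
            name, _, value = line.partition('=')
            weights[name.strip()] = int(value)
        elif section is not None:
            # any other section is a class: each line is one DN
            members.setdefault(section, []).append(line)
    return weights, members

sample = """
[priorities]
production=40
analysis=30
default=20

[production]
/C=CA/O=Grid/OU=westgrid.ca/CN=Rodney Walker
"""
weights, members = parse_policy(sample)
```

The cron job would fetch the page over HTTP at each run and feed the body to this parser.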
>
> You can set a user priority on the fly without changing maui.cfg
> or restarting the service.
>
> # changeparam USERCFG[atlas001] FSTARGET=40
>
> A few things to think about in the implementation include:
>
> + The config file would have to be an official one from atlas.
> + They would be relative priorities, i.e. it makes a huge
> difference to [production] whether the [default] group contains 1
> or 100 active users: each would get 20 to your 40.
> + With multiple VO priority files, the numbers would have to be
> normalised somehow, to stop atlas setting priorities of 10000000
> and 20000000.
> + Sites need to be sure that they weight their group/user priorities
> so that group priority is the dominant one in order to provide
> their overall VO allocations.
> + You will still have problems if the atlas queue is completely full
> of low-priority jobs compared to yours, since your jobs won't come
> here at all.
> + At a later date, with VOMS groups, the logic could be the same.
>
> + I have no idea what is feasible on LSF, and it should at least
> be considered.
>
> This would obviously run as a cron job to change the priorities of
> individual users at some suitable interval.
>
> Comments?
>
>>Cheers,
>>Rod.
>>
>>--
>>Rod Walker +1 6042913051