Hi,
so far I've used separate, more private threads, but this is becoming
confusing, so I'll start a single one on TB-SUPPORT. Apologies if I speak
a bit of ATLAS-ese.
So far I've enabled the new rss(+swap) scheme [1] on a multicore queue
at RAL, Glasgow and Brunel, where the ARC/HTCondor combination made it
straightforward because it is basically a renaming of an AGIS parameter
(maxmemory -> maxrss). I have the impression, though, that each site has
a slightly different way of tackling jobs that exceed the memory limit,
and, as discussed at an ops meeting, we should document that in the wiki.
I have worked on the CREAM/torque side too: Manchester has rss+swap
enabled on all the queues and the parameters are passed to one cluster.
I have attached the script for torque. Some observations:
* I left the _Min entries in case other users use them (biomed every now
and then sends jobs with parameters set), but ATLAS doesn't.
* The parameters can be set in the AGIS PandaQueues: maxrss, maxswap,
maxtime. If any one of the three is not defined, the old scheme will be
used (see the first sketch after this list).
* We ended up using Glue1; there was no point in changing that.
* cputime and walltime in Glue1 are in minutes and need to be converted
back to seconds; that's why there is a factor of 60 in the script (see
the second sketch after this list).
* cputime = ncores*walltime, if you want to use it; so if you want to keep
a 48h cputime limit (or whatever you have), you need to set maxtime=6h on
the 8-core multicore queue (8 x 6h = 48h).
* GlueHostMainMemoryRAMSize is assigned to mem, but torque/maui cannot
kill on mem any more because RLIMIT_RSS is no longer enforced by the
kernel. If you want to use this to limit the jobs, you need to use vmem.
* GlueHostMainMemoryVirtualSize = maxrss+maxswap is assigned to vmem, but
without cgroups the vmem of a process is no longer rss+swap: it is the
address space, which is slightly larger than the nominal 4GB ATLAS asks
for. Overcommitting works only for the mem parameter, not for vmem. That
said, if you have a large unused swap on your nodes you can get away with
increasing maxswap. So far only single-core jobs have needed this
treatment.
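To make the fallback mentioned above explicit, here is a minimal sketch in
Python (not the attached torque script; the function name and the dict
layout are my own illustration):

def pick_scheme(panda_queue):
    """Use the new rss+swap scheme only when maxrss, maxswap and maxtime
    are all defined in the AGIS PandaQueue; otherwise fall back to the
    old maxmemory-based scheme."""
    needed = ("maxrss", "maxswap", "maxtime")
    if all(panda_queue.get(k) is not None for k in needed):
        return "rss+swap"
    return "old"   # old scheme: keep using maxmemory as before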
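And a second sketch of the conversions the bullets refer to: Glue1 times
are in minutes and go back to seconds with the factor of 60, cputime is
ncores*walltime, GlueHostMainMemoryRAMSize maps to mem and
GlueHostMainMemoryVirtualSize (= maxrss+maxswap) maps to vmem, the only
limit torque can still kill on. The resources_max.* keys are the standard
torque queue attributes; the units (MB for memory, seconds for maxtime)
and the function itself are assumptions for illustration, not a copy of
the attached script.

def torque_limits(maxrss_mb, maxswap_mb, maxtime_s, ncores):
    """Derive the Glue1 values and the torque resources_max limits from
    the AGIS parameters (illustrative units: MB and seconds)."""
    walltime_min = maxtime_s // 60            # Glue1 publishes minutes
    cputime_min = ncores * walltime_min       # cputime = ncores*walltime
    ram_mb = maxrss_mb                        # GlueHostMainMemoryRAMSize
    vsize_mb = maxrss_mb + maxswap_mb         # GlueHostMainMemoryVirtualSize
    return {
        # back to seconds for the batch system, hence the factor of 60
        "resources_max.walltime": walltime_min * 60,
        "resources_max.cput": cputime_min * 60,
        "resources_max.mem": "%dmb" % ram_mb,     # not enforced (no RLIMIT_RSS)
        "resources_max.vmem": "%dmb" % vsize_mb,  # the limit that actually kills
    }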
So on my queues I now have:
* single core short analysis: maxrss=2GB, maxswap=2GB, maxtime=4h
* single core analysis and production: maxrss=2GB, maxswap=3GB, maxtime=48h
* multicore production: maxrss=16GB, maxswap=16GB, maxtime=6h
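As a concrete check of those numbers with the sketch above (assuming the
usual 8-core multicore slots, which is what the 48h/6h arithmetic implies,
and 1GB = 1024MB):

# single core analysis and production: 2GB rss + 3GB swap, 48h
print(torque_limits(2048, 3072, 48 * 3600, ncores=1))
# -> walltime 172800 s, cput 172800 s, mem 2048mb, vmem 5120mb

# multicore production: 16GB rss + 16GB swap, 6h, 8 cores
print(torque_limits(16384, 16384, 6 * 3600, ncores=8))
# -> walltime 21600 s, cput 172800 s (= 48h = 8 x 6h), mem 16384mb, vmem 32768mb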
I don't know how many sites in the UK using torque/maui would benefit from
this work, since we are trying to eliminate it, but if you want the jobs
to pass the parameters you can now do that. This was also done with an eye
to the SoGE sites, which may not move to another batch system but may have
similar memory problems.
Possible other steps:
* Would RAL like to go ahead with other queues?
* Would any other ARC-CE/HTCondor site like to try?
* Would any site which still has CREAM/torque want to try this script?
* Would any SGE site want to adapt it to their setup? (I asked Matt, but
he has UGE with the possibility of enabling cgroups, so Sussex is in a
situation more akin to that of the ARC/HTCondor sites.)
cheers
alessandra
[1]
https://drive.google.com/file/d/0B_tp6usAhDinWDFzU1F1dXk0b0U/view?usp=sharing
--
Respect is a rational process. \\//