On Wed, Mar 14, 2012 at 11:20:25AM +0100, Ronald Starink wrote:
> - In Torque, we defined per queue a limit for the maximum physical memory
> being used (pmem) by the job and a per-process limit on the virtual memory
> (pvmem):
>
> set queue <QUEUE> resources_max.pmem = 3000mb
> set queue <QUEUE> resources_max.pvmem = 3800mb
>
> The nice thing about the pvmem limitation is that it limits the virtual
> memory available to each process: ulimit -v returns 3891200 (/ 1024 = 3800).
> Consequently, individual processes cannot allocate more memory and get the
> opportunity to deal with allocation failures. The batch system does not
> actually kill the jobs.
[...]
> These changes do not protect against jobs that happily spawn tons of
> memory-hungry child processes.
'set queue <QUEUE> resources_max.vmem = <amount>' will protect against
this: vmem caps the total virtual memory used by all processes of the
job combined, and jobs that exceed it are killed. That's what we use on
our cluster.
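To illustrate why a per-process cap alone is not enough, here is a rough
Python sketch. It is Linux-only (it reads /proc/self/statm), and the
64/48/128 MiB figures are arbitrary. It mimics the pvmem behaviour by putting
an RLIMIT_AS cap (the limit that `ulimit -v` reports) on each forked child. A
child asking for more than its cap gets a MemoryError and keeps running, but
four children that each stay under the cap together hold far more than any
single cap allows. That aggregate is what a per-job vmem limit catches. The
sketch does not model how Torque itself enforces either limit; the helper
names are made up for the example.

```python
import os
import resource
import time

MIB = 1024 * 1024


def _virtual_size_bytes() -> int:
    """Current virtual memory size of this process (Linux /proc only)."""
    with open("/proc/self/statm") as f:
        pages = int(f.read().split()[0])  # field 1: total program size in pages
    return pages * resource.getpagesize()


def _child(headroom: int, request: int, hold_seconds: float) -> None:
    code = 2  # 2 = unexpected error before the allocation attempt
    try:
        # Per-process cap, like `ulimit -v`: current size plus some headroom.
        soft = _virtual_size_bytes() + headroom
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)  # the soft limit may not exceed the hard one
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
        try:
            block = bytes(request)  # zero-filled buffer of `request` bytes
            time.sleep(hold_seconds)  # keep it allocated while siblings run
            del block
            code = 0
        except MemoryError:
            code = 1  # allocation refused; the process itself keeps running
    finally:
        os._exit(code)  # never fall back into the parent's code path


def run_limited_children(n: int, headroom: int, request: int,
                         hold_seconds: float = 0.5) -> list[bool]:
    """Fork n children that each try to allocate `request` bytes under their
    own RLIMIT_AS cap. Returns one bool per child: True if it succeeded."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            _child(headroom, request, hold_seconds)
        pids.append(pid)
    results = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not os.WIFEXITED(status) or os.WEXITSTATUS(status) == 2:
            raise RuntimeError(f"child {pid} failed before allocating")
        results.append(os.WEXITSTATUS(status) == 0)
    return results


if __name__ == "__main__":
    cap = 64 * MIB  # hypothetical per-process headroom, analogous to pvmem
    print("one child, 128 MiB:", run_limited_children(1, cap, 128 * MIB))
    print("four children, 48 MiB each:", run_limited_children(4, cap, 48 * MIB))
```

On a typical Linux box the first call should report a single failed
allocation and the second four successful ones, i.e. about 192 MiB held at
once while no process exceeded its 64 MiB headroom.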
--
Eygene Ryabinkin, National Research Centre "Kurchatov Institute"
Always code as if the guy who ends up maintaining your code will be
a violent psychopath who knows where you live.