Hi all,
I'm sure there's an easy solution to this, but I'm missing it;
I am trying to set up memory limits in torque/maui such that an
errant job that uses (much) too much memory gets killed off.
The problem that I'm hitting is that setting queue properties
like:
resources_max.pmem
also treats the limit as the job's implicit memory request, so
if I set it above 2GB, our 2GB-per-job-slot worker nodes don't
get filled all the way up. If I set it to exactly 2GB, jobs that
go slightly over wind up getting killed when in practice they
could have run quite happily.
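For concreteness, what I'm doing at the moment is roughly this
(the queue name "batch" is just an example):

```shell
# Set a hard per-process memory limit on the queue.
# Torque also treats this value as the job's implicit request,
# which is what breaks the node packing described above.
qmgr -c "set queue batch resources_max.pmem=2gb"
```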
Does anyone know how I can express logic like:
"Allocate jobs 2Gb each (or just ignore the memory
requirements when scheduling), but kill them if they
go over about 3Gb."
From reading the documentation I'm moderately confident this can
be done with maui, if not torque directly, but I have yet to
work out how.
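The closest-looking knob I've found so far is maui's
RESOURCELIMITPOLICY, which (if I'm reading the docs right) can
cancel jobs that exceed their requested memory:

```
# In maui.cfg -- untested on my part; I can't see how to make
# the kill threshold (~3GB) looser than the scheduling request (2GB).
RESOURCELIMITPOLICY MEM:ALWAYS:CANCEL
```

but that still seems to tie the kill limit to the request, which
is exactly what I'm trying to avoid.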
Ewan