Hi Alessandra,
We're currently using 2x rather than 3x.
By default on *7 HTCondor has:
BASE_CGROUP=htcondor
So for memory, for example, the cgroups for jobs appear in the usual place:
/sys/fs/cgroup/memory/htcondor/...
Why are you using BASE_CGROUP=/system.slice/condor.service? When jobs are running can you see memory cgroups being successfully created?
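A quick way to check is to look at the cgroup membership of a running job's processes. A minimal sketch (hypothetical diagnostic, assumes Linux with cgroup v1 as on EL7; the path shown is for the shell running the script, for a condor job you'd inspect the starter's PID instead):

```python
# Print this process's cgroup membership; for a condor job process the
# "memory" line should show a path under the configured BASE_CGROUP,
# e.g. .../htcondor/... with the default configuration.
with open("/proc/self/cgroup") as f:
    print(f.read())
```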
Regards,
Andrew.
________________________________
From: Testbed Support for GridPP member institutes [[log in to unmask]] on behalf of Alessandra Forti [[log in to unmask]]
Sent: Thursday, October 12, 2017 4:30 PM
To: [log in to unmask]
Subject: arc, htcondor, cgroups limit setup
Hi,
our ARC/HTCondor setup follows the recommendations on the GridPP wiki. In particular we have the RAL recipe [1] with slightly more restrictive values, 2x rather than 3x (if Andrew L hasn't changed it since then):
RemoveMemoryUsage = ( ResidentSetSize_RAW > 2000*RequestMemory )
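For reference, the factor of 2000 works out to roughly 2x because of the units involved: ResidentSetSize_RAW is in KiB while RequestMemory is in MiB (standard HTCondor attribute units), so 2000/1024 is about 1.95x the request. A minimal Python sketch of how the expression evaluates (function name is illustrative, not an HTCondor API):

```python
# Sketch of the RemoveMemoryUsage expression above.
# rss_kib:       ResidentSetSize_RAW, reported by HTCondor in KiB
# request_mib:   RequestMemory, in MiB
# factor=2000 ~= 2x the request, since 2000/1024 ~ 1.95
def should_remove(rss_kib, request_mib, factor=2000):
    return rss_kib > factor * request_mib

# A job requesting 2 GiB (2048 MiB) and using ~4 GiB resident is removed:
print(should_remove(4 * 1024 * 1024, 2048))  # True
# ...but the same job at ~2 GiB resident is not:
print(should_remove(2 * 1024 * 1024, 2048))  # False
```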
we also have cgroups enabled
# Enable CGROUP
BASE_CGROUP = /system.slice/condor.service
CGROUP_MEMORY_LIMIT = soft
however, today a user managed to run jobs that were using 13-20 times the requested memory and the system didn't do anything.
Am I doing something wrong? Should I also set specific limits in cgroups? At the moment I have no memory limit set for HTCondor:
systemctl show htcondor |grep -i mem
MemoryCurrent=18446744073709551615
MemoryAccounting=no
MemoryLimit=18446744073709551615
LimitMEMLOCK=65536
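(As an aside, those huge MemoryCurrent/MemoryLimit values are not a real limit: they are systemd's "unlimited" sentinel, the maximum unsigned 64-bit integer.)

```python
# systemd reports "no limit set" as UINT64_MAX
print(2**64 - 1)  # 18446744073709551615
```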
Also, has anyone tried cgroup accounting? That might be interesting.
thanks
cheers
alessandra
[1] https://www.gridpp.ac.uk/wiki/Enable_Cgroups_in_HTCondor#RAL_Modifications
--
Respect is a rational process. \\//
Fatti non foste a viver come bruti, ma per seguir virtute e canoscenza(Dante)
For Ur-Fascism, disagreement is treason. (U. Eco)
But but but her emails... covfefe!