There are a number of different types of memory that we can discuss:
Physical memory used
Physical memory available
Virtual memory used
Virtual memory available
Address space used
Address space available
Swap space used
Swap space available.
_All_ of these numbers are different. Some of them are properties of the node, and some of them are per-process values. Asking about some of them without understanding how they relate to each other is going to produce numbers that don't make sense.
The term 'VMem', _as measured by top_, is the 'Address space used', where 'used' means 'mapped', in the mmap / malloc sense.
Note that 'Virtual Memory' != 'Swap space', as the kernel has more facilities for juggling memory than just swap space. In particular, 'Virtual Memory' > 'Swap space', for all practical workloads.
It is useful to have the concept of a 'working set' of memory - how much the job has to keep in memory at one point in time. Note that it is very common for a job to have a working set smaller than the total mapped Address Space.
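To make the difference between mapped address space and resident memory concrete, here is a minimal sketch, assuming a Linux node where /proc/self/status exposes the VmSize and VmRSS fields. The function names and the 256 MiB / 16 MiB sizes are made up for illustration. It maps a block of anonymous memory, touches only a small part of it, and reports how much each counter grew.

```python
import mmap


def read_kib(field, status_path="/proc/self/status"):
    """Return one 'Vm*' field (e.g. VmSize, VmRSS) from a Linux status file, in KiB."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def map_then_touch(size_mib=256, touch_mib=16):
    """Map size_mib of anonymous memory, write to only the first touch_mib,
    and report growth of address space and resident memory in KiB.
    The sizes are illustrative assumptions, not measurements of any real job."""
    before_size = read_kib("VmSize")
    before_rss = read_kib("VmRSS")

    # Private anonymous mapping: address space is reserved, but no pages are resident yet.
    region = mmap.mmap(-1, size_mib * 1024 * 1024,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped_size = read_kib("VmSize")
    mapped_rss = read_kib("VmRSS")

    # Write one byte per page so the kernel has to back those pages with real memory.
    for offset in range(0, touch_mib * 1024 * 1024, mmap.PAGESIZE):
        region[offset] = 1
    touched_rss = read_kib("VmRSS")
    region.close()

    return {
        "address_space_growth_kib": mapped_size - before_size,
        "rss_growth_after_map_kib": mapped_rss - before_rss,
        "rss_growth_after_touch_kib": touched_rss - before_rss,
    }


if __name__ == "__main__":
    for name, value in map_then_touch().items():
        print(f"{name}: {value}")
```

On a typical Linux box the address space should grow by about the full mapping as soon as mmap returns, while resident memory should only grow by roughly the part that was written to. Nothing here involves swap at all.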
--
It sounds like these Atlas Reco jobs have a peak footprint of roughly 3.5 GB. The _important_ question is whether sites will kill jobs like that. (Glasgow won't.)
The next important question is whether those jobs will kill everything else on the box. We, as site admins, consider this an important point.
If Atlas _really_ expect to drive worker nodes into heavy swapping, then that's going to kill _everything_ on the worker node. Once swapping starts, everything gets a lot slower. This means that the walltime limits of jobs will be hit long before the job is near complete.
If Atlas expect these reco jobs to spend a minute or so with a working set of 3GB, then this is extremely unlikely to cause problems, and probably won't swap, even though the job is going to be using more than the usual 2GB per core.
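The arithmetic behind that can be sketched as follows. This is a back-of-envelope check, not a description of how any scheduler or kernel decides to swap. The node size, the per-job working sets and the 1 GB reserved for the OS are all hypothetical numbers chosen for illustration.

```python
def fits_in_ram(node_ram_gb, working_sets_gb, os_reserve_gb=1.0):
    """Return True if the combined working sets fit in physical RAM.

    working_sets_gb holds one entry per running job (its working set, not its
    mapped address space). os_reserve_gb is an assumed allowance for the
    kernel and system daemons.
    """
    return sum(working_sets_gb) + os_reserve_gb <= node_ram_gb


if __name__ == "__main__":
    # Hypothetical 8-core node provisioned at 2GB per core.
    node_ram_gb = 8 * 2.0

    # Seven ordinary jobs plus one reco job briefly at a 3GB working set.
    one_big_job = [1.5] * 7 + [3.0]
    print("one 3GB job among ordinary jobs fits:", fits_in_ram(node_ram_gb, one_big_job))

    # Every slot running with a 3.5GB working set at the same time.
    all_big_jobs = [3.5] * 8
    print("eight 3.5GB jobs fit:", fits_in_ram(node_ram_gb, all_big_jobs))
```

The point is that what matters is the sum of the working sets on the node at one moment, not the mapped address space of any single job.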
If you _need_ us to have so much swap, as is being suggested, then this is entirely the wrong approach, and _will not work_.
--
The whole process reads very much as if someone has assumed that 'VMem' = 'Physical RAM used + Swap space used' - which is false.
This is not just a technical point (although it is frustrating to get asked questions that clearly demonstrate the asker doesn't understand what they are asking for) - it is that if we _need_ that much swap, then without special handling of those jobs they will kill everything on the worker node. We don't want that, hence having to dig into the middle of the issue in order to find out what is actually going to happen.