> [ ... ] some jobs that usually use less than 2GB of memory,
> but occasionally and briefly expand to about 3.5GB, and we
> want to know whether sites can run them or not, [ ... ]
> Running jobs like that certainly doesn't need swap space equal
> to twice RAM, and in practice, doesn't even need swap space
> equal to RAM. Indeed, this sort of usage pattern seems to me
> to be more or less what we already see, [ ... ]

I remember looking into the current situation before, during a
review of worker node memory/swap allocation:

  https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=TB-SUPPORT;bb8e534c.1106

and my impression was that, as per the summary above, current
ATLAS jobs really need only 2GiB per process, even if they
might allocate 4GiB per process, and that only in 64-bit mode:

https://twiki.cern.ch/twiki/bin/view/Atlas/SL5Migration#Virtual_Memory
 «Virtual Memory
  It is known that code running on x86_64 machines has a larger
  VMEM footprint than when run on i386 architecture. ATLAS
  request that all jobs have access to 4GB of virtual memory on
  SL5 x86_64 resources. It is not necessary to have more than
  2GB of physical memory per job slot.»

The above is not awesomely clear (the 4GB of virtual memory
requested might include 'mmap'ed files), but the feeling I get
is that most jobs (at least production ones) are well under
2GiB anyhow.

Also, VMEM includes 'mmap'ed files, which can be quite large
and don't need swap space because they are backed by files on
disk. I just looked at the output of 'pmap' for a 1.8GiB
Firefox process, and the top allocations include several mapped
files:

  #  pmap 31866 | sort -k 2,2 | tail -10
  00007f3103000000  27264K r-x--  /usr/lib/firefox-11.0/libxul.so
  00007f30ca946000  32432K r----  /usr/share/icons/hicolor/icon-theme.cache
  00007f30eefda000  32432K r----  /usr/share/icons/hicolor/icon-theme.cache
  00007f30aee00000  34816K rw---    [ anon ]
  00007f30ce400000  48128K rw---    [ anon ]
  00007f30e3100000  48128K rw---    [ anon ]
  00007f30abc00000  50176K rw---    [ anon ]
  00007f30d9968000  65540K rw-s-  /dev/shm/pulse-shm-3469094682
  00007f30b1400000 105472K rw---    [ anon ]
  00007f30b7b00000 281600K rw---    [ anon ]
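
To separate the two, on kernels whose '/proc/PID/smaps' exposes
a per-mapping 'Anonymous:' field, one can sum those figures to
get the memory that would actually need swap backing; a rough
sketch, reusing the Firefox PID above:

  #  pmap 31866 | tail -1
  #  awk '/^Anonymous:/ { a += $2 } END { print a " KiB" }' /proc/31866/smaps

The first command prints the total VMEM; the difference between
that and the anonymous total is roughly the file-backed part,
which the paging system can simply drop and re-read.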

Now there seems to be a new requirement for per-process
allocations larger than 3.5GiB, and whether that implies more
physical memory or more swap depends indeed a lot on how the
memory is used/reserved.
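
A quick way to see that distinction for a running process is to
compare its allocated address space ('VSZ', roughly the VMEM
above) with its resident set ('RSS', the physical memory it has
actually touched); the PID is again just the illustrative
Firefox one:

  #  ps -o pid,vsz,rss,comm -p 31866

A job that allocates 3.5GiB but only ever touches 2GiB shows up
as a large gap between those two columns, and only the touched
part competes for physical memory.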

In any case paging should be avoided entirely, as the Linux
paging system is stunningly bad (most likely all core Linux
developers have lots of RAM and their systems never page).

One complication under Linux is that by default the kernel
''overcommits'' both physical memory and swap space:

  /proc/sys/vm/overcommit_memory  /proc/sys/vm/overcommit_ratio

and that's because many programs 'malloc' more memory than they
need "just in case", or to run their own memory allocators on
top of it.
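
As a sketch of how to inspect (not change) the relevant knobs:
mode 0 is heuristic overcommit, mode 1 grants every allocation,
and mode 2 enforces a commit limit of swap space plus
'overcommit_ratio' percent of RAM, which '/proc/meminfo'
reports:

  #  cat /proc/sys/vm/overcommit_memory  # 0=heuristic, 1=always, 2=strict
  #  cat /proc/sys/vm/overcommit_ratio   # % of RAM counted in mode 2
  #  grep '^Commit' /proc/meminfo        # CommitLimit, Committed_AS

Under the default mode 0 a job can therefore 'successfully'
allocate 3.5GiB that the system cannot deliver, and only get
killed by the OOM handler when it touches too much of it.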