On Wed, Oct 16, 2013 at 10:11:53AM +0100, Ian Young wrote:
>
> On 16 Oct 2013, at 09:28, Matthew Slowe <[log in to unmask]> wrote:
>
> > All three on Sun^WOracle JVM 1.7:
> >
> > $ java -version
> > java version "1.7.0_25"
>
> They're up to _40 or something now, but I can't see that mattering.
This is whatever comes with RHEL, but noted.
> > Loads of suggestions in the intertubes that (counterintuitively) lowering the
> > heap size is a good thing to do. I don't fully understand how much ShibIDP
> > caches stuff (if at all) but might we benefit from caching less?
>
> Disclaimer: I've never run a serious production IdP, so take this all with a pinch of salt.
>
> The main thing the IdP keeps in its head is the metadata, which I seem
> to recall ends up being surprisingly large (100MB to 200MB is what my
> memory tells me). It doesn't really have much else it can cache other
> than database connections and the like. Things like replay caches
> don't take up much room at all.
These IDPs aren't on the UKAMF (they just talk to Office365) so there's
no bloaty metadata to cache.
> So yes, turning the heap size down means you'll see more frequent but
> hopefully smaller GCs.
>
> > 2013-10-15T11:19:19.190+0100: 9832.809: [GC [PSYoungGen: 326540K->22073K(327360K)] 753409K->452843K(1026432K), 15.4796500 secs] [Times: user=0.15 sys=0.03, real=15.47 secs]
>
> I find it interesting that the elapsed time is 100x the user process
> time; GCs are user mode so if something is taking 15s and pausing
> everything meanwhile, I'd suggest looking at whether you're getting a
> lot of paging during the GC.
Yes, that's spawned a thought -- we do see swapping, and I hadn't made the
connection with the silly "real" times before. It's probably because, when
the JVM pauses, Apache (front-ending Tomcat) has to spin up new children to
cope with the loss of communication with Tomcat, and those eat memory. I've
dropped MaxChildren down to a "can't exceed memory" value on one of the
nodes to see if that helps.
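For what it's worth, this is roughly how I'm checking the paging theory -- a
sketch assuming the usual procps tools are installed (the GC log path is
illustrative, not what we actually run):

```shell
# Watch swap activity once a second: a non-zero "si" (swap-in) column during
# one of the long GC pauses would confirm pages being faulted back in
# mid-collection, which would explain real >> user in the GC log.
vmstat 1 5

# And the JVM-side flags (JDK 7) that produce GC log lines like the one
# quoted above, e.g. in JAVA_OPTS -- log path is illustrative:
#   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
#   -Xloggc:/var/log/tomcat6/gc.log
```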
> If a lot of your heap isn't being referenced at all most of the time,
> either your guest OS or the hypervisor or the host OS (if there is
> one, depending on your VM environment) might be helpfully removing it
> from RAM "just in case" someone else might need it. Of course, when a
> GC comes along, the one thing you can guarantee it will do is page all
> of that back in again, random access, one page at a time.
>
> Turning the heap size down might also fix a problem like that, because
> you'll be touching everything often enough that whatever is deciding
> the memory is unused won't get that impression after all.
>
> The other thing to look at would be whether your VM is set up to keep
> everything in memory or whether the hypervisor is allowed to move some
> of it out. If the latter (which isn't unreasonable if your VMs are 4GB
> but your Java heap is 1GB) then it might be worth (again, perhaps
> counterintuitively) trying reducing the guest VM size as well.
>
> As I said at the top, these are just some ideas from someone who
> hasn't done this in production, but *has* suffered from the wacky
> memory behaviour of some of the VMware products at smaller scales.
> VMware Server on Linux, in particular, seems to end up swapping large
> chunks of memory out even if there's plenty of room for everything. I
> imagine you're using something significantly more modern, though.
Our VMware setup doesn't swap stuff out (we don't overcommit on memory
on our host servers) but that's given me plenty of food for thought. I
think some testing with smaller heaps might be in order, too...
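Concretely, something like this is what I'll try on the test node -- values
and the file path are illustrative, not what we currently run:

```shell
# e.g. in /etc/sysconfig/tomcat6 (illustrative): a smaller, fixed-size heap
# so the whole thing stays hot. Setting -Xms equal to -Xmx avoids heap
# resizing, and a 512m heap on a lightly-loaded IdP (no bulky metadata to
# cache) should mean more frequent but much shorter collections.
JAVA_OPTS="-Xms512m -Xmx512m -XX:MaxPermSize=128m \
  -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log"
```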
Thank you!
--
Matthew Slowe
Server Infrastructure Team e: [log in to unmask]
IS, University of Kent t: +44 (0)1227 824265
Canterbury, UK w: www.kent.ac.uk