On 16/06/17 at 15:39, Matthew Vernon wrote:
> Hi,
>
> I've been bugging Jisc about this bug for ages, so thought I'd try
> looking at it myself. I didn't make much headway, but perhaps enough to
> let someone who knows what they're doing fix it :)
>
> The failure mode is that tids processes do not die, and instead sit
> around chewing 100% CPU - over time you have enough of these to bring
> your IdP to its knees.

So, does this only happen when you are shutting the tids process down
(e.g. with Ctrl-C)?

> We've bodged round this by having a cron job do
> system tids restart every 2 hours :(
>
> strace on a spinning tids produces no output (suggesting no system calls
> are being made), gdb (with moonshot-trust-router-dbg installed) always
> looks roughly like:
>
> Attaching to program: /usr/bin/tids, process 2374
> [New LWP 2375]
> [New LWP 2376]
> [New LWP 2377]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> 0x00007fac5b1804c2 in log4shib::Category::getChainedPriority() const ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> (gdb) bt
> #0 0x00007fac5b1804c2 in log4shib::Category::getChainedPriority() const ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> #1 0x00007fac5b17fc79 in log4shib::Category::isPriorityEnabled(int)
>    const ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> #2 0x00007fac5b1809fc in log4shib::Category::info(char const*, ...) ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> #3 0x00007fac5d9a63c2 in shibsp::SPConfig::term() ()
>    from /usr/lib/x86_64-linux-gnu/libshibsp.so.6
> #4 0x00007fac5d9a6a88 in shibsp::SPInternalConfig::term() ()
>    from /usr/lib/x86_64-linux-gnu/libshibsp.so.6
> #5 0x00007fac5dd6fd7e in shibresolver::ShibbolethResolver::term() ()
>    from /usr/lib/x86_64-linux-gnu/libshibresolver.so.1
> #6 0x00007fac5e1c9189 in gssEapLocalAttrProviderFinalize ()
>    from /usr/lib/x86_64-linux-gnu/gss/mech_eap.so
> #7 0x00007fac5e1c1174 in ?? () from
>    /usr/lib/x86_64-linux-gnu/gss/mech_eap.so
> #8 0x00007fac5f5ebff8 in __run_exit_handlers (status=status@entry=0,
>    listp=0x7fac5f9755f8 <__exit_funcs>,
>    run_list_atexit=run_list_atexit@entry=true) at exit.c:82
> #9 0x00007fac5f5ec045 in __GI_exit (status=status@entry=0) at exit.c:104
> #10 0x00000000004059b7 in tids_accept (tids=0x190e200, listen=<optimized out>)
>    at tid/tids.c:485
> #11 0x0000000000405dec in tids_start (tids=tids@entry=0x190e200,
>    req_handler=req_handler@entry=0x403dc0 <tids_req_handler>,
>    auth_handler=auth_handler@entry=0x403d70 <auth_handler>,
>    hostname=<optimized out>, port=port@entry=12309,
>    cookie=cookie@entry=0x1904dc0) at tid/tids.c:546
> #12 0x0000000000403a94 in main (argc=<optimized out>, argv=<optimized out>)
>    at tid/example/tids_main.c:389
>
> tid/tids.c:485 is
>     exit(0); /* exit to kill forked child process */
>
> ...so it appears to be a bug in something's exit handlers?
> getChainedPriority does have a loop in it:
>
>     const Category* c = this;
>     while(c->getPriority() >= Priority::NOTSET) {
>         c = c->getParent();
>     }
>
> ...which makes me wonder if something is being incorrectly initialised,
> but I'm rather clutching at straws here.
>
> Debian/Ubuntu don't ship a log4shib library with debugging symbols
> installed.
>
> I then installed moonshot-gss-eap-dbg and the problem seems much slower
> to recur (to the point that I'd thought it had caused the problem to
> entirely go away); now a bt looks like:
>
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> 0x00007fd74ece84c2 in log4shib::Category::getChainedPriority() const ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> (gdb) bt
> #0 0x00007fd74ece84c2 in log4shib::Category::getChainedPriority() const ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> #1 0x00007fd74ece7c79 in log4shib::Category::isPriorityEnabled(int)
>    const ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> #2 0x00007fd74ece89fc in log4shib::Category::info(char const*, ...) ()
>    from /usr/lib/x86_64-linux-gnu/liblog4shib.so.1
> #3 0x00007fd75150e3c2 in shibsp::SPConfig::term() ()
>    from /usr/lib/x86_64-linux-gnu/libshibsp.so.6
> #4 0x00007fd75150ea88 in shibsp::SPInternalConfig::term() ()
>    from /usr/lib/x86_64-linux-gnu/libshibsp.so.6
> #5 0x00007fd7518d7d7e in shibresolver::ShibbolethResolver::term() ()
>    from /usr/lib/x86_64-linux-gnu/libshibresolver.so.1
> #6 0x00007fd751d310f7 in gss_eap_shib_attr_provider::finalize ()
>    at util_shib.cpp:481
> #7 0x00007fd751d31189 in gssEapLocalAttrProviderFinalize (
>    minor=minor@entry=0x7ffeac0d31f4) at util_shib.cpp:550
> #8 0x00007fd751d29174 in (anonymous namespace)::finalize_class::~finalize_class
>    (this=<optimized out>, __in_chrg=<optimized out>) at util_attr.cpp:101
> #9 0x00007fd753153ff8 in __run_exit_handlers (status=status@entry=0,
>    listp=0x7fd7534dd5f8 <__exit_funcs>,
>    run_list_atexit=run_list_atexit@entry=true) at exit.c:82
> #10 0x00007fd753154045 in __GI_exit (status=status@entry=0) at exit.c:104
> #11 0x00000000004059b7 in tids_accept (tids=0x968200, listen=<optimized out>)
>    at tid/tids.c:485
> #12 0x0000000000405dec in tids_start (tids=tids@entry=0x968200,
>    req_handler=req_handler@entry=0x403dc0 <tids_req_handler>,
>    auth_handler=auth_handler@entry=0x403d70 <auth_handler>,
>    hostname=<optimized out>, port=port@entry=12309,
>    cookie=cookie@entry=0x95edc0) at tid/tids.c:546
> #13 0x0000000000403a94 in main (argc=<optimized out>, argv=<optimized out>)
>    at tid/example/tids_main.c:389
>
> This is an Ubuntu Xenial system, but I've seen the runaway-tids problem
> basically since I started looking at the moonshot pilot back at my
> previous job (where we were running Debian).

Which installation instructions have you followed? I could not find
official ones for Ubuntu Xenial. I'm not saying that is the problem, but
I wonder whether incorrect libraries might be related.

Regards,
Alejandro

>
> Regards,
>
> Matthew