On 4 Oct 2012, at 10:04, Andy Swiffin <[log in to unmask]> wrote:

> Hi
> I'm currently putting together plans to update our IdP infrastructure and want to add automatic failover. We currently have a manual failover to a secondary IdP, should it ever be needed (which, since the move to Shib 2 many, many moons ago, it hasn't!). However, I would like to push Shib authentication for some high-profile services, and without a demonstrable auto-failover mechanism this won't be well received.
> Because Shibboleth is stateful, if you are going to load-balance or cluster it you need a mechanism for sharing state information. The Shibboleth documentation says: "By default the Shibboleth team recommends the use of Terracotta as the mechanism for doing this", which is a shame because I have it on high authority that "I think you'd be insane to consider it."... I know a lot of people have found that Terracotta has itself caused Shib outages.
> Without state sharing you need to go for a hot-standby rather than a load-balanced approach, but unfortunately our existing Cisco content switch (which is well overdue for replacement) cannot do this.
> So, I'd be interested to hear from anyone who is doing hot standby with their Shibboleth IdP (i.e. if IdP1 is responding, always use it; if it fails the test, switch to IdP2) and what type of hardware load balancer you're using at the front to do this.


Missed this at the time but we do exactly that :)

Two Shibboleth IdPs, a "primary" and a "failover" node, which know nothing about each other.

We use LVS (Linux Virtual Server) load balancing in front, which runs a frequent but very simple service check against the primary; if the primary doesn't reply in a reasonable time, LVS sends traffic to the secondary node instead (normally within a few seconds). This currently doesn't cope with a Tomcat failure, only an Apache one.
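
For anyone who'd rather see the decision logic than an LVS config, here is a rough Python sketch of "use the primary unless it fails the check". It is not what LVS runs internally; the function names, the example hostnames and the 2-second timeout are all made up for illustration.

```python
import urllib.request
from typing import Callable


def http_check(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers successfully within `timeout` seconds.

    Any failure (timeout, refused connection, HTTP error status,
    malformed URL) is treated as "node is down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        return False


def pick_backend(primary: str, secondary: str,
                 check: Callable[[str], bool] = http_check) -> str:
    """Hot standby: always prefer the primary, fall back only if it fails."""
    return primary if check(primary) else secondary


# Hypothetical usage; these hostnames are placeholders, not our real nodes:
# target = pick_backend("https://idp1.example.org/status",
#                       "https://idp2.example.org/status")
```

In principle, pointing the check at a URL that Apache passes through to Tomcat, rather than one Apache answers by itself, would also catch a Tomcat failure. That is an assumption on top of the setup described here, not something the current check does.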

Semi-seamless failover is achieved by using SimpleSAMLphp + mod_authmemcookie on both nodes to protect Apache + mod_jk, and by sharing sessions between the two nodes.
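
To show why the shared sessions matter, here is a toy Python model. It is only a sketch: a plain dict stands in for whatever shared session store both nodes point at, and the Node class and its methods are hypothetical, not SimpleSAMLphp or Apache APIs.

```python
import secrets
from typing import Dict, Optional


class Node:
    """Toy front end that looks up login sessions by cookie value."""

    def __init__(self, name: str, session_store: Dict[str, str]) -> None:
        self.name = name
        self.sessions = session_store  # may be shared with another node

    def login(self, username: str) -> str:
        """Record a session and return the cookie value that identifies it."""
        cookie = secrets.token_hex(16)
        self.sessions[cookie] = username
        return cookie

    def whoami(self, cookie: str) -> Optional[str]:
        """Return the logged-in user for this cookie, or None if unknown."""
        return self.sessions.get(cookie)


shared_store: Dict[str, str] = {}      # stand-in for the shared session store
primary = Node("primary", shared_store)
secondary = Node("secondary", shared_store)
isolated = Node("isolated", {})        # a node with its own private store

cookie = primary.login("alice")
```

After a failover, `secondary.whoami(cookie)` still finds the session created on the primary, so the user is not asked to log in again, whereas `isolated.whoami(cookie)` returns None and would force a fresh login.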

The scope for a user-impacting problem is small, IMO: someone hits PRIMARY for the initial Shib assertion and gets bounced back to the SP; if a failover happens in between, the SP's back-channel attribute lookup reaches the other node, which knows nothing about that assertion and tells the SP the token is invalid. Given that an increasing number of SPs support SAML2 and POSTing of assertions via the browser, this will become less and less of a problem (I hope!).

Yell if you need more details.
Matthew Slowe
Server Infrastructure Team      e: [log in to unmask]
IS, University of Kent          t: +44 (0)1227 824265
Canterbury, UK