On 1 Jul 2010, at 16:04, Rob Fay wrote:

> On 01/07/2010 15:37, Stuart Purdie wrote:
>> So, here's a request for it.  After some initial comparisons with Glasgow, Lancaster and Bristol, there might be something lurking in NAT configurations.
>> 
> I've been having a look at it today since LHCb transfers started here again.
> 
> Quality does definitely drop as soon as SACK and DSACK are enabled, and a lot of invalid SACK packets start showing up on the NAT box (as logged by an iptables rule, '-p tcp -m tcp --tcp-flags ACK ACK --tcp-option 5 -m state --state INVALID -j LOG --log-prefix "INVALID: " --log-level 6 --log-tcp-options'). Changing the way those packets are handled didn't seem to make any difference. What sort of configuration issues do you suspect?

I have no idea.  Hence the need for more data.  Lancaster's case says there's something in there that sidesteps the problem.

> I did find that there was a bug in conntrack in kernel versions prior to 2.6.26:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=84ebe1cdae56707b9aa1b40ae5aa7d817ba745f5

Ah - yes, that looks rather like a likely culprit!

> However, Lancaster are running 2.6.18-164.6.1.el5 which doesn't have the fix in and they don't have problems, so that does suggest there's another factor at work.

And that's why I wanted the NAT configs, so that I can do a comparison.  Lancaster are using -j MASQUERADE on POSTROUTING, while we're using -j SNAT, and that difference might be enough to explain it.  Certainly, the code for the two modules in 2.6.18 (SL5.3 default kernel) is markedly different, although I've not had time to fully digest it yet.
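
For anyone comparing, the two rule styles look roughly like the following. This is only a sketch: the interface name eth0, the private range 10.0.0.0/8 and the public address 192.0.2.1 are placeholders I've made up, not the real values at either site.

```shell
# Placeholder values throughout (eth0, 10.0.0.0/8, 192.0.2.1): substitute
# the site's own. The two rules are alternatives; use one or the other.

# MASQUERADE style (as at Lancaster): the source address is taken from
# the outgoing interface when the connection is set up.
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o eth0 -j MASQUERADE

# SNAT style (as here): the source address is fixed explicitly.
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o eth0 -j SNAT --to-source 192.0.2.1

# Dump the NAT table actually in use, in a form that is easy to diff
# between sites.
iptables-save -t nat
```

The output of iptables-save -t nat, together with the kernel version from uname -r, is the sort of thing that would be useful to compare.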

Of course, that's not really an optimal answer: MASQUERADE forgets all tracked connections when the outgoing interface goes down, so a network blip would drop every connection, which is not desirable.  In theory at least - in practice, it seems to be working at Lancaster.