On 1 Jul 2010, at 16:04, Rob Fay wrote:
> On 01/07/2010 15:37, Stuart Purdie wrote:
>> So, here's a request for it. After some initial comparisons with Glasgow, Lancaster and Bristol, there might be something lurking in NAT configurations.
>>
> I've been having a look at it today since LHCb transfers started here again.
>
> Quality does definitely drop as soon as SACK and DSACK are enabled, and a lot of invalid SACK packets start showing up on the NAT box (as logged by an iptables rule, '-p tcp -m tcp --tcp-flags ACK ACK --tcp-option 5 -m state --state INVALID -j LOG --log-prefix "INVALID: " --log-level 6 --log-tcp-options'). Changing the way those packets are handled didn't seem to make any difference. What sort of configuration issues do you suspect?
I have no idea, hence the need for more data. Lancaster's case says there's something in there that sidesteps the problem.
> I did find that there was a bug in conntrack in kernel versions prior to 2.6.26:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=84ebe1cdae56707b9aa1b40ae5aa7d817ba745f5
Ah - yes, that looks rather like a likely culprit!
> However, Lancaster are running 2.6.18-164.6.1.el5 which doesn't have the fix in and they don't have problems, so that does suggest there's another factor at work.
And that's why I was wanting the NAT configs, so that I can do a comparison. Lancaster are using -j MASQUERADE on POSTROUTING, we're using -j SNAT; and that might be enough. Certainly, the code for the two modules in 2.6.18 (SL5.3 default kernel) is markedly different, although I've not had time to fully digest it yet.
Of course, that's not really an optimal answer: MASQUERADE forgets all tracked connections when the interface goes down, so a network blip would kill every transfer in flight, which is not desired. In theory at least - in practice, it seems to be working at Lancaster.
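
For the config comparison, here is a minimal sketch of how the POSTROUTING targets could be pulled out of each site's iptables-save output. The function name, the sample rule sets and the addresses in them are made up for illustration and are not anyone's real config; it assumes the usual iptables-save layout (a "*nat" table header, "-A" rule lines, "COMMIT").

```python
import shlex


def postrouting_targets(iptables_save_text):
    """Return (target, rule) pairs for nat-table POSTROUTING rules.

    Expects text in iptables-save format. Hypothetical helper, for
    illustration only.
    """
    in_nat = False
    results = []
    for raw in iptables_save_text.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            # Table header, e.g. "*nat" or "*filter".
            in_nat = (line == "*nat")
            continue
        if not in_nat or not line.startswith("-A POSTROUTING"):
            continue
        tokens = shlex.split(line)
        if "-j" in tokens:
            idx = tokens.index("-j")
            if idx + 1 < len(tokens):
                results.append((tokens[idx + 1], line))
    return results


# Made-up sample dumps; real ones would come from each site's NAT box.
SITE_A = """*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -s 10.0.0.0/8 -o eth0 -j MASQUERADE
COMMIT
"""

SITE_B = """*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -s 10.0.0.0/8 -o eth0 -j SNAT --to-source 192.0.2.1
COMMIT
*filter
-A FORWARD -m state --state INVALID -j DROP
COMMIT
"""

if __name__ == "__main__":
    for name, dump in (("site A", SITE_A), ("site B", SITE_B)):
        for target, rule in postrouting_targets(dump):
            print(name, target, rule, sep=" | ")
```

Feeding each site's dump through this would at least show at a glance who is on MASQUERADE and who is on SNAT, before digging into the rest of the rules by hand.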