On 30 Jun 2010, at 12:25, Rob Fay wrote:

>> I'd be interested if applying this sort of tuning to worker nodes will have any effect at the other sites that are having transfer problems - Brunel, Liverpool, Sheffield and Bristol.
> 
> We tried it at Liverpool last Thursday, and sure enough, the quality of the transfers went into the dark green.

Excellent!

> I then restored all settings to defaults apart from SACK and DSACK being disabled, and all transfers since then have been 100%. However, there haven't been that many transfers since then, so I don't think I can really say with certainty that SACK/DSACK are the issue, but the evidence so far would appear to indicate that may be the case, at Liverpool at least.

That's Just Plain Weird!  (Assuming that the problem still disappears when there is no NAT box).

There's a known problem with SACK and Linux on LFNs (long fat networks) - i.e. if the buffers get over about 20 MB, it takes the kernel too long to search them for the SACKed segments, and it misses the timeouts.  (See, e.g. http://fasterdata.es.net/TCP-tuning/linux.html ).  I didn't think that this would apply, because SACK needs support on _both_ sides, and the target nodes will probably have it turned off (as YAIM is fond of doing).  Except, of course, I'm assuming that YAIM tunes _all_ disk pool nodes, across all the SE types.  That might not be a good assumption - we know that it tunes DPM pool nodes (and thus SACK is off there), but if dCache and CASTOR nodes don't get the same treatment by default, that might put SACK back in the picture as the culprit.
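
For the record, checking whether a pool node actually has SACK on is just the standard Linux sysctl - something like the below, where 1 means enabled (the kernel default if nothing has touched it); I haven't verified this against any particular SE flavour:

    # Check whether SACK is enabled on a pool node (1 = on, 0 = off)
    sysctl net.ipv4.tcp_sack
    # or, equivalently:
    cat /proc/sys/net/ipv4/tcp_sack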

DSACK is all about detecting spurious retransmissions, to try to prevent sending duplicate data when it's not needed.  I've seen some comments about it being a bit error-prone in some scenarios, although never with NAT mentioned. If I had to point a finger, I'd be looking more at DSACK than SACK.   If the algorithm is getting it wrong, one way or another, and thus slowly choking off retransmissions, that would explain the working-for-a-bit-then-slowing-to-a-crawl behaviour.  I'll read up on this stuff ...
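
As far as I understand it, the change at Liverpool amounts to the standard pair of sysctls - roughly the below in /etc/sysctl.conf, applied with "sysctl -p" (a sketch only; I haven't seen the exact config Rob used):

    # Disable selective acknowledgements and duplicate-SACK reporting
    net.ipv4.tcp_sack = 0
    net.ipv4.tcp_dsack = 0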

Either way, it's nice to have that narrowed down.  I'd been meaning to do that myself, but ran out of time (and jobs!) to get any data on it.

It would be good if you could keep that network config till we get more LHCb production work - we've none at the moment, so I can't test things here [0] to confirm that switching off SACK and DSACK fixes the problem.

(There might well be value in twiddling the buffer sizes anyway, for transfer performance; but it's preferable to _know_ that these are separate issues.)
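
The buffer knobs I mean are the usual ones from that fasterdata page - along the lines of the below, where the numbers are just the illustrative 16 MB-max values from that sort of guide, not anything tuned for our particular paths:

    # Example-only buffer tuning (16 MB max), per the usual LFN guidance
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216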

Stuart

[0] Ok, I can.  But the only way we can show the problem manually is to hammer the endpoint - this tends to result in requests not to do that!