On 30 June 2010 13:06, Stuart Purdie wrote:
>
> On 30 Jun 2010, at 12:25, Rob Fay wrote:
>
>>> I'd be interested if applying this sort of tuning to worker nodes will have any effect at the other sites that are having transfer problems - Brunel, Liverpool, Sheffield and Bristol.
>>
>> We tried it at Liverpool last Thursday, and sure enough, the quality of the transfers went into the dark green.
>
> Excellent!
>
>> I then restored all settings to defaults apart from SACK and DSACK being disabled, and all transfers since then have been 100%. However, there haven't been that many transfers since then, so I don't think I can really say with certainty that SACK/DSACK are the issue, but the evidence so far would appear to indicate that may be the case, at Liverpool at least.
>
> That's Just Plain Weird! (Assuming that the problem still disappears when there is no NAT box).
>
> There's a known problem with SACK and Linux on LFNs, i.e. if the buffers get over 20 MB, then it takes too long for the kernel to search the buffers, and it misses the timeouts. (See, e.g., http://fasterdata.es.net/TCP-tuning/linux.html ). I didn't think that this would apply, because SACK needs support on _both_ sides, and the target nodes will probably have it turned off (as YAIM is fond of doing). Except, of course, I'm assuming that YAIM tunes _all_ disk pool nodes, across all the SE types. That might not be a good assumption: we know that it tunes DPM pool nodes (and thus SACK is off), but if dCache and CASTOR nodes don't get the same treatment by default, that might put SACK back in the picture as the culprit.

Indeed, to add a comment on this: a quick grep of the YAIM
functions called for dCache configuration shows that they don't write
anything about SACK or DSACK. So, unless dCache configuration scripts
outside of YAIM do so, SACK and DSACK will be at their default values
at dCache sites.

So, unless anyone with a dCache site wants to contradict me, I'd say
it is likely that this is the difference we were missing between DPM
and all the other storage endpoints.
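
For anyone who wants to check what their own pool nodes are actually running with, here is a minimal sketch in Python. It assumes a Linux node exposing the usual /proc/sys/net/ipv4/tcp_sack and tcp_dsack entries; the function name read_sack_settings is just something made up for this example, not part of YAIM or dCache.

```python
from pathlib import Path

# Standard Linux location of the IPv4 TCP tunables (assumed present on the node).
PROC_IPV4 = Path("/proc/sys/net/ipv4")


def read_sack_settings(base=PROC_IPV4):
    """Return the current tcp_sack and tcp_dsack values as a dict.

    A value of None means the entry could not be read on this host.
    """
    settings = {}
    for name in ("tcp_sack", "tcp_dsack"):
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_sack_settings().items():
        state = {0: "disabled", 1: "enabled"}.get(value, "unknown")
        print(f"{name} = {value} ({state})")
```

Running this on a DPM pool node and on a dCache pool node should show whether the two really differ in the way suggested above.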
Sam