On 23 June 2010 12:42, Christopher J. Walker <[log in to unmask]> wrote:

> Stuart Purdie wrote:
>
>> To begin, the obligatory pretty picture:
>>
>>
>> http://lhcbweb.pic.es/DIRAC/LHCb-Production/visitor/systems/accountingPlots/dataOperation#ds9:_plotNames7:Qualitys9:_groupings6:Sources13:_timeSelectors2:-1s10:_startTimes10:2010-05-22s8:_endTimes10:2010-06-23s14:_OperationTypes14:putAndRegisters7:_Sources213:LCG.Barcelona.es,LCG.Bristol-HPC.uk,LCG.Bristol.uk,LCG.Brunel.uk,LCG.CNAF.it,LCG.Glasgow.uk,LCG.Liverpool.uk,LCG.PIC.es,LCG.Sheffield.uk,LCG.UKI-LT2-Brunel.uk,LCG.UKI-SCOTGRID-GLASGOW.uk,LCG.UNINA.it,LCG.UNIZAR.ess9:_typeNames13:DataOperatione
>>
>> (Hrm, that's a monster URL: the same thing is at
>> http://tinyurl.com/lhcbtransjune - that one is on fixed dates, not a rolling
>> 'last month'.)
>>
>> What you're looking at is the transfer attempts + failures for LHCb
>> traffic across a number of sites, for about the past month.  Note that this
>> is transfers, not jobs - a job can succeed after a couple of failed transfer
>> attempts, so this is the strictest criterion to look at.
>>
>> I've included all the sites that I can see were having problems, along
>> with PIC and CNAF to show the 'bad days' for comparison.
>>
>> The key thing to look at is Glasgow after the 16th, when we switched from
>> yellowish green (about 50%) to dark green (near enough 100%).  What changed
>> was that I tuned the TCP stack on the worker nodes (the same thing YAIM does
>> to DPM pool nodes).  That resolved the problem.
>>
>> These are the sysctl parameters I set:
>> # TCP buffer sizes
>> net.ipv4.tcp_rmem = 131072 1048576 2097152
>> net.ipv4.tcp_wmem = 131072 1048576 2097152
>> net.ipv4.tcp_mem = 131072 1048576 2097152
>> net.core.rmem_default = 1048576
>> net.core.wmem_default = 1048576
>> net.core.rmem_max = 2097152
>> net.core.wmem_max = 2097152
>>
>> # SACK and timestamps - turn off
>> net.ipv4.tcp_dsack = 0
>> net.ipv4.tcp_sack = 0
>> net.ipv4.tcp_timestamps = 0
>>
>>
> Can you follow up to the list with the previous values?  It isn't clear from
> your mail what you increased/decreased.
>
>
The defaults are:
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
net.core.rmem_default = 129024
net.core.wmem_default = 129024
net.core.rmem_max = 131071
net.core.wmem_max = 131071
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1

(basically, the min, starting and max values for the TCP window size were all
increased by roughly a factor of ten - to the YAIM-tuned values for disk
servers, which happen to be a good approximation to being tuned for transfers
to RAL and CERN - and SACK was turned off.)
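
If anyone wants to try the same change, here's a minimal sketch of one way to
apply it (assuming root on the worker nodes; how you actually push it out -
cfengine, a YAIM hook, or by hand - is up to the site):

# append the tuned values to /etc/sysctl.conf so they survive a reboot
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_rmem = 131072 1048576 2097152
net.ipv4.tcp_wmem = 131072 1048576 2097152
net.ipv4.tcp_mem = 131072 1048576 2097152
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
EOF

# load the new values now and spot-check a couple of them
sysctl -p
sysctl net.ipv4.tcp_rmem net.core.rmem_max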


Sam

> Chris
>
>
>
>> Ok, so that's the what - the why is not so clear.  I was working on the
>> theory that the presence of the NAT boxes represented a network
>> inefficiency, and that if the transfers were given longer then they would
>> complete successfully.  Therefore the approach was to try to optimise the
>> transfers from the worker nodes to CERN, so that if they went a bit quicker,
>> they'd complete before the timeouts.  Note that (at least for us), RAL is 12
>> ms away, and CERN is 27 ms away.  The closer one is to CERN, the smaller the
>> effect this change should have (we might well be in the worst case here, at
>> least until UKI-SCOTGRID-SHETLAND gets off the ground).
>> By tuning the worker node for a Long Fat Network, which that sort of
>> connection is, we get more data moved faster.  (Although the target nodes are
>> tuned, TCP/IP is limited by the congestion window on both sides, hence
>> tweaking the worker nodes as well.)  I've been poking at other parameters as
>> well, but the parameters above worked so well that I can't find any
>> differences with any others.  (It's also worth noting that these made no
>> difference in transfers to or from our local SE - i.e. they don't seem to
>> cause any problems even if not useful.)
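
[A rough back-of-the-envelope check on why the buffer sizes matter here -
assuming a nominal 1 Gbit/s path and the 27 ms RTT Stuart quotes, so treat
these as illustrative numbers rather than measurements:

bandwidth-delay product              = 1 Gbit/s x 27 ms      ~ 3.4 MB
old cap (net.core.wmem_max = 131071) = 131071 bytes / 27 ms  ~ 38 Mbit/s per stream
new cap (net.core.wmem_max = 2097152)= 2097152 bytes / 27 ms ~ 620 Mbit/s per stream

i.e. with the stock defaults a single WN-to-CERN stream tops out at around
40 Mbit/s however clean the network is, which fits the 'a bit quicker and they
beat the timeout' picture.]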
>>
>> I'd be interested to see whether applying this sort of tuning to worker nodes
>> has any effect at the other sites that are having transfer problems -
>> Brunel, Liverpool, Sheffield and Bristol.  Also, I'd be interested in the
>> round trip times between the worker nodes and CERN (i.e. through the NAT) -
>> I've been running traceroute www.cern.ch and reading off the last hop I can.
>> Raja - I note that Barcelona and UNIZAR both show similar (although less
>> severe) effects to the UK.  Your opposite number in Spain might be
>> interested in this - certainly I'm curious about their configuration: I
>> rather suspect they have NATs and untuned worker nodes.
>>
>>
>>
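
For the round-trip question, something along these lines run from a worker node
should give a usable number (just a sketch - ICMP can be filtered at the NAT,
so the last hop that answers is only an approximation of the full path):

traceroute www.cern.ch      # read the RTT off the last hop that replies
traceroute -n www.cern.ch   # same, without reverse DNS lookups
ping -c 5 www.cern.ch       # only useful if ICMP echo gets out through the NAT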