Hi Lorne, Brian,
I am also very interested in this issue (I haven't yet had time to look into it properly).
We observe some strange failures in transfers through the PIC FTS from certain T2s to LIP (and NCG),
namely:
- (Premise) There is a 2-hour timeout in the FTS.
- For some large files (ATLAS and CMS, around 2 GB), the file completes its transfer, meaning the full file
is at the destination. I confirmed both the total size and the checksum, and the checksum is correctly set in the filesystem (Lustre under StoRM in our case)
on both sides. We have seen that the GridFTP servers send the end-of-transfer notification to the FTS, but somehow the FTS ignores it
(both GridFTP servers were in debug mode).
- We saw this happen for transfers lasting more than 1 hour (but completing before the 2-hour timeout).
Note that transfers from those T2s succeed if they complete in less than 1 hour.
(Yes, I know this is very strange.)
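A minimal Python sketch of the destination-side check and of the failure pattern described above. It assumes the checksum is Adler-32 (the type ATLAS and CMS commonly use for their files); the `Transfer` record, its field names, and the `FAILED` state string are hypothetical illustrations, not the FTS schema. The two thresholds are the 1-hour and 2-hour figures from the report.

```python
from __future__ import annotations

import os
import zlib
from dataclasses import dataclass

FTS_TIMEOUT_S = 2 * 3600    # the 2-hour FTS timeout from the report
SUSPECT_AFTER_S = 1 * 3600  # failures were only seen beyond ~1 hour


def adler32_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Adler-32 of a file as 8 zero-padded lowercase hex digits."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            value = zlib.adler32(chunk, value)  # running checksum over chunks
    return f"{value & 0xFFFFFFFF:08x}"


def file_matches(path: str, expected_size: int, expected_adler32: str) -> bool:
    """True if the destination file has the expected size and checksum."""
    return (os.path.getsize(path) == expected_size
            and adler32_of(path) == expected_adler32.lower().zfill(8))


@dataclass
class Transfer:
    # Hypothetical record; the field names are illustrative only.
    duration_s: float
    fts_state: str    # e.g. "FAILED" or "FINISHED"
    on_disk_ok: bool  # result of file_matches() at the destination


def is_suspect(t: Transfer) -> bool:
    """The reported pattern: data arrived intact, the transfer took longer
    than an hour but less than the FTS timeout, yet FTS marked it failed."""
    return (t.fts_state == "FAILED" and t.on_disk_ok
            and SUSPECT_AFTER_S < t.duration_s < FTS_TIMEOUT_S)
```

Filtering transfer records through `is_suspect()` would isolate exactly the failures that fall in the 1 h to 2 h window with intact data at the destination, separating them from genuine timeouts or corrupted copies.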
The PIC people have even moved our FTS channels to newer versions of FTS, to no avail, which seems to indicate that the problem may not be in the FTS service itself.
We will look into this more closely, specifically at the points suggested by Lorne,
and try them out with the PIC people.
cheers
Mario David
On Mar 15, 2012, at 10:20 AM, [log in to unmask] wrote:
> I have been looking into htcp on some servers at the UK Tier1.
> We did not see a major improvement in rates for completed transfers. We cannot say at the moment if the change has increased our percentage of successful transfers.
>
> How did you observe this improvement?
>
> How are you applying the changes only to transfers on a single FTS channel?
> I would like more info (off list) regarding the setup, to see whether particular disk-server-to-disk-server transfers are faster between SARA and RAL.
> Thanks.
> Brian
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On Behalf Of Lorne Levinson
> Sent: 12 March 2012 21:29
> To: [log in to unmask]
> Subject: [LCG-ROLLOUT] bugs in TCP congestion algorithms
>
> According to http://fasterdata.es.net/fasterdata/host-tuning/linux/ there are problems with the Linux TCP network congestion algorithms:
> "NOTE: There seem to be bugs in both bic and cubic for a number of versions of the Linux kernel up to version 2.6.33. We recommend using htcp with older kernels to be safe."
> The FTS channel from SARA to Weizmann has been changed to use htcp and indeed throughput has improved and timeouts have been reduced. (The SARA end is 10G but the Weizmann end is only 1G.) Do others have experience with TCP congestion algorithms?
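A minimal sketch, assuming Linux and Python 3.6 or later, of how a client process can request htcp for its own sockets through the `TCP_CONGESTION` socket option, leaving the host-wide default (`/proc/sys/net/ipv4/tcp_congestion_control`, settable with `sysctl -w net.ipv4.tcp_congestion_control=htcp`) untouched. It only illustrates the kernel mechanism; nothing in the thread says this is how the SARA to Weizmann channel was configured, and the helper names are hypothetical.

```python
from __future__ import annotations

import socket
from pathlib import Path

PROC_IPV4 = Path("/proc/sys/net/ipv4")
TCP_CA_NAME_MAX = 16  # kernel limit on a congestion-control name


def _read_proc(name: str) -> str | None:
    """Read a /proc/sys/net/ipv4 entry; None if unreadable (e.g. not Linux)."""
    try:
        value = (PROC_IPV4 / name).read_text().strip()
    except OSError:
        return None
    return value or None


def available_algorithms() -> list[str]:
    """Congestion-control algorithms currently registered in the kernel."""
    value = _read_proc("tcp_available_congestion_control")
    return value.split() if value else []


def default_algorithm() -> str | None:
    """Host-wide default used by sockets that do not choose their own."""
    return _read_proc("tcp_congestion_control")


def open_socket_with(algo: str = "htcp") -> tuple[socket.socket, str | None]:
    """Create a TCP socket, request `algo`, and report what the kernel applied.

    Returns (socket, algorithm_in_effect); the algorithm is None when the
    platform does not expose TCP_CONGESTION. The caller connects and closes.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if not hasattr(socket, "TCP_CONGESTION"):
        return sock, None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algo.encode())
    except OSError:
        pass  # unknown or not permitted: the socket keeps the host default
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION,
                              TCP_CA_NAME_MAX)
    except OSError:
        return sock, None
    name = raw.split(b"\x00", 1)[0].decode()  # drop the NUL padding
    return sock, name or None


if __name__ == "__main__":
    s, in_effect = open_socket_with("htcp")
    print("available:", available_algorithms())
    print("default:", default_algorithm(), "| this socket:", in_effect)
    s.close()
```

The kernel rejects the request, and the socket keeps the default, if htcp is not available to it (the `tcp_htcp` module) or, for an unprivileged process, if htcp is not listed in `net.ipv4.tcp_allowed_congestion_control`; that is why the sketch reads the value back instead of assuming the request took effect.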