On Tue, May 02, 2006 at 03:18:23PM +0100, Owen Synge wrote:
> On Thu, 27 Apr 2006 11:22:30 +0100
> Kostas Georgiou <[log in to unmask]> wrote:
>
> > On Wed, Apr 26, 2006 at 09:16:18PM +0100, Greig A Cowan wrote:
> >
> > > > Performance would have been a lot better if dcache didn't use 10K
> > > > writes; something like 256K, for example, is probably enough to keep
> > > > the writes non-random.
> > > >
> > > > http://savannah.cern.ch/bugs/?func=detailitem&item_id=10132
> > >
> > > I realise that you've had an issue with the 10K writes for quite a
> > > while now. No one else (outside of GridPP) appears to have flagged
> > > it as a problem.
> >
> > I hope that by now everyone here agrees that parallel streams are
> > slower, but has anyone asked WHY this is the case? Maybe nobody
> > else did, so they aren't going to complain about the 10K block
> > writes...
> >
> > Cheers,
> > Kostas
>
> I agree that this is a problem and I should raise it as an issue for
> D-cache, but mature products like D-cache just comply with the specs
> given to them, and since GSIFTP compatibility was what was asked for,
> that's what we got. The merits of using FTP over HTTP are still a
> mystery to me; there are many small issues that to me make a
> compelling case, and this is an example of an implementation error
> that I have gone over repeatedly. Many admins say GSIFTP optimised
> for single file transfers works, job done, so why should we change? The
> problem with changing these things is that there is no single killer
> reason to abandon what was regarded as the standard line. The false
> logic in this case is that parallel transfers of a single file are
> faster, and therefore all files should be transferred in parallel
> streams per file. This is clearly a leap of faith, with no modelling
> or thought put into how things really work.
>
> Other poor assumptions existed, including that by using multiple ports
> things go faster, without anyone mentioning that a port is just a TCP/IP
> concept and has no basis in hardware. This is no longer an established
> "truth", but overturning it did take a lot of lobbying. I can't help
> wondering how such a story ever consistently reached so many senior
> managers.
>
> I hope you can present your argument to people like Peter Clark and
> others at the top of GridPP management, and to members of experiment
> boards, as they are the people setting the requirements for the SC4
> meeting at Fermi very soon. We could keep this false consensus going
> too long unless you make it clear to them that the fastest way to
> transfer files, irrespective of protocol, is a single stream when
> multiple files are to be transferred. They all know my opinion, but at
> the moment they don't know that anyone else agrees that file transfers
> should be single stream. Consensus needs not only to be established
> between technical people; management also needs to know that the
> technical people have reached a consensus. I can't help here, as they
> already know my position.
I think we are talking about slightly different things.
1) More parallel TCP transfers (parallel streams or parallel files) will
always get better speed at the network level.
Some of the reasons...
a) The TCP window scaling algorithms do not respond that fast to
changes. Things like net.ipv4.tcp_congestion_control = bic in
newer kernels do help a lot here, so in the future this will be
less of an issue.
b) Most systems are not tuned at all for high bandwidth transfers
over the WAN. How many machines in LCG do you think have set
net.core.wmem_default, for example, to a sane value? Last time
I transferred data from the RAL dcache I found that RAL only
uses a 64K window (no replies to that email btw beyond "we'll
look at it").
c) TCP will try to split bandwidth equally between all transfers,
so assuming that other people are using the pipe, you do get a
bigger percentage the more streams you use. Of course they'll
do the same sooner or later, so it is a silly argument, but it
does improve benchmarks, and not many people care about fairness
to everyone else using the pipe.
I have no problem with parallel transfers; they do give you better
network performance (but this is not everything).
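The 64K window mentioned in b) matters because a single TCP stream can have at most one window of data in flight per round trip, so throughput is capped at window/RTT. A quick back-of-the-envelope sketch (the 20 ms round trip is a hypothetical WAN latency, not a measured RAL figure):

```python
# Bandwidth-delay product: single-stream TCP throughput <= window / RTT,
# no matter how fast the underlying link is.
window = 64 * 1024     # 64K window, in bytes, as seen from the RAL dcache
rtt = 0.020            # 20 ms round trip (hypothetical WAN latency)

max_throughput = window / rtt                  # bytes per second
print("%.1f MB/s" % (max_throughput / 2**20))  # prints "3.1 MB/s"
```

So with an untuned 64K window, no number of gigabit links will get a single stream past a few MB/s over the WAN; bigger buffers (or more streams) are the only way around it.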
2) Random writes are deadly to a hard disk. An average SATA/IDE
disk can sustain something like 50MB/sec for sequential writes;
when you switch to random writes, performance drops to ~1-3 MB/sec.
The problem with dcache is not that they use parallel streams,
it is that they transfer data in blocks of 10K, which, combined
with parallel streams, shifts the bottleneck from the network
to the disks and ends up slower. It doesn't have to be this
way.
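To make the write pattern concrete, here is a small sketch (not dcache code, just an illustration) of what the receiving disk sees: with parallel streams, 10K blocks arrive out of order, so each block lands at its own offset via a seek, even though the finished file is identical to one written sequentially:

```python
import os
import random
import tempfile

BLOCK = 10 * 1024    # dcache-style 10K blocks
NBLOCKS = 64
blocks = [bytes([i % 256]) * BLOCK for i in range(NBLOCKS)]

def write_blocks(path, order):
    """Write the same blocks, either in sequence or in shuffled order."""
    with open(path, "wb") as f:
        f.truncate(NBLOCKS * BLOCK)
        for i in order:
            f.seek(i * BLOCK)   # out-of-order arrival forces a seek per 10K
            f.write(blocks[i])

tmp = tempfile.mkdtemp()
seq_order = list(range(NBLOCKS))
rnd_order = seq_order[:]
random.shuffle(rnd_order)

write_blocks(os.path.join(tmp, "seq.bin"), seq_order)
write_blocks(os.path.join(tmp, "rnd.bin"), rnd_order)
# Both files end up byte-identical; only the on-disk write pattern differs,
# and it is the scattered pattern that kills disk throughput.
```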
The solution is simple: you just increase your block size so your
writes aren't as random anymore, and everything is OK again.
globus-url-copy has the -bs option for exactly this reason; have
a look at the following emails and the thread.
http://www-unix.globus.org/mail_archive/discuss/2005/12/msg00273.html
http://www-unix.globus.org/mail_archive/discuss/2005/12/msg00276.html
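Rough arithmetic shows why the block size makes such a difference. Assuming a hypothetical 1 GB file where each block delivered by a parallel stream lands at its own offset (roughly one seek per block):

```python
# Count of scattered writes the receiving disk performs for one file,
# assuming one seek per delivered block (1 GB file size is hypothetical).
file_size = 1 * 1024**3

def writes_needed(block_size):
    return file_size // block_size

print(writes_needed(10 * 1024))    # 10K blocks:  104857 scattered writes
print(writes_needed(256 * 1024))   # 256K blocks:   4096 scattered writes
```

Going from 10K to 256K blocks cuts the number of scattered writes by more than a factor of 25, which is usually enough to move the bottleneck back off the disks.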
So my problem is not the parallel streams, it is the naive
implementation in dcache; they just dropped in gsiftp and
parallel streams without thinking. It is quite hard to get
better performance than gsiftp, since none of the other protocols
support all the features that make it fast, but if it is not
implemented correctly you obviously don't get the performance.
Cheers,
Kostas