Hi,
> Did anyone check that srmCopy worked as expected? We still don't know if
> the problem is caused by whatever FTS uses by default to copy data or
> something else inside FTS. Do you know if Mona collected iostat and
> other useful traces during that time?
No, as far as I know Mona did not collect the traces. This is something
that we should do next week if we get a chance to test with the new FTS
server.
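For next week, something along these lines would do as a starting point on each pool node. The log location and sample counts are placeholders, and it assumes iostat (from sysstat) and vmstat are installed, with a fallback note written if they are not:

```shell
# Rough sketch of trace collection to run on each dCache/DPM pool node
# during the next test.  LOGDIR and the sample counts are placeholders.
LOGDIR=${LOGDIR:-/tmp/fts-traces}
mkdir -p "$LOGDIR"
# Two 1-second samples here just for illustration; for a real test you
# would leave something like "iostat -x 10" running for the whole window.
if command -v iostat >/dev/null 2>&1; then
    iostat -x 1 2 > "$LOGDIR/iostat.log"
else
    echo "iostat not installed on this node" > "$LOGDIR/iostat.log"
fi
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 2 > "$LOGDIR/vmstat.log"
else
    echo "vmstat not installed on this node" > "$LOGDIR/vmstat.log"
fi
```

Running that for the duration of the transfer on both ends would at least tell us whether the disks or the CPUs are the bottleneck.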
> 188Mb/s seems a bit slow to me; how many concurrent files were you
> transferring at a time? For PhEDEx (srmcp) we get the best results
> with >8 files.
I initially started with 7 concurrent files, but I think I also tried
ramping up to 25 during the transfer.
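For what it's worth, the runs were driven with plain srmcp invocations along these lines. The endpoints and file names below are placeholders rather than the real Ed/IC URLs, and the loop only prints the commands (hence the echo) so it can serve as a sketch of one concurrency level:

```shell
# Hypothetical driver for one concurrency level; NFILES=7 matches the
# starting point above.  -streams_num is the srmcp option for parallel
# TCP streams per file (the value here is a guess, tune as needed).
NFILES=${NFILES:-7}
count=0
for i in $(seq 1 "$NFILES"); do
    # echo left in deliberately: this sketch prints the commands it
    # would run instead of performing real transfers
    echo srmcp -streams_num=10 \
        "srm://se-source.example.ac.uk:8443/pnfs/example/file$i" \
        "srm://se-dest.example.ac.uk:8443/pnfs/example/file$i"
    count=$((count+1))
done
```

Bumping NFILES past 8 would reproduce the PhEDEx-style settings mentioned above.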
> That the speed was higher doesn't necessarily mean that everything was OK;
> the network path is shared by other people after all, and dcache could have
> been under load from other people as well (actually the CMS people complained
> that their transfers were slow during that time, so they did affect the
> measurements with their transfers).
Yes this is a good point. It will be interesting to see what happens when
we test again.
> Remember also that the biggest problem is when FTS is used between
> dcache hosts and not between dpm-dcache. Did you test from a dcache
> site as well?
As I said in my last email, I tested Edinburgh DPM to IC dCache and Ed
dCache to IC dCache using urlcopy; srmCopy was tested with an Ed dCache to
IC dCache transfer.
> I think that you are confused here: there is no direct copy with srmCopy
> or urlcopy. You always talk to a gridftp door (the one that srm returned)
> to upload the files. I have no idea how srm decides which door to use, but
> I suspect that it uses the same calculations as dcache uses to decide which
> pool it will use. After you connect, dcache calculates *again* which pool
> to use, and depending on the load, free space, etc. it might not be the
> local one. From tests it seems that as long as the load is low and doesn't
> change fast, the decision from srm and dcache is the same, so you get a
> "local" copy. When the pools are under load it is a lot more likely that
> you'll end up with a different pool than the door. Tuning the transfer
> costs in dcache will probably help there, but I never had the time to play
> with them and the dcache documentation is almost non-existent :(
I agree that the dCache docs are a bit sparse in this area.
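For anyone who does want to play with the cost tuning, I believe the relevant knobs are the cost factors in PoolManager.conf on the admin node. Treat the exact command and values below as a best guess from memory rather than gospel, given the state of the docs:

```
# PoolManager.conf sketch (admin node): spacecostfactor/cpucostfactor
# weight free space vs. current load in the pool-selection decision.
# The 1.0/1.0 values are only illustrative, not recommendations.
set pool decision -spacecostfactor=1.0 -cpucostfactor=1.0
```

Raising the cpucostfactor should, in principle, steer new transfers away from busy pools, which might reduce the mismatch between the door's pool and the pool dCache finally picks.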
> I've seen this when phedex hits our disks hard (> 400Mb/sec, > 10 files)
> with srmCopy, and we *do* get inter-node traffic at the time. srmCopy
> doesn't give you direct transfers and it doesn't reduce inter-node
> traffic. You can say that FTS with the default copy method increases
> inter-node traffic because it causes high load, but this is a different
> matter...
Hmm, I'm not sure about this. I have always gone with what it says on p18
of the talk by Timur (the dCache SRM developer):
http://www.dcache.org/manuals/dcache-workshop-Sep-2005/dcache-workshop-Sep-2005-timur-srm.pdf
This states that srmCopy operates by transferring data directly from the
source dCache pool to the destination dCache pool. A GridFTP door is
involved, but only to initiate a control channel with the GridFTP client
that is started on the destination pool node. This is how srmcp operates,
and I assume it is how FTS performs the transfer.
If you still think this is not the behaviour you are seeing at IC then I
would be interested in finding out more.
> No files are attached :( I would really like to have a look at them since
> I was away on holidays during that time.
Sorry about that; I was in a bit of a rush yesterday. The files should now
be attached.
We should speak further next week about running some more tests to try and
solve this problem.
Cheers,
Greig
--
=======================================================================
Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
School of Physics, University of Edinburgh, James Clerk Maxwell Building
TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
=======================================================================