Kostas Georgiou wrote:
The attached plot is, I think, Edinburgh-dcache->IC-dcache using FTS (16
May; Greig can confirm).
Olivier.
> Hi,
>
> On Thu, Jun 01, 2006 at 07:02:48PM +0100, Greig A Cowan wrote:
>
>>> I would like to understand what was the outcome of the tests you did
>>> with Mona concerning the fts/dcache problem. I remember that the last
>>> test was to set fts to use srm rather than gsiftp third party copy. Was
>>> the iowait ok at that time ?
>> Correct, we configured the STAR-IC FTS channel to use srmCopy rather than
>> 3rd party urlcopy. We observed a few things during the tests:
>>
>> 1. With FTS in urlcopy mode, a 50GB transfer from Edinburgh to IC gave a
>> rate of 138Mb/s. With srmCopy mode, the rate was 188Mb/s so a significant
>> boost in transfer rate was achieved.
>
> Did anyone check that srmCopy worked as expected? We still don't know if
> the problem is caused by whatever FTS uses by default to copy data or
> something else inside FTS. Do you know if Mona collected iostat and
> other useful traces during that time?
>
> 188Mb/s seems a bit slow to me. How many concurrent files were you
> transferring at a time? For phedex (srmcp) we get the best results
> with >8 files.
>
> That the speed was higher doesn't necessarily mean that everything was OK.
> The network path is shared by other people after all, and dcache could have
> been under load from other people as well (actually the CMS people complained
> that their transfers were slow during that time, so they did affect the
> measurements with their transfers).
>
> Remember also that the biggest problem is when FTS is used between
> dcache hosts and not between dpm-dcache. Did you test from a dcache
> site as well?
>
>> 2. The inter-node traffic was significantly reduced when using srmCopy
>> compared to urlcopy. This is because the data is transferred directly to
>> the pool that it will be stored on, rather than being routed to a pool via
>> a gridFTP door.
>
> I think that you are confused here: there is no direct copy with srmCopy
> or urlcopy. You always talk to a gridftp door (the one that srm returned)
> to upload the files. I have no idea how srm decides which door to use, but
> I suspect that it uses the same calculations that dcache uses to decide which
> pool it will use. After you connect, dcache calculates *again* which pool
> to use and, depending on the load, free space, etc., it might not be the local
> one. From tests it seems that as long as the load is low and doesn't change
> fast, the decision from srm and dcache is the same, so you get a "local" copy.
> When the pools are under load it is a lot more likely that you'll end up
> with a different pool than the door. Tuning the transfer costs in dcache
> will probably help there, but I never had the time to play with them and
> the dcache documentation is almost nonexistent :(
>
> I've seen this when phedex hits our disks hard (> 400Mb/sec, > 10 files)
> with srmCopy and we *do* get inter-node traffic at the time. srmCopy doesn't
> give you direct transfers and it doesn't reduce inter-node traffic. You can
> say that FTS with the default copy method increases inter-node traffic
> because it causes high load, but this is a different matter...
>
>> 3. Your disk servers were still showing iowait. Mona grabbed some ganglia
>> screenshots taken during the transfers and I have attached them to this
>> mail.
>>
>> * ED-dCache-DPM-IC-dCache-urlcopy.jpg
>
> No files are attached :( I would really like to have a look at them since
> I was away on holidays during that time.
>
> Cheers,
> Kostas
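
To put the two rates from point 1 of Greig's message in perspective, here is a
minimal sketch of the arithmetic. It assumes "GB" means 10**9 bytes and "Mb/s"
means 10**6 bits per second (the thread does not say which convention was
used), and the helper names transfer_seconds and relative_gain are made up for
this illustration.

```python
# Assumption: "GB" = 10**9 bytes and "Mb/s" = 10**6 bits per second.
BITS_PER_GB = 8 * 10**9
BITS_PER_MB = 10**6


def transfer_seconds(size_gb, rate_mbps):
    """Seconds needed to move size_gb gigabytes at rate_mbps megabits/s."""
    return size_gb * BITS_PER_GB / (rate_mbps * BITS_PER_MB)


def relative_gain(old_rate, new_rate):
    """Fractional increase of new_rate over old_rate."""
    return (new_rate - old_rate) / old_rate


if __name__ == "__main__":
    # Rates quoted in the thread for the 50GB Edinburgh -> IC transfer.
    for mode, rate in (("urlcopy", 138), ("srmCopy", 188)):
        minutes = transfer_seconds(50, rate) / 60
        print(f"{mode}: {rate} Mb/s, about {minutes:.1f} min for 50 GB")
    print(f"gain: {relative_gain(138, 188):.0%}")
```

Under those assumptions the 50GB transfer takes roughly 48 minutes in urlcopy
mode and roughly 35 minutes in srmCopy mode, a rate increase of about 36%.
This only restates the quoted numbers; it says nothing about whether the
difference came from the copy method or from other load on the network or on
dcache at the time.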
--
- O. van der Aa - Imperial College London -
- LT2 Technical Coordinator -
- tel: +442075947810, +442071005426 -
- SIP: [log in to unmask] -
- fax: +442078238830 -
- http://surl.se/agtu -