Hi Mona,
Where's your GridFTP door? Even if you have a door on the pool node, if
you have other doors on other servers dCache will just pick one at
random (or at least it used to), and as far as I could tell in the past
the transfer between the door and the pool is single-streamed.
You could confirm it by shutting down all the GridFTP doors bar the one
on the pool node and seeing whether your rates improve.
If that's your problem then there's not much you can do about it until
someone fixes dCache.
Yours,
Chris.
> -----Original Message-----
> From: GRIDPP2: Deployment and support of SRM and local
> storage management [mailto:[log in to unmask]] On
> Behalf Of Mona Aggarwal
> Sent: 24 March 2006 17:47
> To: [log in to unmask]
> Subject: dCache Problem at IC
>
> Hi,
>
> At Imperial we are observing low bandwidth for FTS transfers
> from RAL->IC. However the same behaviour is not observed with
> SRMCP.
>
>
> Reason:
> ========
> The CPU I/O wait on our pool nodes is very high during the transfers.
> This occurs because of random writes on our disks. We have set the
> number of parallel streams to 1 within dCacheSetup to avoid random
> writes.
>
> As there is only one connection at our end, it is confirmed
> that we are getting one stream, which somehow generates writes
> that aren't contiguous.
> It could be that more than one stream gets merged into one before
> the transfer takes place. However, we don't understand how this
> can happen.
>
> File transfer Example:
> ------------------------
>
> Test1:
> -----
>
> http://wiki.gridpp.ac.uk/wiki/IC-HEP#Transfer_Tests_2006-02-22
>
> Test2:
> -----
>
> Parameters set for SRMCP & FTS:
>
> No of parallel streams: 1
> No of files: 1
>
>
> Link to (disk writes) graphs for FTS and SRMCP:
>
> http://www.hep.ph.ic.ac.uk/~georgiou/dcache_fts/
>
> Here is a strace for some of the writes:
>
> 30181 1141231403.128835 write(5, "\353\345\366P\276*E\333;\276\216H\317~\t\311\234\270\201"..., 10240) = 10240
> 30181 1141231403.129057 lseek(5, 541265920, SEEK_SET) = 541265920
> 30181 1141231403.129106 write(5, "V\10\206S\232dg\255\342\356\276~\324\20\232\7\341!\21q"..., 10240) = 10240
> 30181 1141231403.129322 lseek(5, 541317120, SEEK_SET) = 541317120
> 30181 1141231403.129368 write(5, "\237b\21\370dw\314\373\342\254\260\'\31\252\255\4*\347"..., 10240) = 10240
> 30181 1141231403.129583 lseek(5, 541368320, SEEK_SET) = 541368320
> 30181 1141231403.129628 write(5, "d*\346\220\341I\310}\34\321+\6\261\v\240T\236\352\327\372"..., 10240) = 10240
> 30181 1141231403.129848 lseek(5, 541419520, SEEK_SET) = 541419520
> 30181 1141231403.129895 write(5, "\232\246OH\30\211\vT\314-x^.K\212M\342XH;\371%G[\30@r\376"..., 10240) = 10240
> 30181 1141231403.130117 lseek(5, 541470720, SEEK_SET) = 541470720
> 30181 1141231403.130162 write(5, "\353\206_\376\33\333v\223&\2038\201\271c\366\301(\356."..., 10240) = 10240
> 30181 1141231403.130379 lseek(5, 541521920, SEEK_SET) = 541521920
> 30181 1141231403.130425 write(5, "(0Mu\337\325\332?k\362\\\321\371G\324&5(\355\342_T\311"..., 10240) = 10240
> 30181 1141231403.130637 lseek(5, 541573120, SEEK_SET) = 541573120
> 30181 1141231403.130683 write(5, "\364\361z\355A\33\321Q_bO\341\232Yh\343Y.\235?\177\341"..., 10240) = 10240
> 30181 1141231403.130898 lseek(5, 541624320, SEEK_SET) = 541624320
> 30181 1141231403.130943 write(5, "p`<\233*\2\274\276;\230\202\251\370\3\265\245*\210\35\337"..., 10240) = 10240
> 30181 1141231403.131167 lseek(5, 541675520, SEEK_SET) = 541675520
> 30181 1141231403.131212 write(5, "e.\361f\1u\33\334\1\243\301\332}$\213\203\234\316i\241"..., 10240) = 10240
> 30181 1141231403.131426 lseek(5, 541726720, SEEK_SET) = 541726720
> 30181 1141231403.131471 write(5, "KP\322~[\321\366\273\351\306\304q\200<g\2\272\32\250\27"..., 10240) = 10240
> 30181 1141231403.131688 lseek(5, 541777920, SEEK_SET) = 541777920
> 30181 1141231403.131733 write(5, "\323g:WC\266\224\351F\f\217Dl\313\312\235y\272!^\332\325"..., 10240) = 10240
> 30181 1141231403.131944 lseek(5, 541829120, SEEK_SET) = 541829120
> 30181 1141231403.131989 write(5, "\220\267d\21\222\tZ\315\360\356b\304\350\276\16J%4\367"..., 10240) = 10240
> 30181 1141231403.132211 lseek(5, 541880320, SEEK_SET) = 541880320
> 30181 1141231403.132257 write(5, "o\253$b\204\212\366\24\336g\17T\323\311\364a\364\\0)\235"..., 10240) = 10240
> 30181 1141231403.132474 lseek(5, 541931520, SEEK_SET) = 541931520
> 30181 1141231403.132520 write(5, "\32\10!@l\30\320!\347\324\32\31\257\t\360R\326\\\347\373"..., 10240) = 10240
>
> As you can see, dCache writes 10k, seeks to an offset 50k past the
> start of that write (skipping 40k), and then writes 10k again.
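
A quick way to check that pattern from the trace is to pull out the lseek targets and difference them. This is only a minimal sketch in Python, not something from the original thread: the four lseek lines are copied from the strace above, and the helper names (seek_offsets, strides) are made up for illustration.

```python
import re

# A few lseek lines copied from the strace above (pid, timestamp, call).
TRACE = """\
30181 1141231403.129057 lseek(5, 541265920, SEEK_SET) = 541265920
30181 1141231403.129322 lseek(5, 541317120, SEEK_SET) = 541317120
30181 1141231403.129583 lseek(5, 541368320, SEEK_SET) = 541368320
30181 1141231403.129848 lseek(5, 541419520, SEEK_SET) = 541419520
"""

WRITE_SIZE = 10240  # bytes per write() call in the trace


def seek_offsets(trace):
    """Return the target offset of every lseek(fd, offset, SEEK_SET) call."""
    pattern = r"lseek\(\d+, (\d+), SEEK_SET\)"
    return [int(m.group(1)) for m in re.finditer(pattern, trace)]


def strides(offsets):
    """Distance in bytes between the starts of consecutive writes."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


offsets = seek_offsets(TRACE)
stride_set = set(strides(offsets))
# The gap is what is left unwritten between the end of one write
# and the start of the next.
gap_set = {s - WRITE_SIZE for s in stride_set}

print("strides between write starts (bytes):", stride_set)
print("unwritten gap after each write (bytes):", gap_set)
```

On these lines the seek targets advance by 51200 bytes (50k) each time, so each 10240-byte write is followed by 40960 bytes (40k) that this descriptor does not write at that point.
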
>
> The pools are on a RAID5 array with a 64k stripe, and this causes the
> hardware to read AND write six times the amount of data to the disk
> (read-modify-write); hence there is no way to merge any writes.
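
To see why none of these writes can fill a stripe, one can map the write pattern onto 64k units. The sketch below is illustrative only and rests on assumptions that are not in the original message: that "64k stripe" means a 64 KiB stripe unit, and that file offsets line up with stripe boundaries (the real on-disk alignment depends on the file system and is unknown here). The function names are hypothetical.

```python
STRIPE = 64 * 1024   # assumed stripe unit, from the "64k stripe" above
WRITE = 10 * 1024    # size of each write() in the trace
STRIDE = 50 * 1024   # spacing between write starts in the trace
FIRST = 541265920    # first lseek target in the trace


def units_touched(offset, length, stripe=STRIPE):
    """Indices of the stripe units that the byte range [offset, offset+length) overlaps."""
    first = offset // stripe
    last = (offset + length - 1) // stripe
    return list(range(first, last + 1))


def bytes_per_unit(n_writes):
    """Total bytes written into each stripe unit by n_writes strided writes."""
    per_unit = {}
    for i in range(n_writes):
        start = FIRST + i * STRIDE
        end = start + WRITE
        for u in units_touched(start, WRITE):
            lo = max(start, u * STRIPE)
            hi = min(end, (u + 1) * STRIPE)
            per_unit[u] = per_unit.get(u, 0) + (hi - lo)
    return per_unit


cov = bytes_per_unit(14)  # the 14 writes that follow an lseek in the trace
full = [u for u, b in cov.items() if b == STRIPE]

print("stripe units touched:", len(cov))
print("stripe units completely written:", len(full))
print("most bytes written into any one unit:", max(cov.values()))
```

Under those assumptions every touched unit is only partly written, which is the situation in which a RAID5 controller has to read old data and parity before it can write the new parity.
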
>
> With SRMCP we get the full 450Mb/sec that the firewall
> allows, with no high I/O load.
>
> Any suggestions to solve this problem?
>
> Thanks & Regards,
>
> Mona & Kostas
>
> ====================================
> Mona Aggarwal
> Tel. (+44) 20 759 47809
> Imperial College London
> High Energy Physics Department
> Prince Consort Road, London, SW7 2BW
> ====================================
>