On Tue, Jul 19, 2005 at 05:44:18PM +0100, Owen Synge wrote:
> I think the disk thrashing issue is worth suggesting to others as a
> performance hit, as the experiments are trying to break things to find
> out what breaks under what circumstances, and then, I think, trying to
> find out the best way to work with the software stack.

Well, I have parallel streams set to 1 for SRM and it seemed to work fine
for the PhEDEx transfers from RAL (~480 Mbit/sec). Somehow the SC3
transfers that Derek is running at the moment use something between 2 and
5 streams (from my strace logs), and the end result is that we haven't
managed to get more than ~80 Mbit/sec :(

From the strace logs it looks like each thread in dCache writes its own
stream as it arrives, instead of merging everything back into a buffer,
resulting in writes like:

  lseek(23, 106792960, SEEK_SET) = 106792960
  write(23, ..., 10240)          = 10240
  ...
  lseek(23, 106844160, SEEK_SET) = 106844160
  write(23, ..., 10240)          = 10240

<guesswork>
The OS/RAID controller might be able to merge everything back together
before writing to the disk, but with the 250 streams we have at the
moment I think that's unlikely to happen (iostat reports minimal merges
compared to writes). Since we are using RAID5 with a 64K stripe, the
unmerged 10K writes result in partial-stripe writes, which cause
Read-Modify-Write operations, slowing everything down even more :(
</guesswork>

Too bad there is no source available to play with different settings :(
I'll boot one of the pool nodes with the Anticipatory elevator, which
might be able to do better than the other ones, but I don't expect it to
make much of a difference :(

Cheers,
Kostas
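
PS: to make the partial-stripe point concrete, here's a quick Python
sketch (my own illustration, not anything from dCache) that checks the
two offsets from the strace excerpt against a 64 KiB stripe unit. A write
dodges Read-Modify-Write only if it starts on a stripe boundary and
covers whole stripes; neither of these 10 KiB writes comes close:

```python
STRIPE = 64 * 1024  # 64 KiB stripe unit, as configured on our RAID5
IO = 10240          # 10 KiB writes seen in the strace logs

def is_partial_stripe(offset, length, stripe=STRIPE):
    """True if the write is misaligned or smaller than a full stripe,
    i.e. the controller must read the old stripe + parity first (RMW)."""
    return offset % stripe != 0 or length % stripe != 0

# Offsets taken from the strace excerpt above
for off in (106792960, 106844160):
    print(off, "offset-in-stripe:", off % STRIPE,
          "partial:", is_partial_stripe(off, IO))
```

Both land mid-stripe (34816 and 20480 bytes in), and the two offsets are
51200 bytes = 5 writes apart, which fits several streams interleaving
their 10 KiB chunks into the same file.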