On Tue, Jul 19, 2005 at 05:44:18PM +0100, Owen Synge wrote:

> I think the disk-thrashing issue is worth flagging to others as a performance hit. The experiments are trying to break things to find out what breaks under what circumstances, and then, I think, to find out the best way to work with the software stack.
> 

Well, I have parallel streams set to 1 for SRM and it seemed to work fine for the
PhEDEx transfers from RAL (~480 Mbit/sec). Somehow the SC3 transfers that Derek is
running at the moment use between 2 and 5 streams (according to my strace logs),
and the end result is that we haven't managed to get more than ~80 Mbit/sec :(

From the strace logs it looks like each thread in dCache writes its own stream
as it arrives, instead of merging everything back into a buffer, resulting in
writes like:

lseek(23, 106792960, SEEK_SET)          = 106792960
write(23, ..., 10240) = 10240
..
lseek(23, 106844160, SEEK_SET)          = 106844160
write(23, ..., 10240) = 10240
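
To illustrate what I mean (a toy sketch, not dCache code; the chunk count,
file names and stream count are made up, though the 10K chunk and 5 streams
match the logs), writing each chunk directly at its final offset as it
arrives gives exactly the 50K lseek() jumps above, whereas depositing the
chunks into a shared buffer first would let one large sequential write go
to disk:

/* Sketch only: contrast scattered per-stream writes with buffered
 * ones.  Not how dCache is implemented; sizes/names are invented.  */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK   10240   /* 10K writes, as in the strace log above    */
#define STREAMS 5       /* SC3 transfers use 2-5 streams             */
#define NCHUNK  8       /* chunks per stream in this toy example     */

/* Pattern 1: each stream thread writes its chunks directly at their
 * final offsets.  With 5 streams the offsets are 50K apart, matching
 * the lseek() jumps in the log.                                     */
static void stream_writes_direct(int fd, int stream, const char *chunk)
{
    for (int i = 0; i < NCHUNK; i++) {
        off_t off = (off_t)(i * STREAMS + stream) * CHUNK;
        lseek(fd, off, SEEK_SET);
        write(fd, chunk, CHUNK);           /* scattered 10K write    */
    }
}

/* Pattern 2: streams deposit chunks into a shared buffer at their
 * final positions; a complete region is then flushed with a single
 * sequential write (locking omitted for brevity).                   */
static char region[STREAMS * NCHUNK * CHUNK];

static void stream_writes_buffered(int stream, const char *chunk)
{
    for (int i = 0; i < NCHUNK; i++)
        memcpy(region + ((size_t)i * STREAMS + stream) * CHUNK,
               chunk, CHUNK);
}

static void flush_region(int fd)
{
    lseek(fd, 0, SEEK_SET);
    write(fd, region, sizeof region);      /* one sequential write   */
}

int main(void)
{
    char chunk[CHUNK];
    memset(chunk, 'x', sizeof chunk);

    int fd = open("scattered.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;
    for (int s = 0; s < STREAMS; s++)
        stream_writes_direct(fd, s, chunk);
    close(fd);

    fd = open("merged.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;
    for (int s = 0; s < STREAMS; s++)
        stream_writes_buffered(s, chunk);
    flush_region(fd);
    close(fd);
    return 0;
}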

<guesswork>
The OS/RAID controller might be able to merge everything back together before
writing to the disk, but with the 250 streams we have at the moment I think
that's unlikely to happen (iostat reports minimal merges compared to writes).

Since we are using RAID5 with a 64K stripe, the unmerged 10K writes result in
partial-stripe writes, which cause read-modify-write operations, slowing
everything down even more :(
</guesswork>
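
Back-of-envelope on that last point (textbook RAID5 small-write penalty,
not a measurement of our arrays): a write smaller than the stripe unit has
to read the old data and old parity before it can write the new data and
new parity, so each unmerged 10K write costs roughly four disk operations:

/* Rough model of the RAID5 small-write penalty.  The 64K stripe unit
 * is from our setup; the op counts are standard RAID5 behaviour.    */
#include <stdio.h>

int main(void)
{
    const int stripe_kb = 64;   /* stripe-unit size on our RAID5 sets */
    const int write_kb  = 10;   /* unmerged write size from the logs  */

    /* Partial-stripe update: read old data + read old parity, then
     * write new data + new parity = 4 ops for 1 application write.  */
    const int ops_partial = 4;

    printf("%dK write into a %dK stripe unit: %d disk ops "
           "(2 reads + 2 writes)\n", write_kb, stripe_kb, ops_partial);
    printf("merged full-stripe writes would need no pre-reads at all\n");
    return 0;
}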

Too bad there is no source available to play with different settings :(
I'll boot one of the pool nodes with the anticipatory elevator, which might
do better than the other ones, but I don't expect it to make much
difference :(
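
For reference (assuming a 2.6 kernel; sda is just an example device), the
elevator can be picked at boot, and on kernels with runtime switching it
can be changed without a reboot:

# kernel command line at boot
elevator=as

# or at runtime, on 2.6 kernels that support scheduler switching
echo anticipatory > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler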

Cheers,
Kostas