An interesting and simple way to test the write performance. Simultaneous
writes could then be tested by putting an ampersand ('&') at the end of
the 'dd' command, couldn't they? And if you get tired of typing all the
numbers, you could use the 'seq' command instead.
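Something along these lines should do it (untested sketch; it reuses the
placeholder output path from James' script, so adjust the path and the
count to taste):

  #!/bin/tcsh
  # fire off all 23 writes at once, then wait for every background dd to finish
  foreach f ( `seq 1 23` )
      dd if=/dev/zero bs=2G count=1 of=/home/username/deleteme$f &
  end
  wait

Running the loop once without the '&' (and without the 'wait') gives a
serial baseline to compare against. Note that each dd allocates a buffer of
size bs, so something like 'bs=1M count=2048' may be kinder to memory when
23 copies are running at once.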
Cheers, Tim
> /bin/tcsh
> set time
> foreach file ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 )
> dd if=/dev/zero bs=2G count=1 of=/home/username/deleteme$file
> end
>
> -James Holton
> MAD Scientist
>
>
> Harry M. Greenblatt wrote:
>> BS"D
>>
>> To those hardware oriented:
>>
>> We have a compute cluster with 23 nodes (dual-socket, dual-core Intel
>> servers). Users run simulation jobs on the nodes from the head node. At
>> the end of each simulation, the result file is compressed to 2 GB and
>> copied to the cluster's file server (not the head node) via NFS. Each node
>> is connected to a switch by a Gigabit line. The file server has a 4-link
>> aggregated Ethernet trunk (4 Gb/s) to the switch. The file server also has
>> two sockets with dual-core 2.1 GHz Xeon CPUs and 4 GB of memory, running
>> RH4. There are two RAID 5 arrays, each consisting of 8x500 GB SATA II
>> WD server drives, with one file system on each. The RAID cards are AMCC
>> 3ware 9550 and 9650SE (PCI-Express) with 256 MB of cache memory.
>> When several (~10) jobs finish at once and the nodes start copying their
>> compressed files to the file server, the load on the file server gets very
>> high (~10), and the users whose home directories are on the file server
>> cannot work at their stations. Using nmon to locate the bottleneck, it
>> appears that disk I/O is the problem, but the numbers being reported are a
>> bit strange: it reports a throughput of only about 50 MB/s and claims the
>> "disk" is 100% busy. These RAID cards should give throughput in the
>> several-hundred-MB/s range, especially the 9650SE, which is rated at
>> 600 MB/s for RAID 6 writes (and we are using RAID 5).
>>
>> 1) Is there a more friendly system load monitoring tool we can use?
>>
>> 2) The users may be able to stagger the output schedule of their jobs, but
>> based on the numbers, we get the feeling the RAID arrays are not performing
>> as they should. Any suggestions?
>>
>> Thanks
>>
>> Harry
>>
>>
>> -------------------------------------------------------------------------
>>
>> Harry M. Greenblatt
>>
>> Staff Scientist
>>
>> Dept of Structural Biology [log in to unmask]
>>
>> Weizmann Institute of Science Phone: 972-8-934-3625
>>
>> Rehovot, 76100 Facsimile: 972-8-934-4159
>>
>> Israel
>>
>>
>