On 27/07/12 12:50, Adam Huffman wrote:
> Wahid,
>
> Many thanks for this, which is indeed useful. The scripts you mention
> would also be helpful, if/when they appear.
>
> If I do have further questions I'll get back to you, if I may.
>
I'll send you a copy of what I wrote about Lustre for CHEP.
It is indeed frustrating that we don't have good benchmarks. I'm not
sure what monitoring other sites do, but I don't think we monitor things
well enough to extract usage patterns either.
Some thoughts:
From what I've heard, Nearline SAS has much better error correction than
SATA, as well as slightly better performance. It's only marginally more
expensive, so I'd choose that.
How you partition your drives makes a difference. The Lustre manual
recommends RAID6, and suggests that best performance is obtained with
8+2 disks so that Lustre's 1 MB writes map onto a full RAID stripe.
You also need to make sure that the partitioning is aligned with the
underlying sector size of the disks.
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.html#configuringstorage
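For what it's worth, the arithmetic is roughly as below (a sketch only -
the 128 KB chunk size is an assumed, illustrative value, so check what
your controller actually uses):

  # Rough alignment arithmetic for an 8+2 RAID6 array.
  data_disks = 8            # 8 data + 2 parity disks in RAID6
  chunk_kb = 128            # per-disk chunk (stripe unit) in KB - assumption
  fs_block_kb = 4           # ldiskfs/ext4 block size in KB

  stripe_kb = data_disks * chunk_kb
  print(f"full RAID stripe = {stripe_kb} KB")   # want 1024 KB, so a 1 MB
                                                # Lustre write fills one stripe

  # The same numbers expressed in filesystem blocks, which I think is the
  # form the Lustre manual uses for the mke2fs extended options:
  stride = chunk_kb // fs_block_kb
  stripe_width = stripe_kb // fs_block_kb
  print(f"-E stride={stride},stripe_width={stripe_width}")

(That's Python, but it's just multiplication; the point is that chunk size
times data disks should come out at 1 MB.)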
At QMUL, we bought Dell R510s with 12 x 2 TB SATA disks a year and a half
ago, and 3 TB NL SAS disks in our last purchase a few months ago, all
with two internal disks. Whilst this doesn't give the most efficient
Lustre writes, the logic was that we wanted the capacity, and buying 10%
more servers gave us 10% more throughput _and_ 10% more capacity, rather
than spending 10% more on a server (if you see what I mean).
Compared with the solution Wahid outlines in his paper, we can't do
failover, but as an R510 server with 12 disks cost roughly the same as
an MD1200 disk array with 12 disks (and with the MD1200 you still need
to buy a server as well), we got more storage for the money.
One vendor (Dell, probably) suggested that as a rule of thumb you get
about 50 MB/s per disk. That's roughly 600 MB/s from our 12-disk servers,
which we can indeed get in iozone benchmarks. Ideally, one would probably
match that to the network bandwidth, so a server with a few more disks
might give you better bang for your buck.
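As a sketch of that trade-off (using the 50 MB/s rule of thumb and
assuming a 10GbE NIC - both numbers are assumptions, not measurements):

  # Back-of-the-envelope: when does a server go network-bound rather than
  # disk-bound? Assumes ~50 MB/s per disk and a 10GbE link.
  per_disk_mb_s = 50
  nic_mb_s = 10 * 1000 / 8          # ~1250 MB/s, ignoring protocol overhead

  for disks in (12, 16, 24, 36):
      disk_mb_s = disks * per_disk_mb_s
      limit = "network" if disk_mb_s > nic_mb_s else "disks"
      print(f"{disks:2d} disks: ~{disk_mb_s} MB/s raw, limited by {limit}")

On those assumptions a 12-bay box is well inside the NIC, somewhere around
24-25 disks you saturate 10GbE, and a 36-bay box is network-bound - which
is the "a few more disks per server" argument.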
The other nice thing about the 12-drive servers is that we think we are
further from the limits of the hardware than one is with the Supermicro
36-bay servers - though the latter give more flexibility in how you
carve up the RAID arrays.
Some RAID cards can use an SSD to cache data. If we are mostly doing
large block reads, we don't expect this to add much, and smaller reads
should already be cached in RAM - but it would be interesting to measure.
Chris
> Cheers,
> Adam
>
> On Thu, Jul 26, 2012 at 2:04 PM, Wahid Bhimji
> <[log in to unmask]> wrote:
>>
>> Hello
>>
>> Not exactly what you asked but you may want to look at this whitepaper we wrote with Dell.
>>
>> http://www2.ph.ed.ac.uk/~wbhimji/GridStorage/Dell-DPM-LancsEd-Whitepaper2012.pdf
>>
>> It covers DPM - but some of the tests used would work with any storage.
>> In particular, the appendix has the iozone options we used and also the full code of a ROOT script to open files (so a more HEP-style analysis).
>>
>> (and you could easily replace the rfcp examples with dccp ones I guess)
>>
>> We ran it on ATLAS files, but in principle it should run on any ROOT files (flat "ntuples" trivially, and complex CMS objects somehow).
>> If there is something on that part you are unsure about, I would be happy to (try and) help.
>>
>> _Not_ definitive (or necessarily recommended). There is a WLCG working group, of which I think I am supposed to be a member, that is supposed to come up with a more coherent storage benchmarking approach. We haven't had any meetings yet though... but the chair (Dirk) did have some useful scripts used at CERN, which I will ask him for and forward on (if he sends me them).
>>
>> Cheers for now
>>
>> Wahid
>>
>> On 26 Jul 2012, at 12:20, Adam Huffman wrote:
>>
>>> Don't think I've posted to the list before, so I should introduce
>>> myself. I've been working with the Imperial HEP group since April,
>>> alongside Simon Fayer.
>>>
>>> We're looking at buying some storage, and I wondered whether other
>>> sites had settled on some (more or less) meaningful benchmarks when
>>> evaluating storage hardware? Having looked in the list archives, the
>>> subject seems to have come up several times, without a definitive
>>> answer beyond the usual suspects (fio, bonnie++, iozone etc.).
>>>
>>> It may well be that there isn't a definitive answer, but I thought I'd
>>> ask anyway.
>>>
>>>
>>> Best Wishes,
>>> Adam Huffman
>>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>