Thanks Sam,
> As I mentioned in the Storage meeting last week, during the Christmas
> "hands-off" period I started to have a look at ZFS, inspired by Marcus'
> talk at hepsysman last summer[1]. TBH I'm not very far into my testing,
> having set up ZFS on an SL6 and a CentOS 7 box, each a retired 24-bay
> disk pool, and mainly just mucked about, but I'm liking what I see. It
> was shockingly easy to get working. But I have a few questions,
> thoughts and queries (in my usual rambling style).
>
> First up, I tried to simulate a disk failure by yanking one of the
> disks out of its bay in my raidz2 test zpool, but ZFS only seems to
> detect that the volume is degraded after a reboot. Is this likely an
> artifact of our raid card hiding disk status? Like Marcus I have a
> non-optimal raid card presenting 22 individual disks rather than a nice
> JBOD. Interestingly, if I reboot I have to import the volumes manually
> when they're degraded - that's undesirable behaviour, and I'll need to
> find the setting to change it.
>
>
> It's because ZFS only notices problems when the filesystem is actually
> in use - as soon as you read from or write to the zpool, it should
> detect the missing disk (at least, that's what we see in our testing of
> a similar setup here at Glasgow).
>
> Rebooting doesn't break our zpool when it's degraded either - did you
> add your volumes by "old style" /dev/sdX names, or by (say) their
> /dev/disk/by-path/XXXpciXXX style entries? We did the latter, as it
> also helps you locate a disk physically when one breaks (and it should
> be invariant under reboots).
>
Aha, my expectations were all wrong here (I was treating zfs in my mind
like a straight raid card replacement), and I used the old style
/dev/sdX names - so I expect that's why zfs borked on reboot. I'll
rebuild my zpools and my way of thinking about them!
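For my own notes, this is roughly the plan for the rebuild and for
re-testing the pulled-disk case. It's only a sketch - the pool name
"tank" and the by-path entries below are made up, and the real pool
will have 22 disks rather than four:

    # recreate the pool using persistent by-path names rather than /dev/sdX
    zpool create tank raidz2 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy0-lun-0 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy1-lun-0 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy2-lun-0 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy3-lun-0

    # after yanking a disk, force some I/O so the fault actually shows up
    zpool scrub tank
    zpool status -v tank

    # and if a degraded pool still won't come back by itself after a
    # reboot, pull it in by hand while I hunt for the proper fix
    zpool import -d /dev/disk/by-path tank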
I'll expand properly on the other points tomorrow, but I've left a
couple of quick, rough command sketches inline further down while
they're fresh in my mind.
Cheers!
Matt
>
>
> Secondly, I'm trying to think how best to build a machine to utilise
> ZFS. Our recent purchases have been 36-bay boxes filled with 4TB HDDs,
> split into two RAID-6 volumes. The OS volume is a hundred GB of virtual
> disk split off one of these volumes, so we're squeezing every drop of
> capacity we can out of the bays.
>
> But, unless I can build the nodes to work "natively" on ZFS (so the OS
> volume would be a ZFS volume), I'm not going to be able to follow this
> model on a ZFS box. So I'll either need to lose "disk slots" to my OS
> disks (most HBAs can do RAID 1 for a mirrored system volume) or I'll
> need a "novel" solution for my OS volume (and I lost a week of my life
> to setting up a novel solution on one generation of storage nodes a
> few years back). Any thoughts on this?
>
>
> Ah, so we don't do that with our storage in the first place - system
> disks have always been separate (either entirely separate from the
> controller, or on their own RAID1 set). That said, if you merged the
> two volumes into one raidz3 set you'd be ahead one disk in total
> space, since two RAID-6 volumes cost four parity disks while a single
> raidz3 costs three - which might compensate for having to give up a
> bay to boot the OS from. Or there are fiddly schemes involving booting
> from other media, of course.
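Just to get that straight in my own head before tomorrow, I think the
data side of such a box would look something like the sketch below.
The pool and dataset names and the by-path entries are placeholders,
the real list would cover all 34 data bays rather than five, and the
OS pair would sit on its own RAID1 set outside ZFS:

    # one triple-parity raidz3 vdev across the data bays, replacing the
    # current pair of RAID-6 volumes
    zpool create data raidz3 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy0-lun-0 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy1-lun-0 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy2-lun-0 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy3-lun-0 \
        /dev/disk/by-path/pci-0000:03:00.0-sas-phy4-lun-0

    # a dataset for the storage area (name is hypothetical)
    zfs create data/gridstore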
>
>
> Thirdly, when building a machine with the intent of zfs-ing the data
> volume, would I want to just throw in an HBA to JBOD my disks, or
> would I want to spend a few hundred quid extra for a more fully
> featured raid card? My main concern is port speed: the HBAs I've been
> shown so far have 6Gb/s ports, compared to 12Gb/s for the more
> expensive but otherwise comparable raid cards. Of course, digging
> around, it seems many reasonably priced raid cards don't actually JBOD
> well, so I might want to fork out more for a posh 12Gb/s HBA - which
> could come to more than a raid card!
>
>
> Yes, we're basically looking at the cost-saving end of this, so the
> 6Gb/s HBAs are where the saving is.
>
>
> Finally, do people think that ZFS is "worth it"? I don't think there
> are any (significant) hardware savings to be had in using ZFS over
> regular raid, which was one of my hopes - but there are many possible
> savings and advantages on the admin side, particularly with respect to
> data integrity. From Marcus' talk, both raid and ZFS managed to keep
> the 10Gbit NICs full, so if that's also the case for us, performance
> differences are largely moot.
>
>
> There's a small complexity reduction, plus zpools are inherently more
> flexible than most standard filesystems (other than maybe LVM'd ones),
> so you do gain some online resizing. Plus, you get the equivalent of
> "RAID7" resilience (triple-parity raidz3), if you want it...
>
>
>
> If there are no other subjects for Wednesday's meeting, it would be
> nice to have a chat about this to gauge other people's experiences and
> opinions.
>
>
> I agree, we should have a group discussion on this.
>
> Sam
>
>
> Thanks all!
> Matt
>