Dear All,
Question about RAID6:
Assuming one disk dies, you replace it and a rebuild starts.
If a second disk then registers bad/damaged (doubly-degraded, but RAID6
can sustain two disk losses), do you replace the second disk before the
rebuild onto the first replacement completes, or wait for that rebuild
to finish?
Or, if you do replace the second disk before the first rebuild completes,
would the RAID hardware itself "wait" to start the second rebuild? ...
(That info is not in RAID array documentation)
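For what it's worth, on Linux software RAID (md) you can at least see
whether a rebuild is in flight before touching the second disk; hardware
controllers vary, so this is only the md analogue of the question (a
sketch, assuming /proc/mdstat is present):

```shell
# Report whether any md array is currently rebuilding; on a box with no
# md arrays (or no /proc/mdstat) this simply reports no rebuild.
if [ -r /proc/mdstat ] && grep -Eq 'recovery|resync' /proc/mdstat; then
    STATUS="rebuild in progress"
else
    STATUS="no rebuild running"
fi
echo "$STATUS"
# "mdadm --detail /dev/mdN" gives per-array state and rebuild percentage.
```

With md, adding the second replacement mid-rebuild just queues it; the
controller firmware on a hardware array may or may not do the same, which
is exactly the undocumented part.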
Are people enabling background scrubbing on their RAID arrays?
There are quite a few options for crontabs (so to speak) on the RAID
array.
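In the md case the scrub trigger is just a sysfs write, so a monthly
crontab entry suffices (a sketch; hardware arrays expose the same idea
under names like "verify" or "patrol read" in their own schedulers):

```shell
# /etc/crontab fragment: scrub /dev/md0 at 03:00 on the 1st of each month
0 3 1 * *  root  echo check > /sys/block/md0/md/sync_action
# Progress appears in /proc/mdstat; detected mismatches are counted in
# /sys/block/md0/md/mismatch_cnt.
```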
Filesystem creation overhead: 20% is the usual rumour - are there any
safe ways to minimise it (other than the -m option to mke2fs)?
One can juggle the RAID config several ways and get the same raw gross
space; there may be an optimal strategy for maximising net usable space.
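As a rough sanity check on where that ~20% goes, a back-of-envelope
sketch (the disk counts and overhead percentages below are illustrative
assumptions, not measurements):

```shell
# Hypothetical 10 x 500 GB disks in RAID6, ext3 on top.
RAW=5000                                  # GB raw, 10 x 500 GB
NET_RAID6=$(( RAW - 2 * 500 ))            # RAID6 loses two disks to parity
# Default mke2fs: assume ~2% metadata plus 5% root-reserved blocks (-m 5)
USABLE_DEFAULT=$(( NET_RAID6 * 93 / 100 ))
# mke2fs -m 0: drop the reserved blocks, keep the ~2% metadata
USABLE_M0=$(( NET_RAID6 * 98 / 100 ))
echo "default: ${USABLE_DEFAULT} GB, -m 0: ${USABLE_M0} GB"
```

The reserved-blocks fraction can also be changed after the fact with
"tune2fs -m 0", so it needn't be decided at mke2fs time.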
Question about DPM config:
Since 32-bit Linux can't handle a filesystem > 2TB, any "big" RAID
array must be chopped up into <= 2TB pieces, each carrying its own
Linux filesystem.
Is the better LCG/Tier2/DPM-expert advice to make each 2TB filesystem into
a DPM pool, or to make a DPM pool of several 2TB filesystems?
Earlier advice was that it's better to have smaller RAID sets, one
filesystem each, so that if a RAID set becomes corrupted the damage is
localised to that one filesystem - but disk space is sacrificed.
So smaller RAID sets are safer, while bigger sets yield more usable
space (and the 2TB Linux filesystem limit is annoying either way).
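If it helps the discussion, the pool-of-several-filesystems layout is
just a short sequence of DPM admin commands (a sketch from memory -
please check the dpm-addpool/dpm-addfs man pages for exact flags; the
pool, server, and mount-point names below are made up):

```shell
# One pool spanning several 2TB filesystems on one disk server
dpm-addpool --poolname bigpool --def_filesize 200M
dpm-addfs --poolname bigpool --server se01.example.org --fs /storage/fs01
dpm-addfs --poolname bigpool --server se01.example.org --fs /storage/fs02
# A corrupted filesystem can then be taken out of service on its own:
# dpm-modifyfs --server se01.example.org --fs /storage/fs02 --st DISABLED
```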
What are others doing?
Thanks for advice.
PS Very interesting paper on silent data corruption:
https://indico.desy.de/contributionDisplay.py?contribId=65&sessionId=42&confId=257