As someone who is currently trying to recover files from a RAID5 array
that lost a second disk before it could finish rebuilding to a hot
spare, I would certainly recommend RAID6.
I mostly use 3ware (aka LSI) hardware RAID cards, and set them to
auto-verify early every morning. Some IT professionals around LBL
tell me I'm being "mean" to my disks, but if some sector has gone bad
I'd rather know sooner than later. But always remember that RAID is not
a backup; it is just a way to lose one drive without losing data. If
you're lucky.
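I do the scheduling with the 3ware tools, but the same sort of scrub
can be cron'd on Linux software RAID (md) as well. A minimal sketch,
assuming the array is /dev/md0:

   # root crontab: kick off a verify pass of /dev/md0 at 5am daily
   0 5 * * *  echo check > /sys/block/md0/md/sync_action

Any mismatches it finds show up in /sys/block/md0/md/mismatch_cnt once
the pass finishes.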
To be honest, I have never really understood what a "resource fork" is
good for, other than making copying stuff to non-HFS file systems more
confusing. But it is definitely true that scanning files by checking
their "data" fork only won't catch corruption in anything other than
the file data itself. To really check for media problems, I recommend
doing a "dd" of the entire device to of=/dev/null and watching the
syslog for errors. Doing the whole disk at once keeps the cache from
hiding bad sectors from you.
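Concretely, something like this, with /dev/sdX standing in for the
device you want to test:

   # read every sector; the drive's complaints land in the syslog
   dd if=/dev/sdX of=/dev/null bs=1M
   # in another terminal, watch for I/O errors as they scroll by
   tail -f /var/log/messages | grep -i error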
Pretty much all of my big volumes are XFS, and I've been more pleased
with it than with any other fs I've tried, such as ext2/3 or ReiserFS.
I've never tried JFS, but have heard good things about it. XFS is
definitely fast, scales very well (because it uses B-trees), and you
can defragment it while mounted. Yes, theoretically you don't have to
defrag, but I like to do it once a week anyway because I have gone
through the experience of having to recover files from a device with a
nuked superblock, and it is MUCH easier if the files are all on
contiguous blocks (defragged). True, btrfs, GFS, and others look
promising. They have been looking promising for a while. But all the
rampant warning labels on them have given me pause.
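The online defragmenter is xfs_fsr, and xfs_db can tell you how bad
things have gotten first. A sketch of the weekly pass I mean, with
placeholder device and mount point names:

   # report fragmentation without unmounting anything
   xfs_db -r -c frag /dev/sdX1
   # reorganize files into contiguous blocks, verbosely, up to 2 hours
   xfs_fsr -v -t 7200 /big_volume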
For compression, especially archival compression, I do like squashFS
(as Scott already mentioned). The cool part of squashFS is that the
network traffic is compressed, because the decompression is done on the
machine that has NFS access to the squashFS file. I do not use squashFS
as a backup, however; it is simply a convenient way to keep things on
spinning disk.
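For anyone who hasn't tried it: you build the archive once, then
loop-mount it read-only. The paths here are made up:

   # pack a finished project into one compressed, read-only image
   mksquashfs /data/project1 project1.sqsh
   # mount it; files decompress on the fly as they are read
   mount -t squashfs -o loop,ro project1.sqsh /mnt/project1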
For backup, starting around 2000 I used to make two DVDs, for
redundancy. But in 2007 I did a retrospective analysis and found that
if you can't read one DVD, the chances of not being able to read the
other DVD (written with the same drive, on the same day, with the files
in the same order, from the same stack of media) are > 50%. So I've
still got 3000 images out of ~5 million that I can't recover. The main
failure mode of DVDs is not actually scratching but warping.
Particularly if you store them in flip-folders. If they warp, then most
DVD drives can still keep the data pits in focus by servoing the read
head as the disk spins. Different drive models do this with varying
effectiveness, but if the degree of warping is high, no drive will be
able to read the files on the outer tracks of the disk.
The solution is to put the DVD under at least volumes A, B, and C of
the International Tables for Crystallography for a few weeks to a few
months, particularly in a hot room. Then you can usually read them
again. Those "scratch remover" devices have only ever made things worse
(in my hands).
This experience taught me that having "orthogonal" failure modes is
important. That is, try to store your two backups as differently as
possible: different media, different locations, different timing. Yes,
timing: don't do both your backups at the same time, because if you
make a mistake with one you are very likely to make the same mistake
with the other. Currently, I use one DVD and one LTO4 tape. Both
systems are
automated by different scripts and maintained by different people (me
and George Meigs).
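In cron terms, the staggering looks something like this; the days,
times, and script names are placeholders:

   # root crontab: two backup systems, different days, different scripts
   30 1 * * 1  /usr/local/sbin/burn_backup_dvd.sh
   30 1 * * 4  /usr/local/sbin/write_backup_lto4.sh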
I also keep byte-for-byte copies of critical system disks on offline
drives. That way they can be replaced quickly. This isn't foolproof,
of course, because a drive sitting on the shelf can still suffer from a
bearing lock-up, but the failure rate is lower than if the drive is hot
and spinning. I suppose I should spin these offline drives up every
once in a while for good measure. It's also a good idea to burn a DVD
or Blu-ray image of really important system drives, but it's arguably
unnecessary to do that every day.
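The byte-for-byte copy is just dd again. A sketch, with placeholder
device names, and do triple-check which disk is which before you press
enter:

   # clone the system disk (sdX) onto the offline spare (sdY)
   # noerror: keep going past bad sectors; sync: pad them with zeroes
   dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync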
written "back from the dead" procedure for critical machines. This
usually starts with "install Centos x.y" followed by yum commands, wget
urls. etc. Always a good idea to keep your own copy of installation
tarballs (and back that up too). I try to make sure I edit these "back
from the dead" documents every time I make configuration changes, so
they reflect each machine's current state. Yes, I know there are tons
of configuration management packages out there, but to me these are just
one more bit of software to install and keep track of versions. Maybe
I'm just a Luddite that way.
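To give the flavor of it, a hypothetical skeleton of such a document;
every package name and URL below is made up:

   # back-from-the-dead: beamline control machine
   # 1. install CentOS 6.x from the local mirror
   # 2. packages:
   yum -y install gcc gnuplot ImageMagick
   # 3. local software (keep these tarballs backed up too!):
   wget http://our.local.server/tarballs/controld-1.2.tgz
   tar xzf controld-1.2.tgz && cd controld-1.2 && make install
   # 4. restore /etc/exports and /etc/fstab from the config backup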
Oh yes, and always print out a hard copy of your cell phone contacts.
Just in case. There's nothing more frustrating than having 100 TB
backed up in duplicate at work when the 10 kB you really need is lost
forever. Ever try to get files from your phone using the "iCloud"
website? Not as easy as you'd think.
It's true that corruption in X-ray image files is much easier to detect
in "flat" files. I wrote a little routine to look for an inordinate
number of zeroes, since zero is the default byte you get back from
failing devices. The number of false positives (usually direct-beam
hits or bad darks) is small enough to inspect manually. This won't work
on images without a pixel pedestal, though, such as those from a
Pilatus, but those are internally compressed anyway, so looking for
long runs of zeroes could again be useful.
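Something along these lines; this is a sketch of the idea, not the
actual routine, and the 10% threshold is an arbitrary placeholder:

   # flag images that are suspiciously full of zero bytes
   for img in *.img ; do
       nzero=`tr -dc '\000' < "$img" | wc -c`
       total=`stat -c %s "$img"`
       if [ $((100*nzero/total)) -ge 10 ] ; then
           echo "$img: $nzero of $total bytes are zero -- inspect me"
       fi
   done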
At the risk of making data recovery more problematic, I think the
long-run solution for X-ray images is 3D file formats. That is, x, y,
and "time", or whatever other coordinate is appropriate for stacking
the images. This is because, with increasing framing rates, the
file-creation overhead in the kernel of most operating systems seems to
have become rate-limiting. A silly thing to be limited by. And to be
honest, most data processing programs are now smart enough to let you
select out undesirable images without having to resort to deleting or
re-naming the files on disk. So why not have one big "movie" instead of
a directory with 36,000 files in it? I think imgCIF supports that,
doesn't it?
-James Holton
MAD Scientist
On 10/21/2015 12:03 PM, William G. Scott wrote:
> Dear CCP4 Citizenry:
>
> I’m worried about medium to long-term data storage and integrity. At the moment, our lab uses mostly HFS+ formatted filesystems on our disks, which is the OS X default. HFS+ always struck me as somewhat fragile, and resource forks at best are a (seemingly needless) headache, at least as far as crystallography datasets go. (True, you can do HFS-compression and losslessly shrink your images by a factor of 2, or shrink your ccp4 installation, but these are fairly minor advantages).
>
> I read the CCP4 wiki page http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Filesystems that summarizes some of the other options. From what I have read, there and elsewhere, it seems like zfs and btrfs might be significantly better alternatives to HFS+, but I really would like to get a sense of what others have experienced with these, or other, equally or more robust options. I don’t feel like I know enough to critically evaluate the information.
>
> Anyone know what the NSA uses?
>
> I recently created a de novo backup of some personal data on an external HFS+ drive (photos, movies, music, etc). I was very unpleasantly surprised to find several files had been silently corrupted. (In the case of a movie file, for example, the file would play but could not be copied. In another case, a music file would not copy, yet it had identical md5sum and sha1 checksums when compared to an uncorrupted redundant backup I had. I’m still puzzled by this, but it suggests the resource fork might be the source of the corruption, and, more worrisome still, that conventional checksums aren’t detecting some silently corrupted data, so I am not even sure if zfs self-healing would be the answer.)
>
> Since we as a community are now encouraging primary X-ray diffraction images to be stored, I can only imagine the problem could be ubiquitous, and a discussion might be worth having. (I apologize if this has been addressed previously; I did search the archive.)
>
> All the best,
>
> Bill
>
>
>
> William G. Scott
> Director, Program in Biochemistry and Molecular Biology
> Professor, Department of Chemistry and Biochemistry
> and The Center for the Molecular Biology of RNA
> University of California at Santa Cruz
> Santa Cruz, California 95064
> USA
>