Hi Jens,
Yeah, I was messing with script parameters!
So, I think I have a working implementation of both compression and incremental backup. It needs some rough edges filed off, but it demonstrates how things work.
Given:
svr003:~/scs/test# ls -l
total 20
-rwxr-xr-x. 1 root root 407 Nov 27 12:01 archive-script
drwxr-xr-x. 2 root root 4096 Nov 26 20:12 dir
-rwxr-xr-x. 1 root root 215 Nov 26 20:12 new-volume
-rwxr-xr-x. 1 root root 227 Nov 27 12:00 new-volume-gz
-rwxr-xr-x. 1 root root 272 Nov 27 11:19 new-volume-gz-2
svr003:~/scs/test# ls -l dir
total 1584
-rw-r--r--. 1 root root 400000 Nov 26 20:12 test1
-rw-r--r--. 1 root root 400000 Nov 26 20:12 test2
-rw-r--r--. 1 root root 400000 Nov 26 20:12 test3
-rw-r--r--. 1 root root 400000 Nov 26 20:12 test4
We have a minimal archive script, which just drives tar to do the hard work:
svr003:~/scs/test# cat archive-script
#!/bin/bash
basename=archive.tar
target=dir/
chunksize=600
snapshotfile=./snapshot.snar
#remove the old volnum so tar starts from the default volume number; tar writes the last volume number back to volnum
rm volnum
tar -c -L $chunksize --volno-file=volnum --listed-incremental=$snapshotfile -F ./new-volume-gz -f ${basename}-1 $target
#and zip the last file written (as new-volume-gz is not called when tar exits)
gzip ${basename}-$(<volnum)
-
In this instance, new-volume-gz can be made simpler:
svr003:~/scs/test# cat new-volume-gz
#!/bin/bash
name=`expr $TAR_ARCHIVE : '\(.*\)-.*'`
#gzip the previous volume (TAR_VOLUME is the volume tar is about to start)
last=-$((TAR_VOLUME-1))
gzip ${name:-$TAR_ARCHIVE}$last &
#and echo the new name to the tar process
echo ${name:-$TAR_ARCHIVE}-$TAR_VOLUME >&$TAR_FD
-
Note that, as a demonstrator, this only supports creating the archive files. It would be relatively straightforward to write a version that also gunzips them on the fly so that tar can extract from them.
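As a sketch of that extraction-capable variant (hypothetical and not run against a real multi-volume archive; the script name new-volume-gzx and the function main are illustrative), the -F script can branch on TAR_SUBCOMMAND, as the example script in the GNU tar manual does: gzip the finished volume when creating, and gunzip the next volume into a plain copy when listing, extracting or diffing. It assumes the <base>-<N> / <base>-<N>.gz naming produced above, and uses ${TAR_ARCHIVE%-*} in place of the expr call.

```shell
#!/bin/bash
# new-volume-gzx (hypothetical name): -F script that gzips volumes on create
# and gunzips them on the fly for list (-t), extract (-x) and diff (-d).
# Assumes volumes are named <base>-<N> and compressed to <base>-<N>.gz.

main() {
    # Strip the trailing "-<N>"; if there is no '-', the name is unchanged.
    local base=${TAR_ARCHIVE%-*}
    local next="$base-$TAR_VOLUME"
    case $TAR_SUBCOMMAND in
        -c)
            # tar has finished the previous volume: compress it in the background.
            gzip "$base-$((TAR_VOLUME - 1))" &
            ;;
        -d|-x|-t)
            # Decompress the next volume (keeping the .gz) so tar can read it.
            [[ -r "$next.gz" ]] || return 1
            gunzip -c "$next.gz" > "$next" || return 1
            ;;
        *)
            # Other operations are not supported by this sketch.
            return 1
            ;;
    esac
    # Report the next volume's file name back to tar.
    echo "$next" >&"$TAR_FD"
}

# Only act when invoked by tar, which sets TAR_FD for -F scripts.
if [[ -n ${TAR_FD:-} ]]; then
    main
fi
```

Hypothetical usage for extraction: tar opens the first segment itself, so decompress that one by hand first, then let the script supply the rest:

    gunzip -c archive.tar-1.gz > archive.tar-1
    tar -x -M -F ./new-volume-gzx -f archive.tar-1

The decompressed copies are left next to the .gz files and can be deleted afterwards.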
Running archive-script in this mode generates a series of compressed tar files, segmented as before (and individually addressable for files contained wholly in one segment).
It also generates a snapshot.snar, which stores the metadata for this archive run.
So:
svr003:~/scs/test# ./archive-script
rm: cannot remove `volnum': No such file or directory
svr003:~/scs/test# ls -l
total 40
-rwxr-xr-x. 1 root root 407 Nov 27 12:01 archive-script
-rw-r--r--. 1 root root 836 Nov 27 12:08 archive.tar-1.gz
-rw-r--r--. 1 root root 822 Nov 27 12:08 archive.tar-2.gz
-rw-r--r--. 1 root root 460 Nov 27 12:08 archive.tar-3.gz
drwxr-xr-x. 2 root root 4096 Nov 26 20:12 dir
-rwxr-xr-x. 1 root root 215 Nov 26 20:12 new-volume
-rwxr-xr-x. 1 root root 227 Nov 27 12:00 new-volume-gz
-rwxr-xr-x. 1 root root 272 Nov 27 11:19 new-volume-gz-2
-rw-r--r--. 1 root root 97 Nov 27 12:08 snapshot.snar
-rw-r--r--. 1 root root 2 Nov 27 12:08 volnum
So, now, if we delete the archive.tar segments, but leave the snapshot.snar record in place, and rerun the archive script:
svr003:~/scs/test# rm archive.tar*
svr003:~/scs/test# ./archive-script
svr003:~/scs/test# ls -l
total 32
-rwxr-xr-x. 1 root root 407 Nov 27 12:01 archive-script
-rw-r--r--. 1 root root 148 Nov 27 12:09 archive.tar-1.gz
drwxr-xr-x. 2 root root 4096 Nov 26 20:12 dir
-rwxr-xr-x. 1 root root 215 Nov 26 20:12 new-volume
-rwxr-xr-x. 1 root root 227 Nov 27 12:00 new-volume-gz
-rwxr-xr-x. 1 root root 272 Nov 27 11:19 new-volume-gz-2
-rw-r--r--. 1 root root 97 Nov 27 12:09 snapshot.snar
-rw-r--r--. 1 root root 2 Nov 27 12:09 volnum
We see that only one, very small archive is made this time (as nothing changed between this run and the last one). The snapshot.snar is *updated* to contain the new status of files *if any files were written to the new archive*.
So, this archive is a delta against the first one.
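Because tar updates snapshot.snar as it goes, successive runs form an incremental chain, each a delta against the previous one. If every run should instead be a delta against the first, full backup (a differential), the GNU tar manual's approach is to keep the level-0 snapshot untouched and hand tar a copy of it each time. A minimal sketch, assuming a hypothetical helper name differential_backup and, for brevity, a single gzipped output file rather than archive-script's multi-volume segments:

```shell
#!/bin/bash
# differential_backup (hypothetical helper): back up everything changed since
# the level-0 run, without modifying the level-0 snapshot file.
# Usage: differential_backup <level0.snar> <output.tar.gz> <target>
differential_backup() {
    local level0=$1 out=$2 target=$3
    local work
    work=$(mktemp) || return 1
    # tar rewrites the snapshot it is given, so give it a throwaway copy.
    cp "$level0" "$work" || return 1
    tar -czf "$out" --listed-incremental="$work" "$target"
    local rc=$?
    rm -f "$work"
    return $rc
}
```

Each call then produces an archive relative to the same baseline, so a restore needs only the full backup plus the latest differential, at the cost of differentials growing over time.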
If we now remove that archive, touch a file, and run the script again:
svr003:~/scs/test# rm archive.tar-1.gz
svr003:~/scs/test# cd dir
svr003:~/scs/test/dir# touch test1
svr003:~/scs/test/dir# cd ..
svr003:~/scs/test# ./archive-script
svr003:~/scs/test# ls -l
total 32
-rwxr-xr-x. 1 root root 407 Nov 27 12:01 archive-script
-rw-r--r--. 1 root root 589 Nov 27 12:11 archive.tar-1.gz
drwxr-xr-x. 2 root root 4096 Nov 26 20:12 dir
-rwxr-xr-x. 1 root root 215 Nov 26 20:12 new-volume
-rwxr-xr-x. 1 root root 227 Nov 27 12:00 new-volume-gz
-rwxr-xr-x. 1 root root 272 Nov 27 11:19 new-volume-gz-2
-rw-r--r--. 1 root root 97 Nov 27 12:11 snapshot.snar
-rw-r--r--. 1 root root 2 Nov 27 12:11 volnum
The archive is still small, but larger than before: it now contains just the touched file, whose modification time differs from the one recorded in snapshot.snar.
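To restore from such a chain, the archives are extracted in the order they were made, with --listed-incremental=/dev/null, which the GNU tar manual recommends when extracting incremental dumps. A sketch with a hypothetical helper restore_chain, assuming each run's archive is a single gzipped file (the three-segment first run above would instead need its segments decompressed and read as a multi-volume archive):

```shell
#!/bin/bash
# restore_chain (hypothetical helper): extract gzipped incremental archives,
# oldest first, into a destination directory.
# Usage: restore_chain <dest> <full.tar.gz> [<incr1.tar.gz> ...]
restore_chain() {
    local dest=$1
    shift
    mkdir -p "$dest" || return 1
    local a
    for a in "$@"; do
        # /dev/null tells tar to treat the archive as incremental without
        # recording a new snapshot.
        tar -xzf "$a" --listed-incremental=/dev/null -C "$dest" || return 1
    done
}
```

Order matters: applying the deltas out of sequence would let an older copy of a file overwrite a newer one.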
Note that the GNU tar documentation on multi-volume archives, http://www.gnu.org/software/tar/manual/html_node/Multi_002dVolume-Archives.html, is very useful: it explicitly describes how to extract files from a multi-volume archive when the file is (a) wholly within a single part of the archive or (b) spread across several parts. The instructions are essentially what I wrote in an earlier email in this thread.
Hope this is helpful,
Sam
________________________________________
From: Jensen, Jens (STFC,RAL,SC)
Sent: 27 November 2015 10:51
To: Samuel Skipsey
Subject: Re: FW: Rethinking backups?
Hi Sam,
Thanks for investigating this. The "-.gz" is an accident from a previous
experiment, I presume.
The compression isn't hugely important, because data is compressed
anyway when it goes to tape; it only matters if it makes the transfers
go faster. It may be worth checking what the compression ratio is in
practice: for a saving of less than, say, 10%, I wouldn't worry.
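A quick way to check is to gzip a sample uncompressed segment (for example one produced by plain new-volume) and compare sizes. A minimal sketch; the helper name compression_saving is hypothetical, and it measures gzip at its default level only:

```shell
#!/bin/bash
# compression_saving (hypothetical helper): print the integer percentage of
# bytes saved by gzip (default level) on the given file.
compression_saving() {
    local raw packed
    raw=$(wc -c < "$1") || return 1
    packed=$(gzip -c "$1" | wc -c)
    # Avoid dividing by zero on an empty file.
    (( raw > 0 )) || { echo 0; return 0; }
    echo $(( 100 * (raw - packed) / raw ))
}
```

For example, `compression_saving archive.tar-1` prints a single number to compare against the 10% threshold. For segments that are already gzipped, `gzip -l archive.tar-*.gz` reports the compressed and uncompressed sizes and the ratio directly.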
My docs (i.e. the man page) say the unit is kB (1024 bytes):
-L, --tape-length NUMBER
change tape after writing NUMBER x 1024 bytes
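So -L 600 in archive-script means 600 x 1024 = 614400 bytes per volume. As a rough cross-check of the three segments from the first run, the sketch below (hypothetical helper estimate_volumes) counts a 512-byte header per member and pads each file's data to 512-byte blocks. It ignores directory entries, the incremental metadata, end-of-archive blocks and continuation headers, so treat it as an estimate only:

```shell
#!/bin/bash
# estimate_volumes (hypothetical helper): rough number of volumes for -L.
# Usage: estimate_volumes <tape_length_in_1024_byte_units> <file_size>...
estimate_volumes() {
    local vol=$(( $1 * 1024 ))
    shift
    local total=0 size
    for size in "$@"; do
        # One 512-byte header, plus data rounded up to whole 512-byte blocks.
        total=$(( total + 512 + (size + 511) / 512 * 512 ))
    done
    # Round up to whole volumes.
    echo $(( (total + vol - 1) / vol ))
}
```

`estimate_volumes 600 400000 400000 400000 400000` prints 3 (1,603,584 bytes over 614,400 bytes per volume, rounded up), consistent with archive.tar-1 to archive.tar-3.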
However, the volumes are not necessarily individually extractable. Let
us do a bit more testing:
jensen@ganesha[2]35% tar tvf archive.tar-5
M--------- 0/0 1536 1970-01-01 01:00 test/foo66--Continued at byte 8704--
-rw-r--r-- jensen/esc 10240 2015-11-27 10:38 test/foo32
-rw-r--r-- jensen/esc 10240 2015-11-27 10:38 test/foo97
-rw-r--r-- jensen/esc 10240 2015-11-27 10:38 test/foo90
-rw-r--r-- jensen/esc 10240 2015-11-27 10:38 test/foo87
-rw-r--r-- jensen/esc 10240 2015-11-27 10:38 test/foo48
jensen@ganesha[2]36% rm -r test
jensen@ganesha[2]37% tar xf archive.tar-5 test/foo90
tar: test/foo66: Cannot extract -- file is continued from another volume
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
[1] 24337 exit 2 tar xf archive.tar-5 test/foo90
... It complains, but it *does* extract the file! So let's try foo66,
which spans archive.tar-4 and archive.tar-5:
jensen@ganesha[2]51% tar xf archive.tar-4 test/foo66
tar: test/foo100: Cannot extract -- file is continued from another volume
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
[1] 24516 exit 2 tar xf archive.tar-4 test/foo66
jensen@ganesha[2]52% ls -l test
total 24
-rw-r--r-- 1 jensen esc 8704 Nov 27 10:46 foo66
-rw-r--r-- 1 jensen esc 10240 Nov 27 10:38 foo90
jensen@ganesha[2]53% mv test/foo66 test/foo66.tmp
jensen@ganesha[2]54% tar xf archive.tar-5 test/foo66
tar: test/foo66: Cannot extract -- file is continued from another volume
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
[1] 24541 exit 2 tar xf archive.tar-5 test/foo66
... but this actually doesn't do anything; it doesn't extract the
remaining fragment of foo66 :-(
Of course, this would work:
jensen@ganesha[2]55% tar tMfF archive.tar ~/new-volume test/foo66
test/foo66
So this approach may still not be the right one: you generally have to
recover all the pieces, rather than just the one with your stuff on it.
Cheers
--jens