Vaults and Branches
Keith Lofstrom
Banks, Vaults, Branches, and Images are confusing. Let's try an explanation.
Banks have nothing to do with clients, really. They are just storage areas on the backup machine where vaults can be kept. They could represent separate hard drives, partitions on the same hard drive, or subdirectories of one partition. They just need to be the area where some of the vaults (and /only/ vaults) are stored. If you have very large vaults, a bank might be a striped array of disks presented as an LVM partition, with the one partition containing a path to one vault:
bank:
    /media/dirvish1
    /media/dirvish2

... where /media/dirvish1 might be many disks combined into one partition with LVM and containing one huge vault, and /media/dirvish2 might be a single partition on one disk containing many smaller vaults.
Vaults are where the real work gets done. They are not the filesystem, just a repository for images. Each vault is constrained to fit in one place on one partition, because hard links cannot cross filesystems and the files in the images must be hardlinked together. I typically set up vaults that store the backup data from one partition on one machine, but a vault may contain just part of a partition, or similar partitions from many machines. Again, a vault does not have to bear any exact resemblance to a data structure on the backup client - it is a data structure on the backup server.
Vault names should be globally unique because they aren't qualified by the name of the bank they happen to be in. You can move a vault from one bank to another - perhaps because the partition filled up - and dirvish will continue to find the vault. For example, don't call a vault 'home', call it 'myhost-home' or similar.
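The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not dirvish's own code (dirvish is written in Perl). The function name find_vault is hypothetical, and the sketch assumes a vault can be recognized by the dirvish/ configuration directory inside it, as in the layouts used in this article.

```python
import os
import tempfile


def find_vault(banks, vault):
    """Return the directory of `vault` in the first bank that holds it, or None.

    Assumption: a vault is any bank subdirectory that contains a dirvish/
    configuration directory.
    """
    for bank in banks:
        candidate = os.path.join(bank, vault)
        if os.path.isdir(os.path.join(candidate, "dirvish")):
            return candidate
    return None


if __name__ == "__main__":
    # Build two throwaway banks; only the second one holds the vault.
    root = tempfile.mkdtemp()
    banks = [os.path.join(root, "dirvish1"), os.path.join(root, "dirvish2")]
    os.makedirs(banks[0])
    os.makedirs(os.path.join(banks[1], "myhost-home", "dirvish"))

    # The caller names only the vault, never the bank.
    print(find_vault(banks, "myhost-home"))
```

Because the search key is the bare vault name, two vaults called 'home' in different banks would be ambiguous: whichever bank is listed first would win. That is the practical reason for names like 'myhost-home'.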
Vaults contain images, which superficially are complete, separate snapshots of the backup data for the specified area on the client. In actual fact, the images are usually heavily hardlinked: an unchanged file in different images shares the same data on disk, greatly reducing the disk space used.
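To make the hard-link sharing concrete, here is a small Python sketch that builds two toy "images" in a temporary directory. It imitates the effect only: an unchanged file in the newer image is a hard link to the older one, while a changed file gets fresh storage. The helper names (same_inode, build_two_images) and the file names are hypothetical, and the sketch assumes a filesystem that supports hard links.

```python
import os
import tempfile


def same_inode(path_a, path_b):
    """True if both paths are hard links to the same file on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def build_two_images(vault):
    """Create two tiny image trees inside `vault`; return (old_tree, new_tree)."""
    old_tree = os.path.join(vault, "2005-0315", "tree")
    new_tree = os.path.join(vault, "2005-0316", "tree")
    os.makedirs(old_tree)
    os.makedirs(new_tree)

    # First night: both files are stored in full.
    with open(os.path.join(old_tree, "unchanged"), "w") as f:
        f.write("same every night\n")
    with open(os.path.join(old_tree, "changed"), "w") as f:
        f.write("version 1\n")

    # Second night: the unchanged file becomes a hard link (no new data),
    # while the changed file is written out again.
    os.link(os.path.join(old_tree, "unchanged"),
            os.path.join(new_tree, "unchanged"))
    with open(os.path.join(new_tree, "changed"), "w") as f:
        f.write("version 2\n")
    return old_tree, new_tree


if __name__ == "__main__":
    old_tree, new_tree = build_two_images(tempfile.mkdtemp())
    for name in ("unchanged", "changed"):
        shared = same_inode(os.path.join(old_tree, name),
                            os.path.join(new_tree, name))
        print(name, "shared" if shared else "separate copy")
```

Both trees still look complete when browsed, which is why each image can be treated as a full snapshot even though most of its files occupy no additional space.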
The links may connect image files from different backup times, or (initially at least) files from two clients backed up at the same time (a branch). The branch paradigm allows two or more clients with nearly identical static data (say, /usr) to be linked together for the first backup. If you are maintaining a thousand user machines with identical non-user partitions, this can save a lot of backup storage space.
The dirvish documentation isn't clear about setting up branches. Let's invent an example. We have a server named Mom, and three clients named Tom, Dick, and Jane. Tom is a Mac, while Dick and Jane are two nearly identical Redhat-9 laptops (with different user data, of course). Backup images of /usr on Tom are stored in their own vault, tomusr, while backup images from Dick and Jane are stored together in another vault, rh9usr. All vaults fit into one bank on one partition on one disk on Mom. The bank is just the directory /media/backup1, which contains the vaults. For our example, there are two vaults stored in this bank, /media/backup1/tomusr and /media/backup1/rh9usr. Between them, the two vaults contain three dirvish configuration files for the different /usr partitions, as follows:
/media/backup1/tomusr/dirvish/default.conf:
    client: tom
    tree: /usr
    xdev: 1
    index: gzip
    image-default: %Y-%m%d

/media/backup1/rh9usr/dirvish/dick.conf:
    client: dick
    tree: /usr
    xdev: 1
    index: gzip
    image-default: dick-%Y-%m%d

/media/backup1/rh9usr/dirvish/jane.conf:
    client: jane
    tree: /usr
    xdev: 1
    index: gzip
    image-default: jane-%Y-%m%d
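The image-default values use strftime-style tokens (%Y, %m, %d), so the image names they produce can be previewed with any strftime implementation. The Python sketch below assumes the patterns are expanded by ordinary strftime rules; the helper name image_name is hypothetical and is not a dirvish function.

```python
from datetime import datetime


def image_name(pattern, when):
    """Expand an image-default style pattern for the given backup time."""
    return when.strftime(pattern)


if __name__ == "__main__":
    night = datetime(2005, 3, 15, 3, 0)
    print(image_name("%Y-%m%d", night))       # 2005-0315
    print(image_name("dick-%Y-%m%d", night))  # dick-2005-0315
    print(image_name("jane-%Y-%m%d", night))  # jane-2005-0315
```

The dick- and jane- prefixes matter: both branches write into the same rh9usr vault, so without a prefix their nightly images would compete for the same directory name.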
The first vault stands alone; the two branches in the second vault will be used together. We can initialize three images into two vaults like so:
/usr/local/sbin/dirvish --init --vault tomusr
/usr/local/sbin/dirvish --init --branch rh9usr:dick
/usr/local/sbin/dirvish --reference dick --branch rh9usr:jane
Daily backups can use these lines in /etc/dirvish/master.conf:
Runall:
    ...
    tomusr 03:00
    ...
    rh9usr 03:00
    ...

The daily images will be stored as:
/media/backup1/tomusr/...                      # the vault for tom:/usr
/media/backup1/tomusr/dirvish/...              # tomusr config files
/media/backup1/tomusr/2005-0315/...            # a nightly image
/media/backup1/tomusr/2005-0315/exclude        # concatenated exclude list
/media/backup1/tomusr/2005-0315/index.gz       # a list of all the files
/media/backup1/tomusr/2005-0315/log            # detailed log of rsync events
/media/backup1/tomusr/2005-0315/summary        # summary of the backup
/media/backup1/tomusr/2005-0315/tree/...       # the files themselves
/media/backup1/tomusr/2005-0315/tree/bin/...   # /usr/bin/...
/media/backup1/tomusr/2005-0315/tree/dict/...  # /usr/dict/...
...
/media/backup1/tomusr/2005-0316/...            # the next nightly image
...
/media/backup1/tomusr/2005-0317/...            # the next nightly image
...
/media/backup1/rh9usr/...                      # the shared vault for dick:/usr
                                               # and jane:/usr
/media/backup1/rh9usr/dirvish/...              # rh9usr vault config files
/media/backup1/rh9usr/dick-2005-0315/...       # a nightly image of branch dick:/usr
/media/backup1/rh9usr/jane-2005-0315/...       # a nightly image of branch jane:/usr
/media/backup1/rh9usr/dick-2005-0316/...       # next nightly image of dick:/usr
/media/backup1/rh9usr/jane-2005-0316/...       # next nightly image of jane:/usr
Over time, the two separate rh9 images may diverge from the original image, and from each other. For example, we might change the distro from Redhat 9 to Debian; massive numbers of files will change on each client. The result will be backup images with mostly new files, and a scattering of hard links to the previous image. The sets of new files will be mostly duplicated on the backup drive; dirvish and rsync are not smart enough to know that the new images are mostly similar.
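One way to see how far two images have diverged is to count how many files in the newer image are hard links into the older one. The Python sketch below does that by comparing inode numbers. It is not part of dirvish; the function names (inode_keys, link_report) are hypothetical, and it assumes both image trees live on the same filesystem, as they must within one vault.

```python
import os
import tempfile


def inode_keys(tree):
    """Collect (device, inode) pairs for every regular file entry under `tree`."""
    keys = set()
    for dirpath, _dirs, files in os.walk(tree):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            keys.add((st.st_dev, st.st_ino))
    return keys


def link_report(old_tree, new_tree):
    """Return (linked, fresh): files in new_tree shared with old_tree, and not."""
    old = inode_keys(old_tree)
    linked = fresh = 0
    for dirpath, _dirs, files in os.walk(new_tree):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            if (st.st_dev, st.st_ino) in old:
                linked += 1
            else:
                fresh += 1
    return linked, fresh


if __name__ == "__main__":
    # Toy vault: one file carried over by hard link, one newly written.
    vault = tempfile.mkdtemp()
    old_tree = os.path.join(vault, "dick-2005-0315", "tree")
    new_tree = os.path.join(vault, "dick-2005-0316", "tree")
    os.makedirs(old_tree)
    os.makedirs(new_tree)
    with open(os.path.join(old_tree, "kept"), "w") as f:
        f.write("unchanged\n")
    os.link(os.path.join(old_tree, "kept"), os.path.join(new_tree, "kept"))
    with open(os.path.join(new_tree, "replaced"), "w") as f:
        f.write("new distro file\n")
    print(link_report(old_tree, new_tree))
```

After a distro change, a report like this for dick's new image against dick's previous image would show mostly fresh files, and the same would hold for jane, even though the two sets of fresh files are largely identical to each other.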
If you do a massive distro change on many similar machines, it might be better to set up a new vault and re-initialize it for the new distro.
It would be nice if we could merge the duplicated files in such images with a program like freedups; unfortunately, a dirvish vault has millions of directory entries, and programs such as freedups do not perform well with data sets of that size.
