My application base: Obnam

It is often said, yet too often forgotten: taking backups (and verifying that they work). Taking backups is not purely for companies and organizations. Individuals should also take backups to ensure that, in case of errors or calamities, the all important files are readily recoverable.

For backing up files and directories, I personally use obnam, after playing around with Bacula and attic. Bacula is more meant for large distributed environments (although I also tend to use obnam for my server infrastructure) and was too complex for my taste. The choice between obnam and attic is even more personally-oriented.

I found attic to be faster, but with a small supporting community. Obnam was slower, but seems to have a more active community which I find important for infrastructure that is meant to live quite long (you don't want to switch backup solutions every year). I also found it pretty easy to work with, and to restore files back, and Gentoo provides the app-backup/obnam package.

I think both are decent solutions, so I had to make one choice and ended up with obnam. So, how does it work?

Configuring what to backup

The basic configuration file for obnam is /etc/obnam.conf. Inside this file, I tell which directories need to be backed up, as well as which subdirectories or files (through expressions) can be left alone. For instance, I don't want obnam to backup ISO files as those have been downloaded anyway.

[config]
repository = /srv/backup
root = /root, /etc, /var/lib/portage, /srv/virt/gentoo, /home
exclude = \.img$, \.iso$, /home/[^/]*/Development/Centralized/.*
exclude-caches = yes

keep = 8h,14d,10w,12m,10y

The root parameter tells obnam which directories (and subdirectories) to back up. With exclude a particular set of files or directories can be excluded, for instance because these contain downloaded resources (and as such do not need to be inside the backup archives).

Obnam also supports the CACHEDIR.TAG specification, which I use for the various cache directories. With the use of these cache tag files I do not need to update the obnam.conf file with every new cache directory (or software build directory).

The last parameter in the configuration that I want to focus on is the keep parameter. Every time obnam takes a backup, it creates what it calls a new generation. When the backup storage becomes too big, administrators can run obnam forget to drop generations. The keep parameter informs obnam which generations can be removed and which ones can be kept.

In my case, I want to keep one backup per hour for the last 8 hours (I normally take one backup per day, but during some development sprees or photo manipulations I back up multiple times), one per day for the last two weeks, one per week for the last 10 weeks, one per month for the last 12 months and one per year for the last 10 years.

Obnam will clean up only when obnam forget is executed. As storage is cheap, and the performance of obnam is sufficient for me, I do not need to call this very often.

Backing up and restoring files

My backup strategy is to backup to an external disk, and then synchronize this disk with a personal backup server somewhere else. This backup server runs no other software beyond OpenSSH (to allow secure transfer of the backups) and both the backup server disks and the external disk is LUKS encrypted. Considering that I don't have government secrets I opted not to encrypt the backup files themselves, but Obnam does support that (through GnuPG).

All backup enabled systems use cron jobs which execute obnam backup to take the backup, and use rsync to synchronize the finished backup with the backup server. If I need to restore a file, I use obnam ls to see which file(s) I need to restore (add in a --generation= to list the files of a different backup generation than the last one).

Then, the command to restore is:

~# obnam restore --to=/var/restore /home/swift/Images/Processing/*.NCF

Or I can restore immediately to the directory again:

~# obnam restore --to=/home/swift/Images/Processing /home/swift/Images/Processing/*.NCF

To support multiple clients, obnam by default identifies each client through the hostname. It is possible to use different names, but hostnames tend to be a common best practice which I don't deviate from either. Obnam is able to share blocks between clients (it is not mandatory, but supported nonetheless).