I use a tiny script to make backups that are
Backup arrangements should be simple. An offsite identical copy of a filesystem made with rsync is the only backup I trust, and thanks to the proliferation of cloud servers it's cheap and easy to set up.
While a daily backup is better than nothing, it's much nicer to have a rotating system of backups reaching back in time. One way of doing this is to just keep every daily backup, but this requires ever-increasing storage. It's much better to use an alternating pattern of backup slots, with slot 0 being used on odd days (1, 3, 5, ...), slot 1 used every fourth day (2, 6, 10, ...), slot 3 used every eighth day (4, 12, 20, ...) etc.
This is sometimes called Tower of Hanoi rotation, and it has the nice property that you have a selection of backups with approximately doubling ages. Put another way, there is always a backup that is 1 day old, 2 days old, 4 days old, 8 days old, etc.
How to compute the slot number is an interesting problem with a trivial solution. It's the number of trailing zeroes (or ones) in the binary representation of the day number.
The local day number can be found like this in Python:
>>> import datetime >>> datetime.date.today().toordinal() 736720
plotting the next few days in binary and counting the trailing zeroes gives the desired pattern:
736720 | 10110011110111010000 | 4 |
736721 | 10110011110111010001 | 0 |
736722 | 10110011110111010010 | 1 |
736723 | 10110011110111010011 | 0 |
736724 | 10110011110111010100 | 2 |
736725 | 10110011110111010101 | 0 |
736726 | 10110011110111010110 | 1 |
736727 | 10110011110111010111 | 0 |
So the code to produce this number is:
def day_to_slot(n): return bin(n)[::-1].index('1') print day_to_slot(datetime.date.today().toordinal())
One nice capability of rsync is making hard-links to files that already exist on the remote system. So if there is a file x that's unchanged from Monday to Tuesday, rsync actually keeps one copy of x, with hard-links from the Monday and Tuesday backups. This means that the multiple backup trees all look complete, but actually share data wherever possible.
Given slot numbers in $YESTERDAY and $TODAY, the backup "system" consists of invoking rsync with the appropriate options. The command to back up your home directory is:
rsync -az --link-dest=/data/backup/$YESTERDAY ~ remote:/data/backup/$TODAY
So the entire script looks like this:
#!/bin/bash -e read -r YESTERDAY TODAY <<<$( python -c ' import datetime def day_to_slot(n): return bin(n)[::-1].index("1") n = datetime.date.today().toordinal() print(day_to_slot(n - 1)) print(day_to_slot(n)) ') rsync -avz --link-dest=/data/backup/$YESTERDAY ~ remote:/data/backup/$TODAY ssh remote touch /data/backup/$TODAY
After running for a few days you'll have a remote backup directory that looks like this:
$ ls -ltr total 0 drwxrwxr-x 1 jamesb jamesb 10 Jan 26 08:22 4 drwxrwxr-x 1 jamesb jamesb 10 Jan 28 08:23 1 drwxrwxr-x 1 jamesb jamesb 10 Jan 29 08:24 0 drwxrwxr-x 1 jamesb jamesb 10 Jan 30 08:22 2
Each of the above directories contains a full backup, but because of rsync's --link-dest option, the storage is shared between them. Slot 0 has a full copy, and the other slots contain only the differences:
$ du -sh * 34G 0 563M 1 537M 2 646M 4