posts about backup.

last summer, after buying a new four terabyte harddisk for my main computer (replacing the old and notoriously full one terabyte harddisk), i wanted to try something new: instead of using ext2/3/4, i decided to switch to the btrfs filesystem. the main reason i wanted to use btrfs was its ability to quickly create snapshots of the current disk content on the fly, which lets me browse through how the disk looked some time ago. a snapshot essentially stores only the difference between the old data and the new, so snapshots are almost free if the disk content doesn’t change a lot between them. which, at least for me, is usually the case.
i’m using btrfs only for the /home partition, to which i added a subdirectory /home/backup to store backups. in this post, i want to explain how to set up a simple system which makes a snapshot every ten minutes, and cleans up older snapshots so that

  • for snapshots older than a day, only one snapshot is left for every hour, and
  • for snapshots older than a week, only one snapshot is left for every day, and
  • for snapshots older than a month (31 days), only one snapshot is left for every week.

so even with a lot of changes in between, the number of snapshots shouldn’t get too big, and thus not too much space will be wasted, while old (and deleted!) data stays accessible. note that changing the interval from every ten minutes to, say, every minute should be no problem. if you ever accidentally delete something, you’ll have no problem resurrecting the file even if you only notice some hours, days, weeks or even months later. (provided that the file had already been around for at least a similar time interval.)
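
to get a feeling for how many snapshots this leaves around, here is a small back-of-the-envelope calculation in python. it is only a sketch: the function name estimated_snapshots is made up, and it assumes a snapshot every ten minutes plus the thresholds that the cleanup script further down actually uses (everything is kept for a day, then one per hour up to a week, one per day up to 31 days, and one per week after that). since the script groups by calendar hour, day and week, the real numbers can be off by a few.

```python
from datetime import timedelta

# Assumed values: one snapshot every ten minutes, and the thresholds of the
# cleanup script (1 day, 1 week, 31 days).
INTERVAL = timedelta(minutes=10)


def estimated_snapshots(history_days):
    """Rough number of snapshots kept for a history of history_days days."""
    recent_days = min(history_days, 1)               # everything is kept
    hourly_days = max(0, min(history_days, 7) - 1)   # one per hour
    daily_days = max(0, min(history_days, 31) - 7)   # one per day
    weekly_days = max(0, history_days - 31)          # one per week

    recent = recent_days * (timedelta(days=1) // INTERVAL)
    hourly = hourly_days * 24
    daily = daily_days
    weekly = -(-weekly_days // 7)                    # rounded up
    return recent + hourly + daily + weekly


print(estimated_snapshots(365))  # 360
```

so a full year of history should come to roughly 360 snapshots, and most of them are from the last week.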

one note regarding btrfs in general. while btrfs is still marked experimental, it seems to be pretty stable in practice. the only caveat is that you should never fill btrfs disks too much; always make sure enough space is left. that shouldn’t be a problem for my four terabyte disk for quite some time, but in case you love to quickly fill space, better get more than one drive and join them (via raid zero or something like that). also, note that one btrfs filesystem can span several partitions and disks, and that it can handle several raid modes internally. in fact, that’s something i want to try out soon, by combining a bunch of older harddisks i still have lying around into a jbod array and putting a raid one btrfs filesystem over all of them. note that btrfs is supposed to allow more fine-grained configuration in the future (like increasing redundancy, or using different configurations per file), and that it’s always possible to change a filesystem’s configuration on the fly while it is mounted.

creating read-only snapshots.

creating a read-only snapshot is simple: just run btrfs subvolume snapshot -r /home /home/backup/name_of_snapshot. (if you want snapshots you can also write to, drop the -r.) for example, you could create a little shell script:

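#!/bin/bash
# name the snapshot after the current date and time
TIMESTAMP=$(date +"%Y-%m-%d-%H%M%S")
# snapshot writable first: nothing can be removed from a read-only snapshot
btrfs subvolume snapshot /home "/home/backup/$TIMESTAMP"
# drop the entries of older snapshots, then mark the new snapshot read-only
rm -rf "/home/backup/$TIMESTAMP/backup/"20*
btrfs property set -ts "/home/backup/$TIMESTAMP" ro true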

this creates a read-only snapshot named after the current date and time, and removes the entries for older snapshots from the backup subdirectory inside the new snapshot (/home/backup/<timestamp>/backup). after all, we don’t want to recursively increase the tree’s depth by having entries for all older snapshots in each snapshot.
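
since every snapshot is just a directory /home/backup/<timestamp> that mirrors /home, digging up old versions of a file boils down to checking the same relative path in every snapshot. a minimal python sketch of that idea; the helper name find_versions and the example path user/notes.txt are made up for illustration:

```python
import os

BACKUP_DIR = '/home/backup'  # snapshot directory used in this post


def find_versions(relative_path, backup_dir=BACKUP_DIR):
    """Return (snapshot name, full path) for every snapshot containing the file.

    relative_path is relative to /home, e.g. 'user/notes.txt' (made-up example).
    """
    if not os.path.isdir(backup_dir):
        return []
    versions = []
    # The timestamp names sort chronologically, so the oldest copy comes first
    for snapshot in sorted(os.listdir(backup_dir)):
        candidate = os.path.join(backup_dir, snapshot, relative_path)
        if os.path.lexists(candidate):
            versions.append((snapshot, candidate))
    return versions


if __name__ == '__main__':
    for snapshot, path in find_versions('user/notes.txt'):
        print(snapshot, path)
```

restoring a file is then an ordinary copy from one of the listed paths back into /home.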

setting up your computer to execute this script regularly is quite simple. let’s say it is stored as /home/backup/snapshot.sh with read and execute privileges for root; then you could run crontab -e as root and add a line like
1,11,21,31,41,51 * * * * /bin/bash -c "/home/backup/snapshot.sh &>> /var/log/snapshot.log"
(a crontab edited with crontab -e has no user column; an extra root field before the command is only needed if you put the line into /etc/crontab instead.) this runs the script at xx:01, xx:11, xx:21, xx:31, xx:41 and xx:51 for every hour xx on every day during the whole year. the script’s output (which should be essentially something like Create a snapshot of '/home' in '/home/backup/2014-04-27-000100') is stored in a log file /var/log/snapshot.log.

cleaning up.

cleaning up is a little more complicated. deleting a snapshot itself is easy: just run btrfs subvolume delete /home/backup/name_of_snapshot. to delete snapshots according to the rules i wrote up above, i wrote a little python script:

#!/usr/bin/env python3
import datetime
import os
import subprocess

BACKUP_DIR = '/home/backup'


class CannotParse(Exception):
    pass


def parse_timestamp(name):
    # Interpret a name like 2014-04-27-000100 (or 2014-04-27-0001) as a timestamp
    data = name.split('-')
    if len(data) != 4 or len(data[3]) not in (4, 6):
        raise CannotParse(name)
    try:
        year, month, day = int(data[0]), int(data[1]), int(data[2])
        hour = int(data[3][0:2])
        minute = int(data[3][2:4])
        second = int(data[3][4:6]) if len(data[3]) == 6 else 0
        return datetime.datetime(year, month, day, hour, minute, second)
    except ValueError:
        raise CannotParse(name)


now = datetime.datetime.now()
td_day = datetime.timedelta(days=1)
td_week = datetime.timedelta(weeks=1)
td_month = datetime.timedelta(days=31)
monthold = dict()
weekold = dict()
dayold = dict()
rest = dict()

# Find all directories in /home/backup and sort them into buckets by age
for name in os.listdir(BACKUP_DIR):
    if not os.path.isdir(os.path.join(BACKUP_DIR, name)):
        continue
    try:
        timestamp = parse_timestamp(name)
    except CannotParse:
        continue  # not a snapshot, e.g. snapshot.sh or cleanup.py
    # (ISO year, ISO week, ISO weekday, hour, minute, second)
    isodate = tuple(timestamp.isocalendar()) + (timestamp.hour, timestamp.minute, timestamp.second)
    age = now - timestamp
    if age >= td_month:
        key, bucket = isodate[0:2], monthold  # one group per ISO week
    elif age >= td_week:
        key, bucket = isodate[0:3], weekold   # one group per day
    elif age >= td_day:
        key, bucket = isodate[0:4], dayold    # one group per hour
    else:
        key, bucket = isodate[0:6], rest      # every snapshot is its own group
    bucket.setdefault(key, []).append((timestamp, name))


def work(d, title):
    # title only labels the bucket; it is not printed
    for key in d:
        snapshots = sorted(d[key])
        # Keep the oldest snapshot of each group, delete all the others
        for timestamp, name in snapshots[1:]:
            retcode = subprocess.call(['btrfs', 'subvolume', 'delete', os.path.join(BACKUP_DIR, name)])
            if retcode != 0:
                print('Error! (Return code ' + str(retcode) + ')')


work(monthold, "MONTH OLD:")
work(weekold, "WEEK OLD:")
work(dayold, "DAY OLD:")
work(rest, "REST:")

i stored it as /home/backup/cleanup.py, made it executable by root, and scheduled it to run every hour at a fixed minute offset (say, xx:59) by running crontab -e as root and adding
59 * * * * /bin/bash -c "/home/backup/cleanup.py &>> /var/log/snapshot.log"
(as before, without a user column, since this is root’s own crontab.) again, the output is put into /var/log/snapshot.log.
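
before letting a script like this delete anything, it can be reassuring to check the grouping on a plain list of names first. the following python sketch does that without touching btrfs at all. the function name snapshots_to_delete is made up, it uses one dictionary with a bucket label in the key instead of four dictionaries, and it only understands the six-digit time format that snapshot.sh produces; otherwise it is meant to follow the same rules as the cleanup script.

```python
import datetime


def snapshots_to_delete(names, now):
    """Return the snapshot names the retention rules would delete.

    names are strings like '2014-04-27-000100'; anything else is ignored.
    """
    groups = {}
    for name in names:
        if len(name) != 17:
            continue
        try:
            ts = datetime.datetime.strptime(name, '%Y-%m-%d-%H%M%S')
        except ValueError:
            continue
        # (ISO year, ISO week, ISO weekday, hour, minute, second)
        iso = tuple(ts.isocalendar()) + (ts.hour, ts.minute, ts.second)
        age = now - ts
        if age >= datetime.timedelta(days=31):
            key = ('month',) + iso[0:2]   # one per ISO week
        elif age >= datetime.timedelta(weeks=1):
            key = ('week',) + iso[0:3]    # one per day
        elif age >= datetime.timedelta(days=1):
            key = ('day',) + iso[0:4]     # one per hour
        else:
            key = ('rest',) + iso[0:6]    # keep everything
        groups.setdefault(key, []).append((ts, name))

    doomed = []
    for members in groups.values():
        members.sort()
        # the oldest snapshot of each group survives
        doomed.extend(name for _, name in members[1:])
    return sorted(doomed)


now = datetime.datetime(2014, 4, 27, 12, 0, 0)
names = ['2014-04-25-100100', '2014-04-25-101100', '2014-04-25-110100',
         '2014-04-27-110100', '2014-04-27-111100']
print(snapshots_to_delete(names, now))  # ['2014-04-25-101100']
```

in the example, the two snapshots from two days ago that fall into the same hour are reduced to the older one, while the snapshots from today are all kept.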

posted in: computer

here’s my project 52 shot for the eighth week. the topic was

technology, friend or foe?

i know, this photo comes a bit late, but i had to wait for a new gadget to take it. the idea is already a few weeks old :-) harddisks are a piece of computer equipment which can be friend or foe: they store your data, your photos, your videos, your music, for a long time, fast to access. but sometimes, they suffer from head crashes or other fatal injuries, and your data, your photos, your whatever is gone. sometimes it can be recovered, sometimes not. essentially everyone who has been working with computers has had a harddisk failure at some point, i’m sure. i had several. some of them really kicked me in the ass, since i didn’t have a current backup. (this is also the reason why i don’t have any photos from a longer period many years ago…) nowadays, i have a pretty redundant system of backups, so this shouldn’t happen anymore. but who knows… there is no absolute error tolerance. the disk in the photo is pretty much dead, not only because it is open, but also because it is already covered by lots of dust after just a short period of time. dust on the platter is an absolute killer…

technical details: 15s, f/45, 105mm, iso 200.

i got another external hard drive today. the main reason is that i want to encrypt my (current) backup harddisk, which requires reformatting the disk. but if i do so, i’m left with nothing but the original data on the laptop, and no backup. in case something goes terribly wrong, i’m screwed. i just created an encrypted partition on the new disk; this is really pretty easy and doesn’t require much command line typing. once everything is set up, linux asks me for the password as soon as i plug the usb cable in, and automatically mounts the partition using that password. that’s how it should be. and so far, it works perfectly.
currently, rsync is mirroring my home directory onto the disk. as soon as it is done, i will copy some stuff from the other backup disk over (like my server’s backups) which i don’t have on the laptop’s harddisk (which is 180 gb smaller than each of the backup disks), and after that, my old backup disk will be reformatted as well and also filled.
after that, i will deposit one of the backup drives somewhere outside my apartment: in case something goes wrong (like the house burning down, someone deciding to break in, …), i still have a backup somewhere. and, as it is encrypted, nobody but me can read it. (even if someone breaks in here, and steals both laptop and backup, they can’t access the data without my password. and yes, i am aware of xkcd.)
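
for the record, the mirroring step itself can be scripted in a few lines. this is just a sketch in python: the paths are made up, and the flags -a (archive mode) and --delete (remove files from the mirror that no longer exist in the source) are my assumption of what a mirror should do, not something this post prescribes.

```python
import subprocess


def rsync_command(source, target, delete=True):
    """Build an rsync call that mirrors source into target."""
    cmd = ['rsync', '-a']
    if delete:
        cmd.append('--delete')
    # The trailing slash makes rsync copy the contents of source,
    # not the directory itself
    cmd += [source.rstrip('/') + '/', target]
    return cmd


def mirror(source, target):
    """Run rsync and return its exit code (0 means success)."""
    return subprocess.call(rsync_command(source, target))


# hypothetical paths, only printed here, nothing is copied
print(' '.join(rsync_command('/home/user', '/media/backup/home')))
```

calling mirror() with the real paths would start the actual copy; with --delete enabled it is worth double-checking the target path first.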

today i was again thinking about backup solutions. my current backup harddisks use ext2 and ext3 respectively, both linux file systems. but now i’m forced to use osx, as my linux machine completely died. it turns out that osx can read ext2/ext3 using ext2fsx, but that one is pretty unstable and already killed one ext2 partition of mine (note: don’t mount anything as writeable!). i began looking for a file system which both osx and linux can write and read (and which is known to be stable). but… so far, i found: nothing. well, except fat32. i mean, hey, what the heck?! it can’t be that fat32 is the only one supported by both of them; fat32 is a relic from the stone age of computing, nothing which you want to use on a modern computer. well, maybe, but only if you store your stuff in tar files, another pretty antique, but at least somehow useful thing. this makes it pretty much impossible to browse the backup data, to see what’s in it, to easily extract certain files, but at least provides a working alternative. but, well, is this really the only one?!
another reason why we’re still in the stone age of computing…