last summer, after buying a new four terabyte harddisk for my main computer (replacing the old and notoriously full one terabyte harddisk), i wanted to try something new. instead of using ext2/3/4, i decided to switch to the btrfs filesystem. the main reason i wanted to use btrfs was the ability to quickly create snapshots of the current disk content on the fly, and thus to browse through how the disk looked some time ago. a snapshot essentially stores only the difference between the old data and the new, so it is almost free if the disk content isn’t changing a lot between snapshots, which, at least for me, is usually the case.
i’m using btrfs only for the /home partition, to which i added a subdirectory /home/backup to store backups. in this post, i want to explain how to set up a simple system which makes a snapshot every ten minutes, and cleans up older snapshots so that
- for snapshots older than a day, only one snapshot is left for every hour, and
- for snapshots older than a week, only one snapshot is left for every day, and
- for snapshots older than a month (31 days), only one snapshot is left for every week.
so even with a lot of changes in between, the number of snapshots shouldn’t get too big, and thus not too much space will be wasted, while you can still access old (and deleted!) data. note that changing the interval from every ten minutes to, say, every minute should be no problem. if you ever accidentally delete something, you’ll have no trouble resurrecting the file even if you only notice some hours, days, weeks or even months later. (provided that the file has already been around for at least a similar time interval.)
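to get a feeling for how many snapshots actually survive, here is a rough back-of-the-envelope sketch. it assumes the thresholds the cleanup script further down uses (1 day, 7 days, 31 days) and that the oldest class is thinned out to one snapshot per iso week, which is how that script groups it. the function name `estimate_kept` is made up for this example, and the result is only an estimate, not an exact count.

```python
def estimate_kept(total_days, interval_minutes=10):
    """Roughly estimate how many snapshots remain after `total_days` of use."""
    per_day_fresh = 24 * 60 / interval_minutes  # snapshots taken per day
    # (start age in days, end age in days, surviving snapshots per day)
    tiers = [
        (0, 1, per_day_fresh),           # younger than a day: keep everything
        (1, 7, 24.0),                    # older than a day: one per hour
        (7, 31, 1.0),                    # older than a week: one per day
        (31, float("inf"), 1.0 / 7.0),   # older than 31 days: one per week
    ]
    kept = 0.0
    for start, end, per_day in tiers:
        days_in_tier = max(0.0, min(end, total_days) - start)
        kept += days_in_tier * per_day
    return kept


if __name__ == "__main__":
    # compare against keeping every ten-minute snapshot for a year
    print("kept after one year: about %d" % round(estimate_kept(365)))
    print("without cleanup: %d" % (365 * 144))
```

under these assumptions a year of ten-minute snapshots shrinks from 52560 to a few hundred.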
one note regarding btrfs in general. while btrfs is still marked experimental, it seems to be pretty stable in practice. the only caveat is that you should never fill a btrfs disk too much: always make sure enough space is left. that shouldn’t be a problem for my four terabyte disk for quite some time, but in case you love to fill space quickly, better get more than one drive and join them (via raid zero or something like that). also, note that one btrfs filesystem can span several partitions and disks, and that it can internally do several raid modes. in fact, that’s something i want to try out soon, by combining a bunch of older harddisks i still have lying around into a jbod array and putting a raid one btrfs filesystem over all of them. note that btrfs is supposed to allow more fine-grained configuration in the future (like increasing redundancy, or using different configurations per file), and that it’s always possible to update a filesystem on the fly while it is mounted.
creating read-only snapshots.
creating a read-only snapshot is simple: just run btrfs subvolume snapshot -r /home /home/backup/name_of_snapshot. (if you want snapshots you can also write to, drop the -r.) for example, you could create a little shell script:
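    #!/bin/bash
    # name the snapshot after the current date and time
    TIMESTAMP=$(date +"%Y-%m-%d-%H%M%S")
    btrfs subvolume snapshot -r /home "/home/backup/$TIMESTAMP"
    # remove the entries for older snapshots inside the new snapshot
    # (since the snapshot is read-only, this step may fail with a read-only file system error)
    rm -rf "/home/backup/$TIMESTAMP"/backup/20*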
this creates a read-only snapshot named after the current date and time, and cleans up /home/backup inside the snapshot. after all, we don’t want to recursively increase the tree’s depth by having links to all older snapshots in each snapshot.
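if you prefer to drive this from python, the naming scheme and the btrfs call could be sketched as follows. the names `SOURCE`, `BACKUP_DIR`, `snapshot_name` and `snapshot_command` are made up for this example; the function only builds the command line and does not run anything, so you can look at the result first.

```python
import datetime

SOURCE = "/home"              # subvolume to snapshot (as in this post)
BACKUP_DIR = "/home/backup"   # where the snapshots are collected


def snapshot_name(moment):
    """Format a datetime like the shell script does: YYYY-MM-DD-HHMMSS."""
    return moment.strftime("%Y-%m-%d-%H%M%S")


def snapshot_command(moment, read_only=True):
    """Build the argument list for `btrfs subvolume snapshot`."""
    cmd = ["btrfs", "subvolume", "snapshot"]
    if read_only:
        cmd.append("-r")
    cmd += [SOURCE, BACKUP_DIR + "/" + snapshot_name(moment)]
    return cmd


if __name__ == "__main__":
    # print the command instead of executing it
    print(" ".join(snapshot_command(datetime.datetime.now())))
```

to actually take the snapshot, the list could be handed to subprocess.call as root.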
setting up your computer to execute this script regularly is quite simple. let’s say it is stored as /home/backup/snapshot.sh with read and execute privileges for root; then you could run crontab -e as root and add a line like the following. (a crontab edited with crontab -e has no user column; that column only exists in /etc/crontab and the files in /etc/cron.d.)

    1,11,21,31,41,51 * * * * /bin/bash -c "/home/backup/snapshot.sh &>> /var/log/snapshot.log"
this runs the script at xx:01, xx:11, xx:21, xx:31, xx:41 and xx:51 for every hour xx on every day during the whole year. the script’s output (which should essentially be something like Create a snapshot of '/home' in '/home/backup/2014-04-27-000100') is appended to the log file given in the cron line, /var/log/snapshot.log.
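the minute field of the cron line is just an enumeration of start minutes. if you want a different interval, a few lines of python can generate that field for you. `cron_minute_field` is a helper name made up for this sketch, and it assumes the interval divides the hour reasonably (otherwise the last gap in each hour is shorter).

```python
def cron_minute_field(interval, offset=0):
    """Return a comma-separated list of minutes for a cron entry.

    interval: minutes between two runs (1..60)
    offset:   first minute of the hour at which to run (0..59)
    """
    if not 1 <= interval <= 60:
        raise ValueError("interval must be between 1 and 60 minutes")
    if not 0 <= offset < 60:
        raise ValueError("offset must be between 0 and 59")
    return ",".join(str(minute) for minute in range(offset, 60, interval))


if __name__ == "__main__":
    # the schedule used in this post: every ten minutes, starting at xx:01
    print(cron_minute_field(10, 1))
```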
cleaning up is a little more complicated. deleting a snapshot itself is easy: just run btrfs subvolume delete /home/backup/name_of_snapshot. to delete snapshots according to the rules i wrote up above, i wrote a little python script:
    #!/usr/bin/python2
    # (the syntax used here is also valid python 3)
    import datetime
    import os
    import os.path
    import subprocess

    BACKUP_DIR = '/home/backup'

    class CannotParse(Exception):
        pass

    now = datetime.datetime.now()
    td_day = datetime.timedelta(days=1)
    td_week = datetime.timedelta(weeks=1)
    td_month = datetime.timedelta(days=31)
    # One dict per age class; each maps a time bucket to its snapshots
    monthold = dict()
    weekold = dict()
    dayold = dict()
    rest = dict()

    # Find all directories in /home/backup
    for name in os.listdir(BACKUP_DIR):
        if not os.path.isdir(os.path.join(BACKUP_DIR, name)):
            continue
        # Interpret name as timestamp: YYYY-MM-DD-HHMM or YYYY-MM-DD-HHMMSS
        data = name.split('-')
        try:
            if len(data) != 4 or len(data[3]) not in (4, 6):
                raise CannotParse()
            year = int(data[0])
            month = int(data[1])
            day = int(data[2])
            hour = int(data[3][0:2])
            minute = int(data[3][2:4])
            second = int(data[3][4:6]) if len(data[3]) == 6 else 0
            timestamp = datetime.datetime(year, month, day, hour, minute, second)
        except (CannotParse, ValueError):
            # Not a snapshot name (e.g. the scripts themselves); skip it
            continue
        # (ISO year, ISO week, ISO weekday, hour, minute, second)
        isodate = tuple(timestamp.isocalendar()) + (hour, minute, second)
        age = now - timestamp
        if age >= td_month:
            key = isodate[0:2]  # one bucket per ISO week
            d = monthold
        elif age >= td_week:
            key = isodate[0:3]  # one bucket per day
            d = weekold
        elif age >= td_day:
            key = isodate[0:4]  # one bucket per hour
            d = dayold
        else:
            key = isodate[0:6]  # every snapshot is its own bucket
            d = rest
        d.setdefault(key, []).append([timestamp, name])

    def work(d, title):
        # title only labels the age class; handy when adding debug output
        for key in d:
            snapshots = sorted(d[key])
            # Keep the oldest snapshot of each bucket, delete the others
            for timestamp, name in snapshots[1:]:
                retcode = subprocess.call(['btrfs', 'subvolume', 'delete', os.path.join(BACKUP_DIR, name)])
                if retcode != 0:
                    print('Error! (Return code ' + str(retcode) + ')')

    work(monthold, "MONTH OLD:")
    work(weekold, "WEEK OLD:")
    work(dayold, "DAY OLD:")
    work(rest, "REST:")
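since the script deletes things for real, it may be worth checking the selection logic on a plain list of names first. the following is a minimal dry-run sketch for python 3 that mirrors the grouping idea (thresholds of 1, 7 and 31 days; buckets per hour, per day and per iso week) without calling btrfs at all. the names `parse_name`, `bucket_key` and `plan_deletions` are made up for this example and are not part of the script above.

```python
import datetime
import re

# YYYY-MM-DD-HHMM or YYYY-MM-DD-HHMMSS
NAME_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{2})(\d{2})(\d{2})?")


def parse_name(name):
    """Return the datetime encoded in a snapshot name, or None."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        return None
    parts = [int(g) if g is not None else 0 for g in match.groups()]
    try:
        return datetime.datetime(*parts)
    except ValueError:  # e.g. month 13 or hour 25
        return None


def bucket_key(timestamp, now):
    """Group snapshots more coarsely the older they are."""
    age = now - timestamp
    iso = tuple(timestamp.isocalendar())  # (ISO year, ISO week, ISO weekday)
    if age >= datetime.timedelta(days=31):
        return ("month",) + iso[0:2]
    if age >= datetime.timedelta(weeks=1):
        return ("week",) + iso[0:3]
    if age >= datetime.timedelta(days=1):
        return ("day",) + iso[0:3] + (timestamp.hour,)
    return ("rest", timestamp)


def plan_deletions(names, now):
    """Return the sorted names that would be deleted; nothing is touched."""
    buckets = {}
    for name in names:
        timestamp = parse_name(name)
        if timestamp is None:
            continue  # not a snapshot, leave it alone
        key = bucket_key(timestamp, now)
        buckets.setdefault(key, []).append((timestamp, name))
    doomed = []
    for members in buckets.values():
        # the oldest snapshot of each bucket survives
        doomed.extend(name for _, name in sorted(members)[1:])
    return sorted(doomed)


if __name__ == "__main__":
    import os
    for name in plan_deletions(os.listdir("/home/backup"), datetime.datetime.now()):
        print("would delete /home/backup/" + name)
```

feeding it a handful of invented names and a fixed `now` makes it easy to see which snapshots survive in each age class.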
i stored it as /home/backup/cleanup.py and made it executable by root, and scheduled it to be run every hour at a fixed minute offset (say, xx:59) by running crontab -e as root and adding

    59 * * * * /bin/bash -c "/home/backup/cleanup.py &>> /var/log/snapshot.log"
again, the output is put into