
posts about linux.

the aim of this post is to describe how to set up an encrypted arch linux installation on a headless server. while migrating to a new server during the last few days, i had to go through the procedure once again. since it is easy to screw something up, and you don't get helpful error messages without a serial console or a (virtual) kvm, i wanted to share my instructions on how to set up such a machine. my previous server, hosted at strato, included a serial console via ssh, so setting it up wasn't that challenging. for my new server, hosted at hosttech, no serial console is available, but you can get a kvm attached. i had a kvm attached for a day yesterday, as it makes life much easier (handling grub menus, or seeing what went wrong when networking doesn't work), and set up the machine twice to see whether i could also do it without a kvm. the instructions here now work without a serial console or kvm, though ymmv: tiny differences in systems, rescue boots etc. can put you in a situation where something doesn't work and you don't know what. so be warned, and try it out in a vm first to be on the safe side. doing this whole thing with another distribution is certainly also possible, but will differ substantially in many details from what i describe here. these instructions also contain some hardening steps that are not necessarily appropriate for all situations.

this post assumes you have a certain level of linux experience. i assume that you have a headless server sitting somewhere which has a software raid-1 disk configuration, and that you have a rescue system available which boots over the network. all dedicated server hosters i know provide something like that; you can usually set a flag in the customer/setup area of your hoster to start such a system on the next boot. hosttech uses riplinux for their rescue system, so some of the details i describe below might be specific to this one and not work with other such systems.

your server will end up in a state where you have to unlock the encrypted disk remotely via ssh. as long as your server isn't compromised (which can happen if it is hosted at a place you don't control), you can unlock it after reboots without entering your password in a kvm/serial console (which might be tapped into). this also means you must unlock it after every reboot; it won't come back up on its own. (otherwise the encryption would be moot.) so don't put anything on the server which is too critical to be leaked. (you might not want to put such things on a computer in the first place, though.) despite this disadvantage, one big advantage is protection of your data: if a faulty disk of your server is replaced, or your server is decommissioned, your data cannot be extracted from the disk without knowing your encryption key. and if you wipe the luks header (several times, to be sure), even having your keys won't bring the data back (unless you have a backup of the header and the person with access to your key also has access to that backup).

the whole setup is split up into two parts:
  1. setting up a small unencrypted installation of arch linux on the server;

  2. using that unencrypted installation to set up a proper encrypted arch linux server.

i chose this approach for the strato server back then since strato's rescue system didn't offer cryptsetup/luks at the time. this approach also places fewer requirements on the rescue system, and you get a clean arch linux install from which to set up the real system. you can also use the unencrypted installation as your own personal rescue system for maintenance on the encrypted installation, and be sure that all necessary tools are either already installed or can easily be added the same way you usually install packages on your real server. (rescue systems don't have to offer a package manager, so installing something you need but which isn't there can be really annoying.)

one simple note before we begin: if you need to create a password or random text string, you can use dd status=none if=/dev/random of=/dev/stdout bs=1 count=15 | base64 to generate one.
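for reference, a rough python equivalent of that one-liner could look like the following. it is only a sketch: random_password is a made-up helper name, and it reads from os.urandom (the kernel's cryptographic generator) instead of /dev/random. since 15 bytes encode to exactly 20 base64 characters, the result has no = padding.

```python
import base64
import os


def random_password(num_bytes=15):
    """Return num_bytes of kernel randomness, base64-encoded (like dd ... | base64)."""
    # os.urandom() asks the kernel's cryptographic random number generator
    raw = os.urandom(num_bytes)
    return base64.b64encode(raw).decode('ascii')


if __name__ == '__main__':
    print(random_password())
```

if you need a longer string, pass a larger multiple of 3 (e.g. random_password(30) gives 40 characters) to keep the output free of padding.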

also note that the arch linux wiki has a collection of useful installation guides, which cover a lot of different cases. here, i'm mostly following the steps in install from existing linux, as well as instructions from remote unlocking of the root (or other) partition. the wiki also contains a huge amount of other useful information, like howtos on setting up encrypted systems in many different variants.

one final note: you might be tempted to also encrypt the boot partition; while this is possible nowadays, you cannot use it for your server, since remote unlocking needs the init ramdisk up and running, and its contents are stored on the boot partition. this would change if grub ever gained support for remote unlocking. (what you could also do is create a mini boot partition which allows remote unlocking of the real boot partition, and then boots the system installed on the real boot partition. that doesn't really improve security by much, though.)

like all such instructions, this post comes without any warranty. you're on your own! if you have data on the server, back it up first! these instructions will delete everything on your server, and might put it into a state where it must be reset by your hoster, which might cost you money. also, if your server is currently a production machine, make sure that it is no longer actively used and all data is backed up before you start playing. if something goes wrong, don't blame me.

setting up the unencrypted arch linux installation

first boot your server into the rescue system, and begin setting up partitions. you need (at least) three partitions:
  • a partition for the unencrypted install (2 gb);

  • a boot partition for your encrypted install (2 gb);

  • a partition for the encrypted partitions (rest).

in case you use a gpt partition table, you need a bios boot partition. if you're using uefi, you'll have to ask someone else (and probably adjust some more things in my instructions, so try it out in a vm or with a serial console/kvm first!).

next, create raid arrays for partitions 2 to 4 (i'm assuming partition 1 is a bios boot partition; if not, you have to renumber the devices below):

mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create --verbose /dev/md1 --level=mirror --raid-devices=2 /dev/sda3 /dev/sdb3
mdadm --create --verbose /dev/md2 --level=mirror --raid-devices=2 /dev/sda4 /dev/sdb4

the next step is to create an ext4 filesystem on /dev/md0 which will serve as the root filesystem of the unencrypted system:

mkfs.ext4 /dev/md0
mount /dev/md0 /mnt

/dev/md1 will later host the boot partition of the encrypted system, and /dev/md2 will store the encrypted root, home and swap partitions (or whatever more you want to create). it is good practice to wipe the encrypted partition, either before creating the encrypted system (by filling it with random data) or afterwards (by filling the encrypted partition with zeros). to wipe the partition before encrypting it, you can run:

openssl enc -aes-256-ctr -pass \
    pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" \
    -nosalt < /dev/zero > /dev/md2

note that the hosttech riplinux rescue system has no base64; you can instead run dd if=/dev/random bs=128 count=1 2>/dev/null | base64 on your desktop computer and put the result into the double quotes above. this step is rather slow, so you'd better do it in a screen session. on my server, it took roughly 1.5 hours for a 500 gb partition. in fact, starting a screen session is a good idea anyway, as you don't want connection failures to interrupt (and potentially destroy) your installation procedure.

to set up arch linux on /dev/md0, i followed the instructions here with some modifications; most of them were because the rescue system didn't support certain features. here are the details of what i did:

cd /tmp
wget http://mirrors.kernel.org/archlinux/iso/2016.06.01/archlinux-bootstrap-2016.06.01-x86_64.tar.gz
sha512sum /tmp/archlinux-bootstrap-2016.06.01-x86_64.tar.gz

(the original instructions use curl -O instead of wget, but riplinux only provides the latter. also, the original url uses https://, but the provided wget couldn't connect to it.)

i also downloaded

https://mirrors.kernel.org/archlinux/iso/2016.06.01/archlinux-bootstrap-2016.06.01-x86_64.tar.gz

and

https://mirrors.kernel.org/archlinux/iso/2016.06.01/archlinux-bootstrap-2016.06.01-x86_64.tar.gz.sig

on my desktop machine, computed sha512sum of archlinux-bootstrap-2016.06.01-x86_64.tar.gz and compared it to the one on the server, and finally used gpg (gnu privacy guard) to verify the signature (see this document for details on signature verification). if the sha512 checksums match and the signature validates, everything's ready to go! (you might have to use sha256sum or even md5sum, depending on what the rescue system you're using offers. if your rescue system offers gpg, you can also validate the signature on the server itself without downloading the file a second time.)
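if you'd rather compare the checksums in a script than by eye, something along these lines should work on the desktop machine, assuming python is available there. sha512_of_file and checksum_matches are hypothetical helper names (as is check.py in the usage comment); the file is hashed in 1 mib chunks so the archive is never loaded into memory at once. note that a matching checksum only shows that the download wasn't corrupted; authenticity still comes from the gpg signature check.

```python
import hashlib


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Return the hex sha512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, sha512sum_output):
    """Compare against a line printed by sha512sum ("<hash>  <filename>") or a bare hash."""
    expected = sha512sum_output.split()[0].lower()
    return sha512_of_file(path) == expected


if __name__ == '__main__':
    import sys
    # usage: python3 check.py archlinux-bootstrap-2016.06.01-x86_64.tar.gz "<hash from the server>"
    print('OK' if checksum_matches(sys.argv[1], sys.argv[2]) else 'MISMATCH')
```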

next, continue with:

tar xzf archlinux-bootstrap-2016.06.01-x86_64.tar.gz

now you're supposed to run /tmp/root.x86_64/bin/arch-chroot /tmp/root.x86_64/ according to the instructions, but that didn't work on any of the rescue systems i tried. instead, the manual method works:

mount --bind /tmp/root.x86_64 /tmp/root.x86_64
cd /tmp/root.x86_64
cp /etc/resolv.conf etc
mount -t proc /proc proc
mount --rbind /sys sys
mount --rbind /dev dev
#mount --rbind /run run
chroot /tmp/root.x86_64 /bin/bash

i skipped mounting /run as it wasn't provided on the rescue system; everything works fine without it. the next step is to set up pacman, the arch linux package manager. the suggested step for this is pacman-key --init, which generates a gpg key using random data from /dev/random. unfortunately, on a headless server, this takes a long time. you can speed this up with haveged if your rescue system provides it, or generate the necessary files on another system.

to do the latter on my local machine, i downloaded the above bootstrap archive (archlinux-bootstrap-2016.06.01-x86_64.tar.gz), extracted it, chrooted into it, and ran pacman-key --init there. (it was done after a few seconds, as opposed to the 8 hours i let it run on the headless server before killing it.) then go into root.x86_64/etc/pacman.d, run tar cf pacman.tar gnupg, and transfer pacman.tar onto the rescue system. on the rescue system, go to /tmp/root.x86_64/etc/pacman.d/ (outside the chroot, as the chroot provides no tar!) and extract the tarball there, so that you now have a non-empty subdirectory called gnupg.

then go back into the chroot and continue with:

pacman-key --populate archlinux

next, leave the chroot and edit the mirrorlist at /tmp/root.x86_64/etc/pacman.d/mirrorlist (the chroot provides neither vi nor nano, but the rescue system does). uncomment whatever mirror you find useful and go back into the chroot environment. make sure that some http:// mirrors are uncommented as well if you had problems with downloading the https:// bootstrap archive above. then set up the basic system inside the chroot with:

pacman -Syy
pacman -S base base-devel parted

the next step in the official howto is to continue with pacstrap and later arch-chroot. that didn't work for me; both scripts complain about devtmpfs not being available, and arch-chroot also complained about the invalid argument --pid of unshare. i patched the scripts with

nano `which pacstrap`
nano `which arch-chroot`

by searching for devtmpfs twice (the first occurrence is at the beginning); the second match should be at these two lines:

chroot_add_mount udev "$1/dev" -t devtmpfs -o mode=0755,nosuid &&
chroot_add_mount devpts "$1/dev/pts" -t devpts -o mode=0620,gid=5,nosuid,noexec &&

i changed these to:

chroot_add_mount -o bind /dev "$1/dev" &&
chroot_add_mount -o bind /dev/pts "$1/dev/pts" &&

note that this will screw up the unmount mechanism in these scripts. that isn't nice, but everything works without it. (and as soon as you have the unencrypted system set up, you can use it to install the encrypted system, and since the unencrypted system is a full arch linux system, you won't have such problems again. that's another reason why i like to set up an unencrypted system as well.) in arch-chroot, i also had to change

SHELL=/bin/sh unshare --fork --pid chroot "$chrootdir" "$@"

to

SHELL=/bin/sh unshare --fork chroot "$chrootdir" "$@"

i.e. remove the --pid argument. finally, do

mkdir /run/shm

in case your rescue system doesn't have /run/shm (mine didn't). then you can proceed with installing arch linux. first, mount the partition you want to install the unencrypted system on as /mnt:

mount /dev/md0 /mnt

then you can set up the base system:

pacstrap /mnt base
genfstab -U -p /mnt >> /mnt/etc/fstab

note that on my system, this didn't use uuids to identify the disks, although doing so is generally a good idea. to find out the uuids of the devices, run blkid and edit /mnt/etc/fstab, replacing entries such as /dev/md0 with UUID=xxxxxxxxxxxx.
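if you prefer not to edit the file by hand, here's a small sketch of the same substitution, assuming plain blkid output of the form /dev/md0: UUID="..." TYPE="ext4". the helper names parse_blkid and use_uuids are made up for illustration, and rewritten lines are joined with tabs, so the original column alignment is not preserved.

```python
import re

# \b makes sure PARTUUID="..." or PTUUID="..." are not mistaken for UUID="..."
UUID_RE = re.compile(r'\bUUID="([^"]+)"')


def parse_blkid(output):
    """Map device paths to filesystem uuids from plain `blkid` output."""
    uuids = {}
    for line in output.splitlines():
        device, sep, attributes = line.partition(':')
        match = UUID_RE.search(attributes)
        if sep and match:
            uuids[device.strip()] = match.group(1)
    return uuids


def use_uuids(fstab, uuids):
    """Replace /dev/... entries in the first fstab column by UUID=... where known."""
    result = []
    for line in fstab.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith('#') and fields[0] in uuids:
            fields[0] = 'UUID=' + uuids[fields[0]]
            line = '\t'.join(fields)
        result.append(line)
    return '\n'.join(result) + '\n'
```

you'd feed it the output of blkid and the contents of /mnt/etc/fstab, and write the result back only after checking it.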

after that, continue with:

arch-chroot /mnt
echo unencrypted-rescue-system > /etc/hostname
ln -s /usr/share/zoneinfo/Europe/Zurich /etc/localtime
echo en_US.UTF-8 UTF-8 > /etc/locale.gen
locale-gen
echo LANG=en_US.UTF-8 > /etc/locale.conf

obviously, you should replace unencrypted-rescue-system and Europe/Zurich and possibly also en_US with something more fitting. next, run

passwd

to set a root password. generate a random one and write it down in a safe place. (you can also later log in with ssh and change it, if you fear the rescue system is too nosy.)

next, you have to configure your networking. first, you have to find your systemd network device name. they are usually of the form enpXsY (assuming you don't use wlan for your server); to find the right name (your rescue system might use old ethX names), run lspci and look for Ethernet controller. if you find something like

XX:YY.x Ethernet controller

you can extract XX and YY for enpXXsYY right away. two caveats though: first, you need to strip leading zeros, and second, the numbers given by lspci are in hexadecimal notation, while the ones in enpXsY must be in decimal, so you'll have to convert them.
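the conversion can be sketched in a few lines of python. predictable_name is a hypothetical helper; it only builds the path-based enp<bus>s<slot> form and ignores the pci domain and function number. on some machines systemd prefers firmware-provided names (like eno1 or ens3) over the enp form, so treat the result as a best guess.

```python
def predictable_name(lspci_line):
    """Map an lspci line like '0a:1f.0 Ethernet controller: ...' to 'enp10s31'."""
    slot = lspci_line.split()[0]            # '0a:1f.0' or, with a domain, '0000:0a:1f.0'
    parts = slot.split(':')
    bus_hex = parts[-2]
    device_hex = parts[-1].split('.')[0]    # drop the '.function' suffix
    # int(x, 16) converts from hexadecimal, which also drops leading zeros
    return 'enp{}s{}'.format(int(bus_hex, 16), int(device_hex, 16))


if __name__ == '__main__':
    print(predictable_name('02:00.0 Ethernet controller: Intel Corporation 82574L'))
```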

once you have found out the name of your network interface, create /etc/netctl/wired with the following content:

Description='main ethernet connection'
Interface=enpXsY  # REPLACE THIS!
Connection=ethernet

IP=static
Address=('xxx.xxx.xxx.xxx/yy')
Gateway='xxx.xxx.xxx.xxx'
DNS=('xxx.xxx.xxx.xxx' 'xxx.xxx.xxx.xxx')

IP6=static
Address6=('2001:xxxx:xxxx:xxxx::1/64')
Gateway6=('fe80::1')

you obviously need to adjust the interface name, the ipv4 and ipv6 addresses, the network masks and the dns servers. you can also use dhcp if your hoster supports that. next, continue with:

netctl enable wired
pacman -S openssh grub lvm2
systemctl enable sshd.service

then you have to edit /etc/mkinitcpio.conf and insert mdadm_udev in the HOOKS="..." line somewhere after block and before filesystems. (otherwise, the system won't come up again as it won't be able to assemble the raid arrays.) next, edit /etc/ssh/sshd_config and add

PermitRootLogin yes

at its end. (otherwise you won't be able to log in to the system at all, as root is the only user.)

then run:

mdadm -E --scan >> /etc/mdadm.conf
mkinitcpio -p linux
grub-install --target=i386-pc /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg
sync

finally, unmount and reboot:

exit
cd /
umount -R /mnt
exit
reboot

the unencrypted "rescue" system should be ready to go.

setting up the encrypted arch linux installation

log into the newly set up unencrypted system with ssh root@your-server and the root password you set above. (now is the time to change it if you don't trust the rescue system too much.)

now continue by installing two important packages and creating a temporary filesystem:

pacman -Sy cryptsetup screen
mount -t ramfs -o size=1M none /mnt

start a screen session and continue in there. we'll need the temporary filesystem to transfer the master key for the encrypted partition without writing it to disk. instead of creating the master key on the headless server (which probably doesn't have enough entropy), create it on your desktop computer:

dd if=/dev/random of=server-masterkey bs=1024 count=1
scp server-masterkey root@your-server:/mnt

back on your server, inside the screen session, create the encrypted disk:

cryptsetup --verbose --cipher aes-xts-plain64 --key-size 512 -h sha256 -i=10000 \
    --verify-passphrase --master-key-file /mnt/server-masterkey luksFormat /dev/md2

you have to enter a passphrase for your encrypted partition. use a long randomly generated password and store it safely, or something which is long enough and which you can remember. the setting -i=10000 i used (the pbkdf iteration time in milliseconds) is rather paranoid: hashing the password takes roughly 10 seconds on your server. this makes most brute-force attacks impractical, but also makes unlocking (and all other cryptsetup operations that need the passphrase) slow. feel free to decrease the number, but since these operations need to be done only very rarely (like now while installing, and once after each reboot of your server), there's no need to do that.

then get rid of the master key temp filesystem and open the encrypted partition:

umount /mnt
cryptsetup luksOpen /dev/md2 cryptdisk

if you didn't wipe the space occupied by the encrypted partition earlier with random data, you can now run dd if=/dev/zero of=/dev/mapper/cryptdisk bs=1M. (do that inside a screen session, as it will take a lot of time!)

create an lvm volume group on the encrypted partition:

pvcreate /dev/mapper/cryptdisk
pvdisplay

vgcreate server /dev/mapper/cryptdisk
vgdisplay

lvcreate --size 32G --name root server
lvcreate --contiguous y --size 4G --name swap server
lvcreate --extents +100%FREE --name home server
lvdisplay

here, i'm creating:
  • a root volume with 32 gb,

  • a swap volume with 4 gb,

  • a home volume occupying the remaining space.

adjust the sizes to your needs. next, create filesystems and mount everything:

mkfs.ext4 /dev/md1   # the boot partition
mkfs.ext4 /dev/mapper/server-root
mkfs.ext4 /dev/mapper/server-home
mkswap /dev/mapper/server-swap

swapon /dev/mapper/server-swap
mount /dev/mapper/server-root /mnt
mkdir /mnt/boot /mnt/home
mount /dev/mapper/server-home /mnt/home
mount /dev/md1 /mnt/boot

you can now install arch linux:

pacman -S arch-install-scripts
pacstrap /mnt base
genfstab -U -p /mnt >> /mnt/etc/fstab

check /mnt/etc/fstab. it should have uuids this time.

continue with:

arch-chroot /mnt
echo your-server > /etc/hostname
ln -s /usr/share/zoneinfo/Europe/Zurich /etc/localtime
echo en_US.UTF-8 UTF-8 > /etc/locale.gen
locale-gen
pacman -Sy grub openssh screen cryptsetup sudo busybox base-devel
pacman -Sy wget dropbear mkinitcpio-nfs-utils
modprobe dm-mod
mdadm -E --scan >> /etc/mdadm.conf

obviously, replace your-server, Europe/Zurich and en_US with something more fitting. now check the bottom of /etc/mdadm.conf: does it contain all three raid arrays (or however many you created)?

next, run

cryptsetup luksAddKey /dev/md2

and add a key with a long random string as its password. i'll refer to this key as LONG_PASSWORD from now on. later, you can use this key to unlock the disk remotely at boot time. the new password should end up in slot 1. you can check the slots with:

cryptsetup luksDump /dev/md2

next, create the network configuration /etc/netctl/wired with the same content as in the unencrypted system. then edit /root/.ssh/authorized_keys (you might have to create /root/.ssh first) and paste in some public ssh keys you'll want to use for login later. (we'll disable root login with password below, so you really need to do this!)

then continue with:

netctl enable wired
ssh-keygen -t ed25519 -N "" -f /etc/ssh/ssh_host_ed25519_key
ssh-keygen -t rsa -b 4096 -N "" -f /etc/ssh/ssh_host_rsa_key

next edit /etc/ssh/sshd_config and add/change:

Protocol 2
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key

PermitEmptyPasswords no
PermitRootLogin without-password
StrictModes yes
AllowUsers root
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes256-ctr,aes192-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512,hmac-sha2-256
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

(add other user names to AllowUsers which you will create later and want to use to remotely login via ssh. if you're using a less-modern openssh ssh client, or some non-openssh client, you might want to tweak the mentioned ciphers, macs and key exchange algorithms because you won't be able to connect to your server otherwise.)

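remove all other host keys mentioned (or comment them out if they aren't already). next, edit /etc/ssh/moduli and remove all lines for moduli smaller than 4096 bits. the fifth column contains the size, but it is stored as one less than the nominal bit length, so 4096-bit moduli show up as 4095: keep the lines where this column is at least 4095. these last two steps (editing files in /etc/ssh) are not required, but they do harden your system. next, run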
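the moduli filtering can be sketched like this. strong_moduli is a made-up name; it keeps comment lines and every modulus whose size column (the fifth one) is at least min_bits - 1, because the file records e.g. 4096-bit moduli as 4095. you'd write the result to a temporary file and move it over /etc/ssh/moduli only after checking that it still contains some moduli.

```python
def strong_moduli(text, min_bits=4096):
    """Filter the contents of /etc/ssh/moduli, keeping only large moduli."""
    kept = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith('#'):
            kept.append(line)  # keep the header comments and blank lines
        elif len(fields) >= 7 and int(fields[4]) >= min_bits - 1:
            # column 5 is the size, stored one lower than the nominal
            # bit length (a 4096-bit modulus is listed as 4095)
            kept.append(line)
    return '\n'.join(kept) + '\n'


if __name__ == '__main__':
    with open('/etc/ssh/moduli') as f:
        print(strong_moduli(f.read()), end='')
```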

systemctl enable sshd.service

so you can actually ssh into your new system after reboot. we now want to set up remote unlocking. (also see this document if you want to know more.) first, generate an rsa key for the ssh connection. don't create an ecc key here, as dropbear (which we'll use) doesn't support them (you can also use tinyssh, but that uses a different key format than openssh, so you'll have to do some more work). on your desktop machine, run:

ssh-keygen -b 4096 -t rsa -f ~/.ssh/id_rsa_server_unlocking
scp ~/.ssh/id_rsa_server_unlocking.pub root@your-server:/mnt/root/

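store the private key somewhere safe; you'll need it (together with the password LONG_PASSWORD) to remotely unlock your server. (the encrypted system is still mounted at /mnt on the unencrypted system, so inside the chroot the public key shows up as /root/id_rsa_server_unlocking.pub.) now, run the following on the server, inside the chroot: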

mkdir -p /build
chgrp nobody /build
chmod g+w /build
cd /build
for i in mkinitcpio-netconf mkinitcpio-dropbear mkinitcpio-utils; do
    wget https://aur.archlinux.org/cgit/aur.git/snapshot/$i.tar.gz
    tar -xvzf $i.tar.gz
    chown -R nobody:nobody $i
    cd $i
    sudo -u nobody makepkg
    chown root:root $i-*.xz
    mv $i-*.xz ..
    cd ..
    rm -rf $i
done
mv *.xz /root
cd /root
rm -rf /build
mkdir -p /etc/dropbear
cat /root/id_rsa_server_unlocking.pub > /etc/dropbear/root_key
for i in mkinitcpio-netconf mkinitcpio-dropbear mkinitcpio-utils; do
    pacman -U $i-*.tar.xz
done

make sure everything builds and installs fine. then edit /etc/mkinitcpio.conf:
  1. change the MODULES="" line to MODULES="dm_mod dm_crypt aes_x86_64 raid1";

  2. insert lvm2 mdadm_udev netconf dropbear encryptssh in the HOOKS="..." string before filesystems, and add shutdown at the end. the line should now look like HOOKS="base udev autodetect modconf block lvm2 mdadm_udev netconf dropbear encryptssh filesystems keyboard fsck shutdown".
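to avoid typos in that line, the edit can be expressed as a small python function. add_hooks is a hypothetical helper, and the default line used in the example (base udev autodetect modconf block filesystems keyboard fsck) is an assumption about what your mkinitcpio.conf starts with. it only handles the HOOKS="..." string syntax used here; newer mkinitcpio versions write HOOKS=(...) as a bash array instead.

```python
def add_hooks(hooks_line, new_hooks, before='filesystems', append=()):
    """Insert new_hooks in front of `before` in a HOOKS="..." line."""
    prefix, _, rest = hooks_line.partition('"')
    hooks = rest.rsplit('"', 1)[0].split()
    position = hooks.index(before)      # raises ValueError if `before` is missing
    for hook in new_hooks:
        if hook not in hooks:           # already present: leave it where it is
            hooks.insert(position, hook)
            position += 1
    for hook in append:
        if hook not in hooks:
            hooks.append(hook)
    return '{}"{}"'.format(prefix, ' '.join(hooks))


if __name__ == '__main__':
    default = 'HOOKS="base udev autodetect modconf block filesystems keyboard fsck"'
    print(add_hooks(default,
                    ['lvm2', 'mdadm_udev', 'netconf', 'dropbear', 'encryptssh'],
                    append=['shutdown']))
```

calling it a second time on its own output changes nothing, since hooks that are already present are skipped.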

next, modify /usr/lib/initcpio/hooks/dropbear so that the lines starting the server look like:

echo "Starting dropbear (on port 12345)"
/usr/sbin/dropbear -E -s -j -k -p 12345

i.e. add "-p 12345" to the dropbear call and the printed text. this is the port you will connect to with ssh to unlock the server remotely. you can also skip this, in which case you'll have to use the standard ssh port (22).

continue with editing /etc/default/grub. modify the GRUB_CMDLINE_LINUX variable to

GRUB_CMDLINE_LINUX="cryptdevice=/dev/md2:server ip=:::::eth0:dhcp"

or, to be on the safe side, to

GRUB_CMDLINE_LINUX="cryptdevice=/dev/md2:server ip=xxx.xxx.xxx.xxx::yyy.yyy.yyy.yyy:zzz.zzz.zzz.zzz:your-server:eth0:none"

(i had trouble with the first variant some years ago.) replace xxx.xxx.xxx.xxx with your server's ip, yyy.yyy.yyy.yyy with the gateway, zzz.zzz.zzz.zzz with the netmask and your-server with your server's hostname. you might also have to adjust eth0 in case your server has more than one network interface. (for me, eth0 always worked.)
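the static variant's field order (client ip, server ip, gateway, netmask, hostname, device, autoconf) is the one documented for the kernel's ip= parameter; the sketch below just fills it in, leaving the server ip field empty. kernel_ip_param is a made-up helper, and the 192.0.2.x addresses in the example are documentation placeholders.

```python
def kernel_ip_param(device='eth0', client_ip='', gateway='', netmask='', hostname=''):
    """Build the kernel's ip= parameter.

    Fields: client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf.
    Without a client ip, dhcp is used; otherwise autoconf is set to 'none'.
    """
    autoconf = 'none' if client_ip else 'dhcp'
    fields = [client_ip, '', gateway, netmask, hostname, device, autoconf]
    return 'ip=' + ':'.join(fields)


if __name__ == '__main__':
    print(kernel_ip_param(client_ip='192.0.2.10', gateway='192.0.2.1',
                          netmask='255.255.255.0', hostname='your-server'))
```

called without arguments, it reproduces the dhcp variant ip=:::::eth0:dhcp from above.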

next, set a root password, create the init ramdisk, and set up the boot loader:

passwd
mkinitcpio -p linux
grub-install --recheck /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg

in case you're using an old mbr partition table, you might have to set the bootable flag for the boot partition.

then, exit the chroot and reboot:

exit
umount -R /mnt
swapoff /dev/mapper/server-swap
vgchange -an server
cryptsetup luksClose cryptdisk
sync
reboot

unlocking the encrypted arch linux installation

your server should now boot into the init ramdisk, start dropbear, and wait for a connection to unlock your encrypted partition. to unlock it, run:

echo LONG_PASSWORD | ssh -p 12345 -i ~/.ssh/id_rsa_server_unlocking root@your-server

this should unlock your encrypted disk (which takes around 10 seconds if you followed my steps to the letter) and then boot arch linux. you'll be able to log in as root with the ssh keys you inserted earlier. from that point on, you can configure the system like any other linux installation via ssh (for example, by using ansible).

if the system doesn't come up or doesn't start networking (you can use ping to see whether the network interface is up; as soon as it responds to ping after reboot, you can try the above ssh unlocking command), you can either reboot into your hoster's rescue system, mount and chroot into the unencrypted system, and rewrite the boot loader configuration so that it boots into the unencrypted system, and/or use a serial console and/or kvm to find out what went wrong. in any case, debugging such a situation is really hard, so good luck! but if your system is close enough to mine and you followed the above steps correctly (and i didn't screw something up), it should work.

last summer, after buying a new four terabyte harddisk for my main computer (replacing the old and notoriously full one terabyte harddisk), i wanted to try something new. instead of using ext2/3/4, i decided to switch to the btrfs filesystem. the main reason i wanted to use btrfs was the ability to quickly create snapshots of the current disk content on the fly, which lets me browse through how the disk looked some time ago. snapshots essentially store only the difference between the old data and the new, so they are almost free if the disk content doesn't change a lot between snapshots. which, at least for me, is usually the case.

i'm using btrfs only for the /home partition, to which i added a subdirectory /home/backup to store backups. in this post, i want to explain how to set up a simple system which makes a snapshot every ten minutes, and cleans up older snapshots so that

  • for snapshots older than a day, only one snapshot is left for every hour, and
  • for snapshots older than a week, only one snapshot is left for every day, and
  • for snapshots older than a month (31 days), only one snapshot is left for every week.

so even with a lot of changes in between, the number of snapshots shouldn't get too big, and thus not too much space will be wasted, while still allowing access to old (and deleted!) data. note that changing the interval from every ten minutes to, say, every minute should be no problem. if you ever accidentally delete something, you'll have no problem resurrecting the file even if you only notice some hours, days, weeks or even months later. (provided that the file had already been around for at least a similar time interval.)

one note regarding btrfs in general: while btrfs is still marked experimental, it seems to be pretty stable in practice. the only caveat is that you should never fill btrfs disks too much; always make sure enough space is left. that shouldn't be a problem for my four terabyte disk for quite some time, but in case you love to quickly fill space, better get more than one drive and join them (via raid zero or something like that). also, note that one btrfs filesystem can span several partitions and disks, and that it can internally do several raid modes. in fact, that's something i want to try out soon, by combining a bunch of older harddisks i still have lying around in a jbod array and putting a raid one btrfs filesystem over all of them. note that btrfs will in the future allow even finer-grained configuration (like increasing redundancy, or using different configurations per file), and that it's always possible to change this configuration on the fly while the filesystem is mounted.

creating read-only snapshots.

creating a read-only snapshot is simple: just run btrfs subvolume snapshot -r /home /home/backup/name_of_snapshot. (if you want snapshots you can also write to, drop the -r.) for example, you could create a little shell script:

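#!/bin/bash
# name the snapshot after the current date and time, e.g. 2014-04-27-000100
TIMESTAMP=$(date +"%Y-%m-%d-%H%M%S")
btrfs subvolume snapshot -r /home "/home/backup/$TIMESTAMP"

this creates a read-only snapshot named after the current date and time. we don't want to recursively increase the tree's depth by having copies of all older snapshots inside each new snapshot, but btrfs already takes care of that: snapshots are not recursive, and since every older snapshot in /home/backup is a subvolume of its own, it only shows up as an empty directory inside a new snapshot. (as the new snapshot is read-only, trying to delete those empty directories from it would fail anyway.)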

setting up your computer to execute this script regularly is quite simple. let's say it is stored as /home/backup/snapshot.sh with read and execute privileges for root; then you could run crontab -e as root and add a line like

1,11,21,31,41,51 * * * * /bin/bash -c "/home/backup/snapshot.sh &>> /var/log/snapshot.log"

(since this is root's personal crontab, there is no user column; that column only exists in /etc/crontab and in files in /etc/cron.d.) this runs the script at xx:01, xx:11, xx:21, xx:31, xx:41 and xx:51 of every hour xx, every day of the year. the script's output (which should essentially be something like Create a snapshot of '/home' in '/home/backup/2014-04-27-000100') is stored in the log file /var/log/snapshot.log.

cleaning up.

cleaning up is a little more complicated. deleting a snapshot itself is easy: just run btrfs subvolume delete /home/backup/name_of_snapshot. to delete snapshots according to the rules above, i wrote a little python script:

#!/usr/bin/env python3
# delete old btrfs snapshots in /home/backup, keeping one snapshot per bucket
import datetime
import os
import subprocess

BACKUP_DIR = '/home/backup'


class CannotParse(Exception):
    pass


def parse_timestamp(name):
    """Interpret a snapshot name like 2014-04-27-000100 (or 2014-04-27-0001)."""
    data = name.split('-')
    if len(data) != 4 or len(data[3]) not in (4, 6):
        raise CannotParse()
    year, month, day = int(data[0]), int(data[1]), int(data[2])
    hour, minute = int(data[3][0:2]), int(data[3][2:4])
    second = int(data[3][4:6]) if len(data[3]) == 6 else 0
    return datetime.datetime(year, month, day, hour, minute, second)


now = datetime.datetime.now()
td_day = datetime.timedelta(days=1)
td_week = datetime.timedelta(weeks=1)
td_month = datetime.timedelta(days=31)
monthold = dict()
weekold = dict()
dayold = dict()
rest = dict()

# find all snapshot directories in /home/backup and sort them into buckets
for name in os.listdir(BACKUP_DIR):
    # os.listdir() returns bare names, so join them with the directory first
    if not os.path.isdir(os.path.join(BACKUP_DIR, name)):
        continue
    try:
        timestamp = parse_timestamp(name)
    except (CannotParse, ValueError):
        continue  # not a snapshot created by snapshot.sh
    # (iso year, iso week, weekday, hour, minute, second)
    isodate = tuple(timestamp.isocalendar()) + (timestamp.hour, timestamp.minute, timestamp.second)
    age = now - timestamp
    if age >= td_month:
        key, buckets = isodate[0:2], monthold   # one snapshot per iso week
    elif age >= td_week:
        key, buckets = isodate[0:3], weekold    # one snapshot per day
    elif age >= td_day:
        key, buckets = isodate[0:4], dayold     # one snapshot per hour
    else:
        key, buckets = isodate[0:6], rest       # keep everything
    buckets.setdefault(key, []).append((timestamp, name))


def work(buckets):
    """Keep the oldest snapshot of every bucket and delete all others."""
    for key in buckets:
        snapshots = sorted(buckets[key])
        for timestamp, name in snapshots[1:]:
            retcode = subprocess.call(['btrfs', 'subvolume', 'delete',
                                       os.path.join(BACKUP_DIR, name)])
            if retcode != 0:
                print('Error! (Return code ' + str(retcode) + ')')


work(monthold)
work(weekold)
work(dayold)
work(rest)

i stored it as /home/backup/cleanup.py, made it executable by root, and scheduled it to run every hour at a fixed minute offset (say, xx:59) by adding the following line to /etc/crontab (the field root names the user to run as; it is only valid in the system crontab, so leave it out if you add the line with crontab -e instead):
59 * * * * root /bin/bash -c "/home/backup/cleanup.py &>> /var/log/snapshot.log"
again, the output is appended to /var/log/snapshot.log.
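the bucketing rule of cleanup.py can be tried out without touching any btrfs subvolumes. the following is a minimal sketch, assuming snapshot names of the form year-month-day-hhmm or year-month-day-hhmmss as parsed above; the function names are made up for illustration. like work() above, it keeps the oldest snapshot per bucket, but it only returns the names that would be deleted.

```python
import datetime

def parse_name(name):
    """Parse 'YYYY-MM-DD-HHMM' or 'YYYY-MM-DD-HHMMSS' into a datetime, or return None."""
    parts = name.split('-')
    if len(parts) != 4 or len(parts[3]) not in (4, 6):
        return None
    try:
        year, month, day = int(parts[0]), int(parts[1]), int(parts[2])
        hour, minute = int(parts[3][0:2]), int(parts[3][2:4])
        second = int(parts[3][4:6]) if len(parts[3]) == 6 else 0
        return datetime.datetime(year, month, day, hour, minute, second)
    except ValueError:
        return None

def bucket(timestamp, now):
    """Bucket key, using the same age thresholds and ISO date slices as cleanup.py."""
    iso = tuple(timestamp.isocalendar()) + (timestamp.hour, timestamp.minute, timestamp.second)
    age = now - timestamp
    if age >= datetime.timedelta(days=31):
        return ('month', iso[0:2])   # one snapshot per ISO week
    if age >= datetime.timedelta(weeks=1):
        return ('week', iso[0:3])    # one snapshot per day
    if age >= datetime.timedelta(days=1):
        return ('day', iso[0:4])     # one snapshot per hour
    return ('rest', iso[0:6])        # one per second, i.e. effectively all

def snapshots_to_delete(names, now):
    """Names that the retention rule would delete (dry run)."""
    buckets = {}
    for name in names:
        ts = parse_name(name)
        if ts is not None:
            buckets.setdefault(bucket(ts, now), []).append((ts, name))
    doomed = []
    for entries in buckets.values():
        entries.sort()
        doomed.extend(name for _, name in entries[1:])  # keep the oldest one
    return sorted(doomed)
```

for example, with now at 2014-06-01 12:00, the snapshots 2014-05-31-1000 and 2014-05-31-1030 fall into the same hour bucket, so only the later one is reported.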

posted in: computer
tags:
places:

recently, there was a practical attack against linux encrypted hard disks (with linux unified key setup (luks)) which use some form of cipher block chaining (cbc) mode. since this mode was the default some time ago, and i did use it in the past, and also had some other not-so-great setups especially on older backup disks (like using sha1 instead of sha2), i thought about re-encrypting some of my backup disks.
the usual re-encryption procedure with luks goes as follows: get another hard disk with enough space, copy the data over there, re-create the luks encrypted disk, and copy the data back. this takes a lot of time and space. in theory, it should be no problem to re-encrypt a disk on the fly (in theory, even while using it), but i didn’t like the idea of investing quite some time to implement that. fortunately, after searching around the web a bit, i found out that someone already did it. (offline only, though, but that’s completely fine for backup disks.) in fact, it is part of the official cryptsetup distribution from version 1.5.0 on. unfortunately, the cryptsetup coming with my linux mint installations is 1.4.3, but since re-encryption does not require kernel support, i could just compile the new cryptsetup and use the compiled version’s cryptsetup-reencrypt command line tool. well, and so i did.
since the process can be a bit more complicated, i wanted to document it, and since i didn’t find too much on cryptsetup-reencrypt on the net, i thought writing a blog post about it is a good idea :-)

simple resizing.

warning: if you experience power loss or your machine crashes during re-encryption, the whole encrypted hard disk might be unusable! so only do this for data you have already backed up. in case you want to re-encrypt your only backup disk, take this as an opportunity to buy a second backup disk, encrypt it with the correct settings, copy the data there as well, and only then re-encrypt your old backup hard disk.

well, you have been warned. so now let’s get it done.

so let us assume we want to re-encrypt partition /dev/sdx1. namely, we want to use aes in xex-based tweaked-codebook mode with ciphertext stealing (xts) with a key size of 512 bits (that is, we use aes-256, since xts splits the key into two halves) and the hash function sha-2 with 256 bits. moreover, the hash function should be iterated often enough that unlocking takes around 2 seconds (-i takes this time in milliseconds). for this, we can simply run
cryptsetup-reencrypt -c aes-xts-plain64 -s 512 -h sha256 -i=2000 /dev/sdx1
to improve performance, we can set a higher buffer size (like 32 megabytes with -B 32) or use direct i/o (with --use-directio). the command line will then be:
cryptsetup-reencrypt -c aes-xts-plain64 -s 512 -h sha256 -i=2000 -B 32 --use-directio /dev/sdx1
for some setups, this will start running, take ages, and eventually finish with a happily re-encrypted harddisk. (note that you need to enter all keys, not just one – if you don’t know one, remove it before calling!) to check out the result, run:
cryptsetup luksDump /dev/sdx1
everything is fine when you see the following in the output:

...
Cipher name:      aes
Cipher mode:      xts-plain64
Hash spec:        sha256
...
MK bits:          512
...
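checking these fields can also be scripted, for example when re-encrypting several disks. a minimal sketch, assuming the luks1 luksDump layout shown above (one "key: value" pair per unindented line); the function names are hypothetical, and the code only parses text you pass in, it does not call cryptsetup itself.

```python
def parse_luks_dump(text):
    """Collect unindented 'Key: value' lines from luksDump output (LUKS1 layout assumed)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        # indented lines belong to key slots or continue a multi-line value
        if sep and not line.startswith((' ', '\t')):
            fields.setdefault(key.strip(), value.strip())
    return fields

def reencrypted_ok(text, cipher='aes', mode='xts-plain64', hash_spec='sha256', mk_bits='512'):
    """True if the header reports exactly the parameters used with cryptsetup-reencrypt above."""
    fields = parse_luks_dump(text)
    expected = {'Cipher name': cipher, 'Cipher mode': mode,
                'Hash spec': hash_spec, 'MK bits': mk_bits}
    return all(fields.get(key) == value for key, value in expected.items())
```

you could feed it the output of cryptsetup luksDump /dev/sdx1, for instance via subprocess.check_output.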

finally, note that cryptsetup-reencrypt does support suspension and continuing – but only if it knows where exactly it stopped. so this won’t work with crashes or power losses, but if you press control+c, it should be no problem to continue later. (i haven’t tested it, though, i’m just repeating from its manual. note that you shouldn’t get rid of the temporary files if you want to be able to continue. if you delete them, you’ve screwed up and your data is very probably gone.)

resizing necessary.

warning: if you screw this up, you can destroy your filesystem.

the whole task gets slightly more complicated when cryptsetup-reencrypt tells you something along the lines of
Data offset for detached LUKS header must be either 0 or higher than header size (4036 sectors).
that simply means that there isn’t enough space for the (larger) luks header; so you need to make some space.
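how much space to free can be derived from the number in the error message. a small sketch, assuming 512-byte sectors and taking "higher than header size" literally, so the result is the smallest whole number of mebibytes (the unit of --reduce-device-size 2M) that is strictly larger than the header; the helper name is made up.

```python
import re

SECTOR_BYTES = 512       # assumed sector size for the header size in the message
MIB = 1024 * 1024        # cryptsetup-reencrypt sizes use base 1024

def needed_reduction_mib(message):
    """Smallest whole number of MiB strictly larger than the header size in the message."""
    match = re.search(r'header size \((\d+) sectors\)', message)
    if match is None:
        raise ValueError('no header size found in message')
    header_bytes = int(match.group(1)) * SECTOR_BYTES
    return header_bytes // MIB + 1
```

for the message above, 4036 sectors are 2066432 bytes, slightly less than 2 mebibytes, so 2M is enough.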
assuming you encrypted an ext2/3/4 filesystem which still has enough free space left, you can work around this as follows: first unlock the luks partition, then shrink the ext2/3/4 filesystem, and finally re-encrypt while moving the data backwards (cryptsetup-reencrypt has an option specifically for this). first, run e2fsck on the filesystem (we assume it is unlocked to /dev/mapper/yourfilesystem):
e2fsck -D -p -v -t /dev/mapper/yourfilesystem
(add -f to force a complete check, and remove -p if you don’t want automatic repair in case of errors.)
if that ran through, observe the output:
disklabel: clean, 12345/123456789 files, 98765/987654321 blocks
since blocks have a size of 4 kilobytes (verify this by running dumpe2fs -h /dev/mapper/yourfilesystem and looking for “block size”!), we can shrink the filesystem by 512 blocks to obtain 2 megabytes (512 × 4096 = 2097152 bytes) of free space at the end of the encrypted partition; that should be enough space for the larger luks header. to compute the new size, we take the total number of blocks (987654321) and subtract 512 from it: in this case, we obtain 987653809. so now we can resize the ext2/3/4 filesystem to this many blocks:
resize2fs -p /dev/mapper/yourfilesystem 987653809
this also takes some time (though much less than cryptsetup-reencrypt). when it is done, we can run e2fsck another time if we want, and then lock the container:
cryptsetup luksClose yourfilesystem
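the block arithmetic for resize2fs can be double-checked with a few lines of python. a sketch, assuming the e2fsck summary line format shown above and a block size of 4096 bytes (verify it with dumpe2fs -h); the function name is hypothetical.

```python
import re

def shrunk_block_count(e2fsck_line, block_size=4096, reduce_mib=2):
    """Block count for resize2fs so that reduce_mib MiB become free at the end of the device."""
    match = re.search(r'(\d+)/(\d+) blocks', e2fsck_line)
    if match is None:
        raise ValueError('no block count found')
    total_blocks = int(match.group(2))
    reduce_bytes = reduce_mib * 1024 * 1024
    if reduce_bytes % block_size:
        raise ValueError('reduction is not a whole number of blocks')
    return total_blocks - reduce_bytes // block_size
```

for the example summary line, this gives 987654321 - 512 = 987653809, the value passed to resize2fs above.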
then we can start re-encryption with data movement. for this, we add --reduce-device-size 2M, which moves the partition data by 2 megabytes to the back (note that cryptsetup-reencrypt uses base 1024, so 2 megabytes equals 2097152 bytes and not 2000000 bytes as in si units), throwing away the last 2 megabytes of the encrypted filesystem. (since we resized our filesystem, we don’t use these anymore, so everything should be fine.) so we run:
cryptsetup-reencrypt -c aes-xts-plain64 -s 512 -h sha256 -i=2000 --reduce-device-size 2M -B 32 --use-directio /dev/sdx1
this runs for ages, and after it’s done, you can check the parameters with cryptsetup luksDump /dev/sdx1. afterwards, it is a good idea to unlock the filesystem and run e2fsck again.

more complex settings.

everything gets more complicated if you didn’t just encrypt a single filesystem but, for example, a whole set of partitions (using lvm, linux’ logical volume manager). in that case, you have to shrink its “physical volume”. fortunately, lvm has some support for shrinking and extending logical and physical volumes (see pvresize, lvresize and friends). if/when i actually try this out, i might add some more details here (or in a new post) on how to do it.


after recently installing arch linux on my laptop (a thinkpad x230), i was quite happy at first. but after some time, i noticed some flaws. first of all, having to do so many things by hand is somewhat annoying. if it were just about installing software: no big deal (for me). but it is also about configuring stuff, like deciding between networkmanager and the arch-specific command line wireless setup, which is installed by default. switching to networkmanager was quite annoying, and in the end didn’t work very well (one anecdote: at some point, i had to reboot to get plain eth0 working again; reconfiguring by hand might have worked, but you don’t always have time to do that). power management was not so good either: after trying some things, i finally had a system which, coming back from suspend, waited a few seconds (usually enough to enter my password and unlock the computer) and then went back to suspend. after the next resume, there was no password protection left…
the final kick came when i tried to install hugin: it simply didn’t work. at all. pacman always gave up without an understandable error message. great, eh? at that point i decided to try linux mint another time.
last weekend, i first tried to install linux mint debian edition (lmde) on my laptop. it has the advantage of being a rolling release distribution. well, the installer doesn’t support hard disk encryption, but it allows you to set it up by yourself. after having managed that with arch linux, i tried it. basically, at two points during the installation process, the installer lets you do some stuff (setting up and mounting partitions at the first stop, and installing packages/modules and setting up stuff for the first boot at the second stop) and waits for you to press the “forward” button. unfortunately, during the second stop, the “forward” button was grayed out. i hoped that maybe the installer would enable it when the time came, but after doing everything (hopefully) and waiting, nothing happened. great, eh? well, i searched around the net, but found nothing. the only thing i found was a blog entry announcing lmde 201303 (which i was trying to install) with the note “please use this blog to report bugs”, which is nice, but not when you notice that comments are disabled. at that point, i gave up and downloaded a linux mint 15 image instead…
installing that one went quite smoothly. of course, again, the installer didn’t support using my encrypted setup (this seems to be implemented nowhere, except in the old ubuntu alternate installer, which is discontinued. yay, the good old times when stuff just worked out of the box!). after mounting everything before starting the installer (i also had to install the lvm2 package), the installation went well. before rebooting, though, i had to do some new tricks. after trying around unsuccessfully for some time, i finally found a question on askubuntu.com whose accepted answer provided the solution for me: it explains how to set up /etc/crypttab, initramfs and grub to ask for a password on boot-up and unlock the encrypted disks (see also below in this post). with these steps, i was able to boot the newly installed linux mint 15, and from that point on, everything went well.
most stuff worked out of the box, and all packages i wanted to install actually existed (arch linux doesn’t have mmv by default, for example), and both wine and hugin worked out of the box. the only very annoying part was that linux mint screwed up my firefox profile: it created a new profile and changed .mozilla/firefox/profiles.ini to only use the new profile. after modifying that file, i had my old profile back. after that, i was happy, and after a couple of days of wlan/vpn field testing (i never even got far enough to try vpn on arch linux), i’m opting to keep linux mint 15 for a while. i guess i’ll also install it on my desktop (replacing ubuntu 12.04 lts).
(actually, for desktop machines, arch linux will function much better, since there you don’t need fancy stuff like wireless setup, power saving etc. nonetheless, after the experience i had i won’t try it again for some time…)

quick conclusion: how to set up luks/lvm encryption manually on ubuntu/mint.

before i forget how this was done, or maybe askubuntu gets rid of the question and answer, i’ll document the necessary steps i had to do here (all paths are relative to the installed system’s root):

  1. create /etc/crypttab with a line like this:
    sda2_crypt UUID=... none luks
    to find out the correct uuid, try ls -la /dev/disk/by-uuid/. then you can see which uuid is mapped to which device. another (somewhat unrelated) useful tool is lsblk, which shows your current device and filesystem topology.
  2. create /etc/initramfs-tools/conf.d/cryptroot containing a similar line:
    CRYPTROOT=target=sda2_crypt,source=/dev/disk/by-uuid/...
    again, use the correct uuid instead of the “…”.
  3. mount /dev into the new environment by running
    mount -o bind /dev /target/dev
    (replace target with the path to the new system’s root directory.)
    then chroot the environment, and run the following commands:
    mount -t proc proc /proc
    mount -t sysfs sys /sys
    mount -t devpts devpts /dev/pts
    locale-gen --purge --no-archive
    update-initramfs -k all -c

    this will set up the ram disk correctly so that it will deal with the encrypted root partition. (note that it usually will complain about an “invalid line” in /etc/crypttab. you can usually ignore this.)
  4. change GRUB_CMDLINE_LINUX in /etc/default/grub to something like
    GRUB_CMDLINE_LINUX="cryptopts=target=sda2_crypt,source=/dev/disk/by-uuid/...,lvm=sda2_crypt"
    again, remember to replace sda2_crypt if necessary and to fill in the correct uuid.
  5. in the chroot environment, run update-grub.

after this, it should work. maybe you also have to install cryptsetup and/or lvm2 in the chroot environment, if it wasn’t already done by the installer.
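to avoid typos in the three places where the uuid appears, the lines can also be generated. a sketch, assuming the mapping name sda2_crypt and the line formats used in the steps above; find_uuid and config_lines are hypothetical helpers, and the by-uuid directory is a parameter so that the lookup can be tried on a test directory. the uuid itself is never guessed: it is read from the symlinks, just like with ls -la /dev/disk/by-uuid/.

```python
import os

def find_uuid(device, by_uuid_dir='/dev/disk/by-uuid'):
    """Return the uuid whose symlink in by_uuid_dir resolves to device, or None."""
    target = os.path.realpath(device)
    for name in sorted(os.listdir(by_uuid_dir)):
        if os.path.realpath(os.path.join(by_uuid_dir, name)) == target:
            return name
    return None

def config_lines(uuid, name='sda2_crypt'):
    """The crypttab, cryptroot and grub lines from the steps above for one uuid."""
    source = '/dev/disk/by-uuid/' + uuid
    return {
        '/etc/crypttab': '%s UUID=%s none luks' % (name, uuid),
        '/etc/initramfs-tools/conf.d/cryptroot':
            'CRYPTROOT=target=%s,source=%s' % (name, source),
        '/etc/default/grub':
            'GRUB_CMDLINE_LINUX="cryptopts=target=%s,source=%s,lvm=%s"' % (name, source, name),
    }
```

on the installed system, config_lines(find_uuid('/dev/sda2')) would print the three lines for your partition (assuming it is /dev/sda2).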
anyway, i’m really looking forward to the moment when most distribution installers know how to (again!) deal with existing luks/lvm installations. i hope it won’t take as long as it took for basic hdd encryption to find its way into the graphical installers in the first place. (that was, like, forever! and without an initiative of the eff, it might really have taken forever.)

today, i finally got around to trying arch linux with xfce4 on my laptop. and considering how it looks, i will also install it on my desktop computer on the next reinstall. (currently, it still has ubuntu with xfce4 installed. and in case you wonder why i decided to try out a new system on my laptop: i’ve been using linux mint 14 for the last couple of months, and was pretty unhappy both during install, where setting up full disk encryption was somewhat annoying, and finally when trying to install wine recently, which simply didn’t work.)

i followed the beginner’s guide, which essentially told me what to enter on the console to set up arch linux. (note that arch linux does not come with a graphical installer; you have to type a lot of commands yourself. but apart from that, it actually works like a charm. so if you’re not scared of the command line, it’s worth a try.)

there’s also an arch wiki entry about encrypting an lvm setup, which is what i was doing and wanted to continue doing, for example so that i would not have to start over by copying all my data to the machine, but could simply re-use the encrypted partition layout set up before. for the way i (and ubuntu) were doing it, that wiki entry pointed to a blog post by simon dittlmann, which explains how to set up a huge encrypted partition containing an lvm (logical volume manager) volume group with root, home and swap partitions. unfortunately, the blog post is somewhat older, and apparently the whole installation procedure of arch linux has changed somewhat, so i had to improvise.

in order to create up-to-date documentation on how to install arch linux with full disk encryption, this post discusses both how to create such a setup and how to install arch linux into an already existing one.

beginning installation: creating the encrypted partition.

first, follow the beginner’s guide up to the step “prepare the storage drive”. at this step, you have to do something else.

(in case you already have a working set-up, skip the next steps until the mark.)

following the steps described in the beginner’s guide, create a small boot partition; this one will not be encrypted. i assume that it will be /dev/sda1. it should be a simple ext3/ext4 partition. (i usually give it 256 or 512 megabytes.)

then, create another partition (i assume it will be /dev/sda2), which consumes the whole left-over space on the hard disk. first, you should clear everything on that partition, preferably with random bits. you can for example do:
dd if=/dev/urandom of=/dev/sda2
this will take quite some time, though. alternatively, you can skip this step, and later, after encrypting the partition, overwrite the encrypted partition with zeros. (look down below for that.) afterwards, set up encryption on /dev/sda2:

modprobe dm-crypt
cryptsetup --verbose --cipher aes-xts-plain64 --key-size 512 --verify-passphrase luksFormat /dev/sda2

you will have to enter a passphrase (twice), which you will need later on every boot to unlock the disk. (note that you can later on change the passphrase as you like; look at the section passphrase management in an older blog-post by me.)

(edit: since there is now a successful attack on the aes-cbc-essiv encryption mentioned here earlier, i changed it to aes-xts-plain64, using a different approach.)

(mark: skip until here if you already have a working set-up.)

now you can unlock the encrypted disk:
cryptsetup luksOpen /dev/sda2 lvm

setting up the logical volumes.

(skip almost everything of this section if you already have a working set-up. the only thing you should not skip is the mounting below and enabling swap with swapon.)

after unlocking the encrypted volume, you have to create a volume group and logical volumes inside it. first, create a physical volume, which will hold the volume group. for that, we use the encrypted partition /dev/sda2, whose contents can be accessed via /dev/mapper/lvm. do the following:

lvm pvcreate /dev/mapper/lvm
lvm vgcreate vgroup /dev/mapper/lvm

you can replace vgroup with any name you want. i replaced it with the (future) hostname of my laptop. now you can use the following commands to create logical volumes. there should be at least one volume for root (/) and one for swap. i recommend also creating a volume for /home, so that your personal files are separated from the operating system and you can simply wipe out the operating system when you want to install a new one by formatting root, but not home. for such a setting, the commands are as follows:
lvm lvcreate -L 16GB -n root vgroup
lvm lvcreate -L 16GB -n swap vgroup
lvm lvcreate -l 100%FREE -n home vgroup

(my machine has 16 gigabytes of ram, whence i created a 16 gigabyte swap partition.)
don’t forget to replace vgroup if you used a different name above. you can also choose different names after -n. the next step is to format the data partitions as in the beginner’s guide:
mkfs.ext4 /dev/mapper/vgroup-root
mkfs.ext4 /dev/mapper/vgroup-home

to set up the swap, proceed as follows:
mkswap /dev/mapper/vgroup-swap
swapon /dev/mapper/vgroup-swap

finally, let us mount the partitions to install arch linux on them:

mount /dev/mapper/vgroup-root /mnt
mkdir -p /mnt/home /mnt/boot
mount /dev/mapper/vgroup-home /mnt/home
mount /dev/sda1 /mnt/boot

(you only need the mkdir if you created a new set-up. also, in case you created more logical volumes, you have to adjust the commands above.)
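the order of the commands in this section matters: the physical volume comes before the volume group, and root must be mounted before the mount points for home and boot are created on it. a dry-run sketch that only builds the command lines for a new set-up, so you can review them before typing; the function name is hypothetical, and the defaults (vgroup, 16GB, /dev/sda1) are just the values assumed above.

```python
def lvm_setup_commands(vg='vgroup', root_size='16GB', swap_size='16GB', boot='/dev/sda1'):
    """Commands of this section, in order, for a fresh set-up (nothing is executed)."""
    mapper = '/dev/mapper/%s-' % vg
    return [
        'lvm pvcreate /dev/mapper/lvm',
        'lvm vgcreate %s /dev/mapper/lvm' % vg,
        'lvm lvcreate -L %s -n root %s' % (root_size, vg),
        'lvm lvcreate -L %s -n swap %s' % (swap_size, vg),
        'lvm lvcreate -l 100%%FREE -n home %s' % vg,   # %% yields a literal %
        'mkfs.ext4 %sroot' % mapper,
        'mkfs.ext4 %shome' % mapper,
        'mkswap %sswap' % mapper,
        'swapon %sswap' % mapper,
        'mount %sroot /mnt' % mapper,                  # root first ...
        'mkdir -p /mnt/home /mnt/boot',                # ... then its mount points
        'mount %shome /mnt/home' % mapper,
        'mount %s /mnt/boot' % boot,
    ]
```

printing '\n'.join(lvm_setup_commands('mylaptop')) shows the commands for a volume group named mylaptop.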

continue arch linux installation.

from this point on, you can follow the beginner’s guide to install arch linux. continue until the point of creating an initial ramdisk environment. there, you must edit /etc/mkinitcpio.conf and modify the HOOKS statement from
HOOKS="base udev autodetect modconf block filesystems keyboard fsck"
(or something similar) to
HOOKS="base udev autodetect modconf block encrypt lvm2 filesystems keyboard fsck"
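note that you must insert encrypt lvm2 in precisely this order somewhere before filesystems. (if you type your passphrase on a usb keyboard, also move keyboard in front of encrypt; otherwise the keyboard might not work yet when the passphrase is asked for.) afterwards, continue with running mkinitcpio -p linux (or continue editing the config file if necessary).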
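if you prefer to script this edit, here is a sketch. it assumes the quoted HOOKS="..." form shown above (newer mkinitcpio versions write HOOKS=(...) instead, which this does not handle); add_encrypt_hooks is a hypothetical name. it removes any existing encrypt/lvm2 entries first, so running it twice gives the same result.

```python
import re

def add_encrypt_hooks(line):
    """Insert 'encrypt lvm2', in this order, directly before 'filesystems' in a HOOKS line."""
    match = re.match(r'^HOOKS="([^"]*)"$', line.strip())
    if match is None:
        raise ValueError('not a HOOKS="..." line')
    hooks = [h for h in match.group(1).split() if h not in ('encrypt', 'lvm2')]
    if 'filesystems' not in hooks:
        raise ValueError('no filesystems hook to insert before')
    pos = hooks.index('filesystems')
    hooks[pos:pos] = ['encrypt', 'lvm2']
    return 'HOOKS="%s"' % ' '.join(hooks)
```

applied to the default line above, it produces exactly the modified line.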

now you can continue with setting the root password.

the next step where you have to pay attention is setting up the boot loader. i chose grub here. set it (or syslinux) up as described in the beginner’s guide. in the case of syslinux, you have to modify /boot/syslinux/syslinux.cfg, and in the case of grub, you have to modify /boot/grub/grub.cfg. in the case of syslinux, you should have two entries (regular system and fallback). look for lines like
APPEND root=/dev/mapper/vgroup-root ro
for syslinux and
linux /vmlinuz-linux root=/dev/mapper/vgroup-root ro quiet
for grub, or something similar. for all such entries, insert cryptdevice=/dev/sda2:vgroup between root=… and ro; that is, the entries should look like
APPEND root=/dev/mapper/vgroup-root cryptdevice=/dev/sda2:vgroup ro
for syslinux and
linux /vmlinuz-linux root=/dev/mapper/vgroup-root cryptdevice=/dev/sda2:vgroup ro quiet
for grub.

change (2014/04/13): in case you want to use grub, it is better to proceed as follows. edit the line GRUB_CMDLINE_LINUX in /etc/default/grub and add cryptdevice=/dev/sda2:vgroup there. then, run grub-mkconfig -o /boot/grub/grub.cfg as described in the beginner’s guide. this automatically adds this to all entries in grub.cfg. end of change.
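the edit of /etc/default/grub can be scripted as well. a minimal sketch, assuming the line has the form GRUB_CMDLINE_LINUX="..." with space-separated parameters; add_kernel_param is a hypothetical name, and it leaves the line unchanged if the parameter is already present.

```python
def add_kernel_param(line, param):
    """Append param to a GRUB_CMDLINE_LINUX="..." line unless it is already there."""
    prefix = 'GRUB_CMDLINE_LINUX='
    if not line.startswith(prefix):
        raise ValueError('not a GRUB_CMDLINE_LINUX line')
    value = line[len(prefix):].strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]           # drop the surrounding quotes
    params = value.split()
    if param not in params:
        params.append(param)
    return '%s"%s"' % (prefix, ' '.join(params))
```

afterwards, grub-mkconfig -o /boot/grub/grub.cfg still has to be run as described above.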

afterwards, continue with the beginner’s guide. after the next reboot, you should be asked for a password to unlock the volumes. after entering it correctly, the system should boot up as normal.

i recently presented a bash script which schedules computational tasks on multi-core machines. in the meanwhile, i fixed a bug in the display, made the program more flexible, and started to use local variables instead of global variables only. the new version is also more intelligent: it tries to adjust the running times of its controlled processes so that the running times are not far apart.
here is the newest version:

#!/bin/bash

initProfile() {
    PROFILEFN=bigprimerunner-$PROFILE.profile
    CORES=`grep "^CORES " $PROFILEFN`
    CORES=${CORES/CORES }
    STARTUP=`grep "^STARTUP " $PROFILEFN`
    STARTUP=${STARTUP/STARTUP }
    eval STARTUP=$STARTUP
}

# Startup
LOADMODIFIER=0
if [ "$1" != "" ]
then
    PROFILE=$1
else
    PROFILE=`hostname`
fi
if [ "$2" != "" ]
then
    LOADMODIFIER=$2
fi
initProfile
if [ "$CORES" == "" ]
then
    echo "Cannot load profile $PROFILEFN!"
    exit
fi
echo Cores: $CORES
echo Load modifier: $LOADMODIFIER

# The command to execute
COMMAND=primefinder

computeFreecores() {
    FREECORES=0
    local DAY=`date +%w`
    local LINE=`grep "^$DAY " $PROFILEFN`
    local LINE=${LINE/$DAY }
    local HOUR=`date +%k`
    for ((i=0;i<$HOUR;++i));
    do
        local LINE=${LINE#* }
    done
    local LINE=${LINE/ *}
    eval FREECORES=$LINE
    # Also determine how many jobs should be started
    STARTUP=`grep "^STARTUP " $PROFILEFN`
    STARTUP=${STARTUP/STARTUP }
    eval STARTUP=$STARTUP
}

killProcess() { # One argument: PID of process to kill
    local PID=$1
    local FILE=`lsof -p $PID -F n 2>/dev/null | grep primedatabase | grep -v "\.nfs"`
    kill $PID 2> /dev/null
    local A=${FILE#n*}
    local A=${A/ (nfs*}
    if [ "$A" != "" ]
    then
        rm $A
        echo Killed $PID with open file $A
    else
        echo Killed $PID with no open file
    fi
}

stopsignal() {
    local PIDS=`jobs -p`
    echo
    echo
    echo Terminating...
    echo Killing: $PIDS
    for PID in $PIDS;
    do
        killProcess $PID
    done
    echo done.
    exit
}

trap 'stopsignal' 2

computeFreecores

echo "Starting $STARTUP instances"

filterRunning() { # Keeps only the PIDs from the arguments which are currently running
    ps -o pid= -o s= $* | grep R | sed -e "s/R//"
}

filterStopped() { # Keeps only the PIDs from the arguments which are currently stopped
    ps -o pid= -o s= $* | grep T | sed -e "s/T//"
}

determineToAdd() {
    computeFreecores
    local LOAD=`uptime`
    local LOAD=${LOAD#*average: }
    local LOAD=${LOAD/,*}
    local LOAD=${LOAD/.*}
    ADD=$[CORES-FREECORES-(LOAD+LOADMODIFIER)]
    local JOBS=`jobs -p`
    local JOBS=`filterRunning $JOBS`
    echo "Load: $[LOAD+LOADMODIFIER], Intended number of free cores: $FREECORES, Running: `echo $JOBS | wc -w`, Started: `jobs -p | wc -l` (should be $STARTUP)"
}

continueOne() {
    local JOBS=`jobs -p`
    local JOBS=`filterStopped $JOBS`
    if [ "$JOBS" != "" ]
    then
        local PID=`ps -o pid= --sort +time $JOBS | head -1`
        echo Continuing $PID...
        kill -SIGCONT $PID
    fi
}

stopOne() {
    local JOBS=`jobs -p`
    local JOBS=`filterRunning $JOBS`
    if [ "$JOBS" != "" ]
    then
        local PID=`ps -o pid= --sort -time $JOBS | head -1`
        echo Stopping $PID...
        kill -SIGSTOP $PID
    fi
}

killOne() {
    local JOBS=`jobs -p`
    if [ "$JOBS" != "" ]
    then
        local PID=`ps -o pid= --sort -time $JOBS | head -1`
        killProcess $PID
    fi
}

launchOne() {
    echo "Launching \"$COMMAND\"..."
    $COMMAND &
    sleep 1.5
}

computeTotaltimeInSecs() {
    # Input: $1 (cputime as printed by ps -o time=, i.e. [DD-]HH:MM:SS)
    # Output: total number of seconds, printed to stdout
    local I=$1
    local SECS=${I##*:}
    local REST=${I%:*}
    local MINS=${REST##*:}
    local REST=${REST%:*}
    local HOURS=${REST##*-}
    local DAYS=`expr "$REST" : '\([0-9]*-\)'`
    local DAYS=${DAYS%-}
    if [ "$DAYS" == "" ]
    then
        local DAYS=0
    fi
    if [ "$HOURS" == "" ]
    then
        local HOURS=0
    fi
    if [ "$MINS" == "" ]
    then
        local MINS=0
    fi
    echo "((($DAYS * 24) + $HOURS) * 60 + $MINS) * 60 + $SECS" | bc
}

adjustProcesses() {
    local JOBS=`jobs -p`
    local JOBS=`filterRunning $JOBS`
    if [ "$JOBS" != "" ]
    then
        local STOPPID=`ps -o pid= --sort -time $JOBS | head -1`
        local JOBS=`jobs -p`
        local JOBS=`filterStopped $JOBS`
        if [ "$JOBS" != "" ]
        then
            local CONTPID=`ps -o pid= --sort +time $JOBS | head -1`
            # Compute times
            local I=`ps -o time= $STOPPID`
            local STOPSEC=`computeTotaltimeInSecs $I`
            local I=`ps -o time= $CONTPID`
            local CONTSEC=`computeTotaltimeInSecs $I`
            # Compare times
            local CT=`echo $CONTSEC+60*5 | bc`
            if [ $STOPSEC -gt $CT ]
            then
                echo Stopping $STOPPID and continuing $CONTPID
                kill -SIGSTOP $STOPPID
                kill -SIGCONT $CONTPID
            fi
        fi
    fi
}

# Start programs in the background
determineToAdd
for ((i=1;i<=STARTUP;++i));
do
    launchOne
    if [ $i -gt $ADD ]
    then
        sleep 1
        kill -SIGSTOP %$i
    fi
done

# Start mainloop
while [ 1 ]
do
    sleep 60

    # Determine how many processes should be added/removed
    determineToAdd

    # Stop/continue processes
    if [ $ADD -gt 0 ]
    then
        # Add processes
        echo ADD:$ADD
        for ((i=0;i<ADD;++i))
        do
            continueOne
        done
    fi
    if [ $ADD -lt 0 ]
    then
        REM=$[-ADD]
        # Remove processes
        echo REMOVE:$REM
        for ((i=0;i<REM;++i))
        do
            stopOne
        done;
    fi

    # Launch new processes or kill running ones
    CURRLAUNCHED=`jobs -p | wc -l`
    if [ $STARTUP != $CURRLAUNCHED ]
    then
        if [ $STARTUP -lt $CURRLAUNCHED ]
        then
            echo kill: $STARTUP $CURRLAUNCHED
            for ((i=STARTUP;i<CURRLAUNCHED;++i));
            do
                killOne
            done;
        else
            echo add: $CURRLAUNCHED $STARTUP
            for ((i=CURRLAUNCHED;i<STARTUP;++i));
            do
                launchOne
            done;
        fi
    fi
    sleep 2

    # Adjust
    adjustProcesses
done
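to see what adjustProcesses() decides, its rule can be restated in python. this is only an illustration, not part of the script: cputime_to_seconds mirrors computeTotaltimeInSecs (ps -o time= prints [dd-]hh:mm:ss), and should_swap uses the same five-minute margin as the script; both names are made up.

```python
def cputime_to_seconds(text):
    """Convert ps cputime output like '1-02:03:04' or '02:03:04' into seconds."""
    text = text.strip()
    days = 0
    if '-' in text:
        day_part, text = text.split('-', 1)
        days = int(day_part)
    fields = [int(f) for f in text.split(':')]
    while len(fields) < 3:          # tolerate mm:ss by treating missing parts as zero
        fields.insert(0, 0)
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def should_swap(running_cputime, stopped_cputime, margin=5 * 60):
    """True if the running job with most cpu time is more than margin seconds ahead."""
    return cputime_to_seconds(running_cputime) > cputime_to_seconds(stopped_cputime) + margin
```

so a job with 10:01 minutes of cpu time gets stopped in favour of a stopped job with 5:00 minutes, while a difference of exactly five minutes does not trigger a swap.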

ever had the problem that you have access to a big machine (with many cores), and you want to run many (tens of thousands) small computations, but you want to make sure that not too many cores are used?
i’ve had this problem, and since i now have a pretty nice (i think so) solution, i thought that maybe more people are interested in it. so here’s my setup. i have a program, let’s call it primefinder, which, for a certain input n (where n is a natural number ≤ 21000), computes a prime of n bits with special properties. the program loops over all possible n, and checks for each n if a file n.prime exists. if it does not, it creates it (empty), computes the prime (which can take between minutes and days), writes the prime into the file and continues with the next file. this simple task distribution technique allows me to run the program in parallel on different machines (since the files are in an nfs folder) with many instances on each machine. now at our institute, we have a big computation machine (64 cores) and four user machines (on which the users work, each with 32 cores). since the user machines are often not used intensively (and only during certain times of the day), i want to use these as well. but there should be enough cores free so that the users won’t notice that there are computations going on in the background. other people also want to run something on the computation server, so there should be some free cores there as well. optimally, my program would somehow decide how many cores are used by others, and use the rest. or most of them, to leave some free, especially on the user machines.
after a suggestion by our it guys, i started writing a bash script which controls the instances of my program on the same machine. the first version used the time of the day to determine the number of processes. everything was computed in terms of the number of cores of the machine, the load (with a load modifier applied, since some machines have uninterruptible processes running which do not actually do anything, and which won’t go away until the next reboot) and the hour of the day. but it is not easy to find a scheme which yields good results on all machines. something which works well on the user machines wastes processor time on the computation server.
so today i rewrote the program to use profiles. a profile contains information on the number of cores (this is necessary since the computation server has hyperthreading enabled, and thus reports twice the number of cores), the number of processes to be started, and the number of cores to be left free during each hour and day of a week. so on weekends or nights, i choose lower numbers for the free cores for the user machines, while for the computation server the number is always 1.
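the file-based task distribution can be sketched as follows. note that this is a swapped-in variant, not the author's program: instead of first checking whether n.prime exists and then creating it (two steps, so two instances could pick the same n), it creates the file with os.O_CREAT | os.O_EXCL, which fails if the file already exists. that o_excl is honoured on your nfs mount (generally the case from nfsv3 on) is an assumption you should check; starting at n = 1 and the compute argument, which stands in for the actual prime search, are assumptions as well.

```python
import os

def claim_next(directory, upper):
    """Atomically create the first missing n.prime with 1 <= n <= upper; return n or None."""
    for n in range(1, upper + 1):
        path = os.path.join(directory, '%d.prime' % n)
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            continue                  # another instance already owns this n
        os.close(fd)                  # the empty file marks n as "in progress"
        return n
    return None

def work_loop(directory, upper, compute):
    """Claim inputs one by one and write each result into its file."""
    while True:
        n = claim_next(directory, upper)
        if n is None:
            return
        result = compute(n)
        with open(os.path.join(directory, '%d.prime' % n), 'w') as f:
            f.write('%d\n' % result)
```

since every instance only ever claims a number by creating its file, you can start as many instances on as many machines as you like, as long as they all use the same directory.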
a profile can look like this (this is from a user machine, the file is called primefinderrunner-user.profile for later reference):

1CORES 32
2STARTUP $[CORES-CORES/8]
30 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8]
41 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
52 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
63 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
74 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
85 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
96 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16]

the line prefixed by CORES gives the number of cores. the line prefixed by STARTUP gives the maximum number of processes to run; here, we use 7/8 of the number of cores. each of the lines prefixed by a number between 0 (sunday) and 6 (saturday) is followed by 24 entries, one per hour of the day: every entry (separated by exactly one space, just as the prefix itself is separated by exactly one space from the entries!) says how many cores should be kept free during that hour. usually during the night (hours 0 to 7) at least 1/16 of the total number of cores should be free, while during the workday (hours 8 to 19, i.e. 8 am to 8 pm) half of the cores should be free. of course, the numbers for the weekend (saturday and sunday) differ from those for the other days.
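since the entries are picked out purely by position, a small helper can be handy for checking what a profile says for a given day and hour. a minimal sketch using awk instead of the bash string slicing of the script below; the function name free_cores_for is hypothetical, and it only understands entries of the form $[...] containing plain arithmetic on CORES:

```shell
#!/bin/bash
# Sketch: print the number of cores to keep free for a given day
# (0 = sunday .. 6 = saturday) and hour (0..23) according to a profile.
free_cores_for() {
    local profile=$1 day=$2 hour=$3 cores expr
    cores=$(awk '$1 == "CORES" { print $2 }' "$profile")
    # field 1 is the day, so the entry for hour h is field h + 2
    expr=$(awk -v d="$day" -v h="$hour" '$1 == d { print $(h + 2) }' "$profile")
    [ -n "$cores" ] && [ -n "$expr" ] || return 1
    # an entry looks like $[CORES/16]: strip "$[" and "]" ...
    expr=${expr#'$['}
    expr=${expr%]}
    # ... substitute the core count and evaluate the arithmetic without eval
    echo $(( ${expr//CORES/$cores} ))
}
```

for example, free_cores_for primefinderrunner-user.profile 1 10 should print 16 for the profile above (monday, 10 am, $[CORES/2] with 32 cores).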
now the script itself looks like this (for reference, the filename is primefinderrunner.sh):

#!/bin/bash

# Read CORES and STARTUP from the profile primefinderrunner-$PROFILE.profile
initProfile() {
    PROFILEFN=primefinderrunner-$PROFILE.profile
    CORES=$(grep "^CORES " "$PROFILEFN")
    CORES=${CORES#CORES }
    STARTUP=$(grep "^STARTUP " "$PROFILEFN")
    STARTUP=${STARTUP#STARTUP }
    eval "STARTUP=$STARTUP"    # evaluates e.g. $[CORES-CORES/8]
}

# Arguments: [profile name (default: host name)] [load modifier (default: 0)]
PROFILE=${1:-$(hostname)}
LOADMODIFIER=${2:-0}
initProfile
if [ -z "$CORES" ] || [ -z "$STARTUP" ]
then
    echo "Cannot load profile $PROFILEFN!" >&2
    exit 1
fi
echo "Cores: $CORES"
echo "Load modifier: $LOADMODIFIER"

# Set FREECORES to the profile entry for the current day (0..6) and hour (0..23)
computeFreecores() {
    FREECORES=0
    DAY=$(date +%w)
    LINE=$(grep "^$DAY " "$PROFILEFN")
    LINE=${LINE#"$DAY "}
    HOUR=$(date +%k)       # space-padded hour, so e.g. 08 isn't misread as octal
    for ((i=0; i<HOUR; ++i))
    do
        LINE=${LINE#* }    # drop the entry of an earlier hour
    done
    LINE=${LINE%% *}       # keep only the entry of the current hour
    eval "FREECORES=$LINE"
}

# On Ctrl+C: delete the open primedatabase file of each instance and kill it.
# The lsof/grep pipeline depends on your setup; adjust and test it first!
stopsignal() {
    for PID in $(jobs -p)
    do
        FILE=$(lsof -p "$PID" -F n 2>/dev/null | grep primedatabase | grep -v '\.nfs')
        A=${FILE#n}        # lsof -F n prefixes the file name with "n"
        A=${A%% (nfs*}     # strip a trailing " (nfs...)" annotation
        echo "killing $PID with open file $A"
        [ -n "$A" ] && rm -- "$A"
        kill "$PID"
        kill -CONT "$PID" 2>/dev/null    # a stopped instance may only act on SIGTERM once resumed
    done
    exit
}

trap stopsignal INT

echo "Starting $STARTUP instances"

# Set ADD to the number of instances to resume (negative: to stop)
determineToAdd() {
    computeFreecores
    LOAD=$(uptime)
    LOAD=${LOAD#*average: }
    LOAD=${LOAD%%,*}       # 1-minute load average ...
    LOAD=${LOAD%%.*}       # ... without the fractional part
    ADD=$((CORES - FREECORES - LOAD - LOADMODIFIER))
    echo "Load: $((LOAD - LOADMODIFIER)), Intended number of free cores: $FREECORES"
}

# Start all instances in the background, then stop the surplus ones
determineToAdd
for ((i=1; i<=STARTUP; ++i))
do
    primefinder &
    sleep 2
done
sleep 20
if [ "$ADD" -lt 0 ]
then
    ADD=0
elif [ "$ADD" -gt "$STARTUP" ]
then
    ADD=$STARTUP
fi
for ((i=ADD+1; i<=STARTUP; ++i))
do
    kill -SIGSTOP %$i
done

# Jobs RUNNINGSTART..RUNNINGSTOP (counted cyclically) are the running ones
CURRRUNNING=$ADD
RUNNINGSTART=1              # the first one running
RUNNINGSTOP=$CURRRUNNING    # the last one running

# Resume the next stopped instance (assumes CURRRUNNING < STARTUP)
startOne() {
    RUNNINGSTOP=$(( (RUNNINGSTOP % STARTUP) + 1 ))
    kill -SIGCONT %$RUNNINGSTOP
    CURRRUNNING=$((CURRRUNNING + 1))
}

# Stop the instance that was resumed first (assumes CURRRUNNING > 0)
stopOne() {
    kill -SIGSTOP %$RUNNINGSTART
    RUNNINGSTART=$(( (RUNNINGSTART % STARTUP) + 1 ))
    CURRRUNNING=$((CURRRUNNING - 1))
}

# Main loop: once a minute, resume or stop instances as needed
while true
do
    sleep 60
    determineToAdd
    if [ "$ADD" -gt 0 ]
    then
        # Never run more than STARTUP instances
        if [ $((ADD + CURRRUNNING)) -gt "$STARTUP" ]
        then
            ADD=$((STARTUP - CURRRUNNING))
        fi
        echo "ADD:$ADD"
        for ((i=0; i<ADD; ++i))
        do
            startOne
        done
    elif [ "$ADD" -lt 0 ]
    then
        # Never stop more instances than are running
        REM=$((-ADD))
        if [ "$REM" -gt "$CURRRUNNING" ]
        then
            REM=$CURRRUNNING
        fi
        echo "REMOVE:$REM"
        for ((i=0; i<REM; ++i))
        do
            stopOne
        done
    fi
done

the script first starts all instances, then stops (with SIGSTOP) the ones which are too many, and then enters the main loop. in the main loop, it waits 60 seconds (to give the load average time to adjust to the new process count), then determines how many cores should be left free and what that means for the number of processes (resuming or stopping some of them). note that the profile file is read every minute, so it can be changed at any time without restarting the script.
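the load part of that decision relies on parsing the output of uptime. on linux, the same number can be read from /proc/loadavg, whose format doesn't depend on the locale. a minimal sketch, assuming a linux system; the function name one_minute_load and the values of CORES, FREECORES and LOADMODIFIER are placeholders:

```shell
#!/bin/bash
# Sketch: integer part of the 1-minute load average, read from /proc/loadavg
# (Linux-specific) instead of parsing the output of uptime.
one_minute_load() {
    local load1 rest
    read -r load1 rest < /proc/loadavg    # e.g. "3.41 2.95 2.80 4/512 12345"
    echo "${load1%%.*}"                   # drop the fractional part
}

# Placeholder values, just to show the computation done in determineToAdd
CORES=32
FREECORES=4
LOADMODIFIER=0
LOAD=$(one_minute_load)
ADD=$((CORES - FREECORES - LOAD - LOADMODIFIER))
echo "load: $LOAD, instances to resume (negative: to stop): $ADD"
```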
if the script is interrupted (with control+c), all primefinder processes are killed and their open files are deleted. to determine the open file, i use lsof with some greps. you have to adjust and test that line before using this script!
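as an alternative to parsing lsof output, on linux the open files of a process can be read directly from the symlinks in /proc/PID/fd. a minimal sketch under that assumption; open_files_matching is a hypothetical name, and the .nfs filter mimics the grep -v in the script:

```shell
#!/bin/bash
# Sketch: print the open files of process $1 whose path contains $2,
# read from the symlinks in /proc/PID/fd (Linux-specific).
open_files_matching() {
    local pid=$1 pattern=$2 fd target
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd") || continue
        case $target in
            */.nfs*) ;;                        # skip NFS placeholder files
            *"$pattern"*) echo "$target" ;;
        esac
    done
}

# Example: primedatabase files opened by each background job of this shell
for pid in $(jobs -p); do
    open_files_matching "$pid" primedatabase
done
```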
note that this script is quite a hack, and far from perfect. it is also somewhat system-dependent, or at least “setup-dependent”, since it makes certain assumptions about the executables, about what the output of lsof looks like, … so better make sure it works before you use it, especially on bigger systems. also note that at the beginning, all instances are run (they are started with a delay of two seconds between consecutive instances), and then everything runs for 20 seconds before the first adjustment (i.e. stopping the processes which are too many) is made. if you share the system with other people, this might already annoy them when they try to measure the timings of their programs (especially if hyperthreading is enabled).

posted in: computer
tags:
places:

does anyone know how the ulimit command of bash (or whatever shell you like) works? i’m currently running a few instances of the same program on a big server, and twice yesterday the programs together ate up all available memory (some of the instances using much more than others). there was a hard ulimit set on memory, and the result was that all these processes were killed, not just the one violating the memory limit at that moment.
this sucks, since it destroyed several cpu days of work. does anyone know why ulimit does this? i assume that the rationale is to stop fork bombs, but in this case it is really, really annoying. killing one of the processes would have been perfectly enough…
so, if anyone has good documentation on how ulimit works, whether it is possible to change this behaviour, and whether this is actually intended or a bug, i would like to hear about it…
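for reference: a limit set with ulimit applies to the shell it was set in and is inherited by every process started from it afterwards, and for limits like -v (virtual memory) each process is checked against its own copy of the limit. a minimal sketch of giving every instance its own virtual memory limit by setting it in a subshell right before starting that instance; start_limited, primefinder as the program name and the 4 gib value are placeholders, and this does not explain why all instances died:

```shell
#!/bin/bash
# Sketch: start a program in the background with its own virtual memory
# limit (ulimit -v, in KiB), set in a subshell so the calling shell and
# the other instances are unaffected.
start_limited() {
    local limit_kib=$1
    shift
    (
        ulimit -v "$limit_kib" || exit 1
        exec "$@"
    ) &
}

# e.g.: start_limited $((4 * 1024 * 1024)) primefinder    # 4 GiB per instance
```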


today i had to shuffle the lines of a text file. the last time i tried to do this in bash, i only found the hint to use sort -R. this is what produced the somewhat strange order of the music video series. today i found out that there’s another way to do this, namely the command shuf. it is used the same way as sort (with a file name or on standard input). i just wanted to write this down so i won’t forget :-)
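for reference, the difference between the two: gnu sort -R sorts by a random hash of each line, so identical lines end up next to each other, while shuf outputs a random permutation of the lines. a minimal sketch; shuffle_file and playlist.txt are placeholders:

```shell
#!/bin/bash
# Shuffle the lines of a file with shuf (GNU coreutils).
# Unlike "sort -R", which keeps identical lines together,
# shuf outputs a random permutation of all lines.
shuffle_file() {
    shuf -- "$1"
}

# e.g.: shuffle_file playlist.txt > playlist-shuffled.txt
#       shuf -n 5 playlist.txt    # pick five random lines
```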


i got another external hard drive today. the main reason is that i want to encrypt my (current) backup hard disk, which requires reformatting it. but if i do so, i’m left with nothing but the original data on the laptop and no backup; in case something goes terribly wrong, i’m screwed. i just created an encrypted partition on the new disk. this is really pretty easy and not much command line typing is required, in particular once everything is set up: linux asks me for the password as soon as i plug in the usb cable and automatically mounts the disk using that password. that’s how it should be, and so far it works perfectly.
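for a disk set up by hand, the steps with cryptsetup are roughly the following. a minimal sketch, not necessarily how the disk here was set up: it must run as root, destroys all data on the given partition, and the function name, the partition (something like /dev/sdX1) and the mapper name backup are placeholders:

```shell
#!/bin/bash
# Sketch: create a LUKS-encrypted ext4 partition with cryptsetup.
# Run as root. WARNING: destroys all data on the given partition!
create_encrypted_backup() {
    local part=$1 name=${2:-backup}
    if [ -z "$part" ]
    then
        echo "usage: create_encrypted_backup PARTITION [NAME]" >&2
        return 1
    fi
    cryptsetup luksFormat "$part" || return 1      # asks for confirmation and a passphrase
    cryptsetup open "$part" "$name" || return 1    # unlocked device: /dev/mapper/$name
    mkfs.ext4 -L "$name" "/dev/mapper/$name" || return 1
    cryptsetup close "$name"                       # lock it again
}

# e.g.: create_encrypted_backup /dev/sdX1 backup
```

many desktop environments (via udisks) then ask for the passphrase when such a disk is plugged in.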
currently, rsync is mirroring my home directory onto the disk. as soon as it is done, i will copy over some stuff from the other backup disk (like my server’s backups) which i don’t have on the laptop’s hard disk (which is 180 gb smaller than each of the backup disks). after that, my old backup disk will be reformatted as well and filled too.
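a minimal sketch of such a mirror run with rsync; mirror_dir and the mount point /media/backup are placeholders. since --delete removes files from the mirror that are gone from the source, double-check the destination before using it:

```shell
#!/bin/bash
# Sketch: mirror a directory onto the backup disk with rsync.
# The trailing slash on the source copies its contents, not the directory itself.
mirror_dir() {
    local src=$1 dest=$2
    mkdir -p -- "$dest" || return 1
    # -a: recursive, keep permissions, times, owner, group and symlinks
    # -H: keep hard links
    # --delete: remove files from the mirror that no longer exist in the source
    rsync -aH --delete "$src/" "$dest/"
}

# e.g.: mirror_dir "$HOME" /media/backup/home
```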
after that, i will deposit one of the backup drives somewhere outside my apartment: in case something goes wrong (the house burns down, someone breaks in, …), i still have a backup somewhere. and since it is encrypted, nobody but me can read it. (even if someone breaks in here and steals both the laptop and the backup, they can’t access the data without my password. and yes, i am aware of xkcd.)