Auto Restart KVM VMs while forcing KVM snapshot reversion

(this is mostly notes for my later self - feel free to drop me a line to ask questions)

TLDR My problem:

I have some stateless throwaway vm instances which discard all changes every power-off. Also, the closed-source software in the vm has driver issues that occasionally require a reboot, detected from inside the vm and initiated as an OS reboot. When rebooting, I want to simulate power-off/power-on to trigger the KVM snapshot discard. And, I want to make sure that any vm that shuts down gets restarted. How? Obscure qemu-kvm options and libvirt hooks to the rescue - special use for -snapshot and -no-reboot.

More detail:

I use libvirt (virsh) to manually control some transient, sort of stateless KVM virtual machines. I need to periodically stop/start those vms automatically (from inside, so I can do a clean service shutdown). I also want to auto-restart particular vms any time they shut down. The VMs run as qemu-kvm snapshots (-snapshot option) that throw away all system changes at every power-off.

I have a closed-source application that occasionally gets into an error state that cannot be resolved by restarting the software (driver issues). It requires a full (virtual) system reboot.

A small monitor was written prior to my involvement in this project that can detect the unfixable error state, and initiate a nice clean service shutdown and reboot to resolve the problem. In KVM this results in a warm boot which doesn't throw away the -snapshot saved changes.

So, qemu-kvm has an option -no-reboot that forces process exit when the vm tries to do a warm boot. This shuts off the VM but does not restart it. I need to auto-restart vms that shut themselves down by trying to reboot.

So, I really have three requirements:

  • qemu-kvm needs to be invoked with -snapshot (fresh image every power off)
  • Reboots should really be a libvirt stop/start to get a 'cold boot' effect and throw away the snapshot
  • Servers can reboot themselves. These warm boots are turned into a VM stop, and the vms need to be auto-restarted asap

While qemu-kvm has the support I need, the version of libvirt on CentOS 7 that I use doesn't have direct support for either option. So I made a wrapper for qemu-kvm and specified it as a custom emulator for these vms. The wrapper adds both -snapshot and -no-reboot like this:

#!/bin/sh
exec /usr/libexec/qemu-kvm "$@" -snapshot -no-reboot

And then I replace the <emulator> block in the vm definition so that it points at this wrapper. This meets my first two requirements, but any vm that reboots or shuts itself down will stay off.
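As a rough sketch of that step: assuming the wrapper is saved as /usr/local/bin/qemu-kvm-snapshot (a made-up path, use wherever yours lives) and the vm is called foo, something like the following rewrites the emulator line of the saved definition. The set_emulator helper is only for illustration; running 'virsh edit foo' and changing the line by hand does the same job.

```shell
#!/bin/sh
# Sketch: point a domain's <emulator> element at the wrapper script.
# WRAPPER is a hypothetical path; set_emulator is an illustrative helper.
WRAPPER="${WRAPPER:-/usr/local/bin/qemu-kvm-snapshot}"

# Read domain XML on stdin, write it to stdout with the emulator path replaced.
set_emulator() {
    sed "s|<emulator>.*</emulator>|<emulator>$1</emulator>|"
}

# Typical use (needs a working libvirt host, so it is left commented out):
#   virsh dumpxml --inactive foo | set_emulator "$WRAPPER" > /tmp/foo.xml
#   virsh define /tmp/foo.xml
```

The redefined emulator only applies the next time the vm is started from its saved definition.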

What can I do to make sure they are always running? First mark them to start on host boot:

virsh autostart foo

which makes them start when the host boots but I still need to make sure they get restarted if they stop.

There are a few options I considered to solve this problem.

First, I could add some kind of 'forever' loop to the emulator script above that will just run the emulator again once it exits. e.g.

#!/bin/sh
# no 'exec' here: exec would replace the shell and the loop would never repeat
while true; do
    /usr/libexec/qemu-kvm "$@" -snapshot -no-reboot
done

Second, I could write some kind of supervisor that tries to start any stopped vms, such as a standalone daemon or a cronjob that runs every minute.
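A minimal sketch of what the cronjob flavour of that supervisor might look like, assuming the vms to keep alive are the ones marked with 'virsh autostart'. The function name and the VIRSH override variable are my own inventions for illustration; the list flags (--inactive, --autostart, --name) are standard virsh filters.

```shell
#!/bin/sh
# Sketch: start every autostart-marked vm that is currently shut off.
# VIRSH can be overridden (e.g. for testing); it defaults to the real virsh.
VIRSH="${VIRSH:-virsh}"

restart_stopped_vms() {
    # --name prints one domain name per line, followed by a blank line
    "$VIRSH" list --inactive --autostart --name | while read -r dom; do
        [ -n "$dom" ] || continue          # skip the trailing blank line
        "$VIRSH" start "$dom" < /dev/null  # keep virsh away from the loop's stdin
    done
}
```

With a final call to restart_stopped_vms appended, the script could be run from cron once a minute. The downside compared with the other options is that a vm can sit stopped for up to a minute before it is noticed.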

Third, libvirt supports event hooks, so I can run my own code when certain events happen. The hook would watch for shutdown events and then 'virsh start' a vm that just shut down. This is the path I started with (because I use hooks for some autoscaling functionality too). The problem is a deadlock: "virsh start foo" runs inside the hook before the VM has actually stopped, and the process hangs forever waiting for the shutdown to finish.

I confess to taking the lazy way out. I have a hook that gets shutdown events, forks a child in the background and returns. The child sleeps for 3 seconds (presumably enough for the vm to stop) and then does a 'virsh start', like this:

#!/bin/sh
# install into /etc/libvirt/hooks/qemu
# any vm that shuts down will be restarted

if [ "$#" -lt 2 ]; then
    echo "usage: $0 <domname> <event> end -" >&2
    exit 1
fi

# release event signifies shutdown is finished
if [ "$2" = 'release' ]; then
    # detach a child so the hook returns at once; the child waits, then restarts the vm
    sh -c 'sleep 3; virsh start "$1"' sh "$1" < /dev/null > /dev/null 2>&1 &
fi

Dirty, but it works. Note that CentOS 7 does not have /etc/libvirt/hooks by default. You have to create that directory and restart libvirtd to pick up any hooks you add.
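If the fixed 3 second sleep ever turns out to be too short, one possible refinement is to have the background child poll the domain state instead of guessing. This is only a sketch: wait_and_start, the retry count and the VIRSH override variable are made up for illustration, and it assumes 'virsh domstate' prints "shut off" once the vm is fully stopped.

```shell
#!/bin/sh
# Sketch: wait until a domain reports "shut off", then start it again.
# VIRSH can be overridden (e.g. for testing); it defaults to the real virsh.
VIRSH="${VIRSH:-virsh}"

# usage: wait_and_start <domname> [max_tries]
wait_and_start() {
    dom=$1
    tries=${2:-10}                 # hypothetical default: give up after 10 polls
    while [ "$tries" -gt 0 ]; do
        state=$("$VIRSH" domstate "$dom" 2>/dev/null)
        if [ "$state" = "shut off" ]; then
            "$VIRSH" start "$dom"
            return
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1                       # never saw the domain reach "shut off"
}
```

In the hook, the 'sleep 3; virsh start' child would then be replaced by a backgrounded call such as: wait_and_start "$1" < /dev/null > /dev/null 2>&1 &. It still has to run in the background so that the hook itself returns immediately.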

If I didn't have other hooks code (for autoscaling), I probably would have gone with the first option above as it is a trivial few more lines of bash.

Is there some other way I should have done this? Drop me a line if you see something obvious I missed.