Requirement for High Availability
In the previous post, I talked about how I use Pi-hole for my DNS resolution. DNS is a core part of how the internet works, so if I were to patch or reboot the host running Pi-hole, no device in my home network would be able to use the internet. A simple fix would be to run a second host with Pi-hole, like another Raspberry Pi Zero. But in both Windows and Linux based operating systems, I’ve seen the switch to the secondary DNS server take a long time when the primary goes down, leaving a window where the internet is effectively still down for the end user.
Solution to the DNS problem: Keepalived
The fix to this problem is achieving high availability at the network layer, well below where DNS operates. The Virtual Router Redundancy Protocol (VRRP) is widely used by enterprise routing devices for exactly this. Keepalived is routing software written in C that implements the VRRP finite state machine (FSM). Keepalived can do a lot more, but at its very core, it provides a virtual IP address (VIP or floating IP) that is held by the master node defined in the configuration as a way to provide high availability. When the master node goes down or becomes unreachable, one of the backup nodes takes over based on its priority. All this means that the same IP address moves from one node to another in case of failure.
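A quick way to see which node is currently the master is to check which one has the VIP bound to its network interface. A minimal sketch, assuming the interface is eth0 and the VIP is 192.168.1.2 (both placeholders):

```
# Run on each node; only the current VRRP master will list the VIP on its interface
ip addr show eth0 | grep 192.168.1.2
```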
Applying this to our earlier problem, we get two nodes with Pi-hole and Keepalived installed. Both nodes share a single IP address, the VIP. Clients use the VIP as their DNS server, so whenever the primary Pi-hole instance goes down, the VIP transparently moves to the backup node and the DNS clients see no difference at all. It is business as usual for them since the IP address does not change.
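On the client side there is nothing special to configure: point the DNS setting, statically or via DHCP, at the VIP rather than at either node’s real address. A sketch for a Linux client, with 192.168.1.2 again standing in for the VIP:

```
# /etc/resolv.conf (or hand the same address out via your router's DHCP DNS option)
# 192.168.1.2 is a placeholder for the Keepalived VIP, not either node's real address
nameserver 192.168.1.2
```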
Solution for synchronization: Gravity Sync
The above setup works just fine as it is. But if you notice, the two Pi-hole instances do not communicate with each other. Let’s say you want to add a new domain to the blocklist/allowlist or add a new DNS A record. You would need to do that on both devices separately, doubling the amount of work. If, like me, you have 3 nodes running it, keeping them in sync can be a nightmare. Enter Gravity Sync, a tool that keeps the Pi-hole instances in sync automatically on a schedule. Under the hood, it uses SSH and rsync to keep the gravity database and the dnsmasq configs in sync. This means that any change you make on the primary instance gets copied over to all the backup instances during the next scheduled sync.
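Conceptually, a sync run boils down to something like the following. This is only an illustration of the mechanism, not Gravity Sync’s actual commands; the tool adds its own checks, locking, and service reloads, and backup-node is a placeholder hostname:

```
# Copy the gravity database and dnsmasq config fragments to a backup node over SSH
rsync -av -e ssh /etc/pihole/gravity.db backup-node:/etc/pihole/
rsync -av -e ssh /etc/dnsmasq.d/ backup-node:/etc/dnsmasq.d/
```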
My running setup showcase
Here are the statistics from my primary node, which serves most of the traffic. I have close to a million entries in the blocklist and an impressive 58% block rate; most of the blocked queries are for Google ad domains and other tracking sites.
The above diagram represents how I have it set up in my home lab. The two instances running in virtual machines carry the bulk of the load as they hold the two Keepalived VIPs. If one of them fails, the other node takes over both VIPs due to how the priorities are set. If both VMs fail, the VIPs transfer to the Raspberry Pi Zero, which is physically separate from the ESXi server.
The reason the Pi does not get control of either VIP until both VMs fail is that the Pi Zero is slow. Not just in terms of DNS requests handled per second (plenty for normal use, but servers tend to be chatty with DNS), but also in available bandwidth, since it connects to the network over a 2.4 GHz wireless connection. The VMs have wired gigabit Ethernet, so this is not an issue for them.
Sample Keepalived config
Here is a sample config from the primary node (named nuc here) to give you an idea:
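The values below are illustrative; the interface name, virtual_router_id, VIP, and password are placeholders to adapt to your own network:

```
# /etc/keepalived/keepalived.conf on the primary node
vrrp_instance PIHOLE_VIP1 {
    state MASTER                 # this node starts out as the VRRP master
    interface eth0               # interface the VIP is advertised on
    virtual_router_id 51         # must match on every node sharing this VIP
    priority 150                 # highest priority wins the election; backups use lower values
    advert_int 1                 # advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass s3cr3t42       # must be identical on all nodes (max 8 characters)
    }
    virtual_ipaddress {
        192.168.1.2/24           # the floating IP clients use as their DNS server
    }
}
```

The backup nodes carry the same block with state BACKUP and a lower priority, the second VIP is simply another vrrp_instance block with a different virtual_router_id, and the Pi Zero gets the lowest priority on both so it only takes over when both VMs are down.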
Outcome of this adventure
With this, I have one highly available pair and another redundant set for my DNS services, meaning any reboot/patching of the hypervisor or individual VMs would not affect the internet access for devices in my network.