Round Robin DNS can be useful!
I Was Wrong About Round Robin DNS
For the past ten years, I’ve been confidently peddling a lie. Not maliciously, of course—but I’ve been telling anyone who’d listen that Round Robin DNS (RRDNS) is useless for high availability. Turns out, I was wrong. Two clever solutions crossed my path last week and made me rethink everything. So, consider this my public mea culpa.
First, let’s cover the basics: Round Robin DNS is a method where a single DNS name has several address records, and the server rotates the order in which it returns them from one query to the next. Think of it as a bouncer at a club, directing guests to one door, then the next, and so on. When combined with resilient protocols (like DNS itself), RRDNS actually works quite well. If you need proof, run the following and you’ll see it in action at the root DNS level:

    dig . NS @a.root-servers.net

But here’s the rub: when you use RRDNS to distribute traffic to less forgiving protocols (like HTTP), it often goes pear-shaped. Imagine you’ve got two nodes serving your website:

    www IN A 10.1.1.2
    www IN A 10.1.1.3

If one of those nodes keels over, half your users are still going to be sent to the dead IP. This is decidedly not high availability. True high availability means a system only fails when all nodes are down—not when just one has popped its clogs.
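To put a number on that, here is a small simulation sketch in Python. It assumes a resolver that hands out the two addresses in strict rotation and clients that only ever try the first address they receive. The `simulate` function and its `dead` parameter are illustrative names, not part of any DNS library.

```python
from itertools import cycle

# The two A records for www from the example zone above.
RECORDS = ["10.1.1.2", "10.1.1.3"]

def simulate(records, dead, requests=1000):
    """Return the fraction of requests sent to a dead address.

    Assumes strict rotation and clients that never retry another address.
    """
    rotation = cycle(records)
    failures = sum(1 for _ in range(requests) if next(rotation) in dead)
    return failures / requests

# One of two nodes is down: half of the requests land on the dead IP.
print(simulate(RECORDS, dead={"10.1.1.3"}))  # 0.5
```

Real clients and caches behave less tidily than this (some retry, some pin one answer for the TTL), so treat the 0.5 as the idealised case rather than a measurement.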
The Breakthrough: Floating IPs
Here’s the game-changer: you don’t point your RRDNS records to the actual IPs of your nodes. Instead, you use virtual IPs that can float between servers. Imagine the setup looks like this:

    www1 IN A 10.1.1.2
    www2 IN A 10.1.1.3
    www  IN A 10.1.1.201
    www  IN A 10.1.1.202

Each server keeps its own IP (so you can still SSH in and fiddle with it), but the user-facing IPs (10.1.1.201 and 10.1.1.202) can roam around like nomads. If one server dies, the other scoops up its floating IPs, keeping everything running smoothly.
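The takeover logic can be sketched as a toy model in Python. This is only an illustration of the idea, not how any particular tool implements it: the notion of a "home" node for each floating IP, the `HOME` mapping, and the `assign_floating_ips` function are all assumptions made for the example.

```python
# Hypothetical model: the node that normally answers for each floating IP.
HOME = {"10.1.1.201": "www1", "10.1.1.202": "www2"}

def assign_floating_ips(home, alive):
    """Map each floating IP to a live node.

    An IP stays on its home node while that node is alive; otherwise it
    moves to the first surviving node. With no survivors it maps to None.
    """
    survivors = sorted(alive)
    assignment = {}
    for ip, node in home.items():
        if node in alive:
            assignment[ip] = node
        else:
            assignment[ip] = survivors[0] if survivors else None
    return assignment

print(assign_floating_ips(HOME, alive={"www1", "www2"}))
# {'10.1.1.201': 'www1', '10.1.1.202': 'www2'}
print(assign_floating_ips(HOME, alive={"www1"}))
# {'10.1.1.201': 'www1', '10.1.1.202': 'www1'}
```

The DNS records never change in this model. Only the answer to "which machine currently owns this address" does, which is why clients holding a cached record still reach a working server.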
Two Tools That Make It Happen
Now, I didn’t come up with these solutions—I just watched two very smart people demonstrate them. But they’re clever, they work, and they deserve a shoutout:
- Wackamole
Brilliant name, brilliant tool. If you’re already using the Spread messaging system, Wackamole is a no-brainer. Servers on a Spread ring gossip about their health, and if one stops talking (or announces its retirement), another server grabs its floating IPs. Wackamole even handles the boring admin bits, like telling other devices to update their ARP caches.
- VRRP
VRRP (Virtual Router Redundancy Protocol) is a trusty old dog that’s been keeping routers redundant for years. Turns out, it can run on Linux servers too. Each floating IP gets its own VRRP group, so for two IPs, you’d run something like this:

    /usr/local/sbin/vrrpd -Ni eth0 -v 48 -p 120 -g 3 -t 10.1.1.201 -S /var/run/vrrpd_48.state
    /usr/local/sbin/vrrpd -Ni eth0 -v 49 -p 100 -g 3 -t 10.1.1.202 -S /var/run/vrrpd_49.state

It’s not as sleek as Wackamole, but it doesn’t rely on Spread, which might be a plus if you like to keep things simple.
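For a feel of how the per-group priorities decide who holds each address, here is a rough Python sketch of a VRRP-style election. It reads `-v` in the vrrpd commands above as the group (virtual router) ID and `-p` as the priority, which is how they appear to be used. It also assumes those commands run on www1 and that www2 runs the mirror image with the priorities swapped; the post does not show the peer's configuration. `PRIORITIES` and `elect_masters` are names made up for the example, and tie-breaking between equal priorities is not modeled.

```python
# Priority of each node within each VRRP group.
# www1's values are taken from the vrrpd commands above (assumed to run
# on www1); www2's values are an assumption: the same groups, swapped.
PRIORITIES = {
    48: {"www1": 120, "www2": 100},  # group holding 10.1.1.201
    49: {"www1": 100, "www2": 120},  # group holding 10.1.1.202
}

def elect_masters(priorities, alive):
    """Pick, for each group, the live node with the highest priority."""
    masters = {}
    for group, nodes in priorities.items():
        candidates = {n: p for n, p in nodes.items() if n in alive}
        masters[group] = max(candidates, key=candidates.get) if candidates else None
    return masters

print(elect_masters(PRIORITIES, alive={"www1", "www2"}))
# {48: 'www1', 49: 'www2'}
print(elect_masters(PRIORITIES, alive={"www2"}))
# {48: 'www2', 49: 'www2'}
```

Giving each node the higher priority in a different group is what spreads the floating IPs across both servers while they are healthy.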
Some Things to Keep in Mind
RRDNS with floating IPs is not magic. It’s a bit like nudging traffic with a polite "over there, please," rather than directing it with military precision. This is load sharing, not load balancing. Plus, if one node in a two-server setup fails, the other has to handle double the traffic. That might be fine—or it might bring your remaining server to its knees. Planning is key.
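The planning arithmetic is simple enough to sketch in Python. The function below assumes traffic is shared evenly before and after the failure, which RRDNS only approximates, and `load_after_failure` is just an illustrative name. Utilization is expressed in percent of one node's capacity.

```python
def load_after_failure(nodes, failed, utilization_pct):
    """Per-node utilization (percent) once `failed` nodes drop out.

    Assumes the total load is constant and shared evenly by survivors.
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return utilization_pct * nodes / survivors

print(load_after_failure(2, 1, 40))  # 80.0  -> survivable
print(load_after_failure(2, 1, 60))  # 120.0 -> the survivor is overloaded
print(load_after_failure(4, 1, 60))  # 80.0  -> more nodes, more headroom
```

In a two-node setup, that means each server needs to sit below roughly 50% utilization in normal operation if it is to absorb its partner's traffic.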
Still, this approach has its charms. It’s inexpensive, relatively simple, and with a bit of elbow grease, it works. Sometimes, the best solutions don’t involve fancy load balancers or expensive software. Sometimes, it’s just about using the tools you have in unexpected ways. And if it means I’ve got to admit I was wrong about RRDNS, well… so be it.