One of the ways to make your home network more resilient to outages is by setting up high availability and methods of failover, so that one broken or misbehaving network appliance doesn't result in no connectivity for everyone. Setting up this level of redundancy is relatively simple these days, but it will take a few hours, so block out the time accordingly.
They say that no plan survives contact with the enemy, and in networking that means one thing: you have to break your carefully constructed network on purpose, so you can see what happens in real-world conditions. Or, if you prefer, you can do this with virtual machines, turning off one of the virtual network connections to simulate a failure, and then copy those settings to your network appliances once you know they work.
Disaster planning is key to success
Hi, it's me. The disaster in question
Absolutely nowhere else exemplifies the "hope for the best, plan for the worst" mantra quite like networking. Every piece of hardware, every program, every cable could be the difference between faultless connectivity and no connectivity at all, and the methods used in the datacenter can be scaled down and applied at home with minimal effort. You back up your data, so why not have infrastructure backups as well?
My critical virtual machines are all installed on a high-availability Proxmox cluster, so downtime is minimized. This configuration isn't just disaster planning for if something goes wrong, although it does that. The cluster enables me to handle updates and upgrades on one node while the others pick up the slack, so connectivity doesn't go down even for a minute. And we all know that the first reboot after an upgrade is the worst time and also one of the most likely times for the system not to boot properly.
This cluster doesn't just guard against hardware failure or networking snafus; it's a hedge against my need to tweak things, so the rest of the house doesn't get affected if (sorry, when) things go wrong. It was hard enough getting buy-in for some of the self-hosted services we now rely upon, and I know full well the cloud subscriptions will resume if my cosplaying as a network engineer doesn't deliver the same level of uptime.
OPNsense makes things easier
CARP, pfSync, and HA work together to fail gracefully
I've got OPNsense on several devices now, from a mini PC with six large-capacity NVMe SSDs installed for TrueNAS use, to an official DEC750 hardware appliance, and a few scattered virtual machines. This made it fairly quick to set up two OPNsense installations with one as redundant failover, so if the main router and firewall have connectivity issues, the other one takes over. This is possible thanks to three features that not every router has:
- CARP (Common Address Redundancy Protocol) - Uses IP protocol 112 and multicast packets to keep the other clustered devices updated with the current status
- pfsync - Replicates the state of individual network connections to the other firewall, and without it existing connections won't survive a failover
- XMLRPC sync - Keeps the firewall configurations in sync
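To make the CARP part a little more concrete, here is a toy model of how the master gets picked. It is a sketch, not the protocol itself: it assumes the usual CARP behavior where each node advertises every `advbase + advskew/256` seconds and the healthy node that advertises most often wins. The node names and the skew values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CarpNode:
    name: str             # hypothetical label, e.g. "fw1"
    advbase: int = 1      # base advertisement interval in seconds
    advskew: int = 0      # 0-254; a higher skew makes the node less preferred
    healthy: bool = True

    @property
    def advert_interval(self) -> float:
        # Assumed CARP timing: advertise every advbase + advskew/256 seconds.
        return self.advbase + self.advskew / 256


def elect_master(nodes):
    """Return the healthy node that advertises most often, or None."""
    candidates = [n for n in nodes if n.healthy]
    return min(candidates, key=lambda n: n.advert_interval, default=None)


primary = CarpNode("fw1", advskew=0)     # illustrative values only
backup = CarpNode("fw2", advskew=100)

print(elect_master([primary, backup]).name)   # fw1
primary.healthy = False                       # simulate pulling the cable
print(elect_master([primary, backup]).name)   # fw2
```

The real protocol also deals with authentication, preemption, and timeouts, none of which are modeled here.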
In practice this had me scratching my head over virtual networking routes, because it needs three interfaces on the OPNsense box to work correctly (WAN, LAN, pfSync), and the mini PC I usually use only has two NICs. When I set this up more permanently, I'll get either another DEC750 or an Intel-based mini PC with four NICs to save a few headaches getting things working.
Subnetting to success
Each of the three interfaces gets its own subnet, with the three pairs of ports getting an IP address in the corresponding subnet. A little bit of static IP setting on each OPNsense box and three firewall rules later (LAN and WAN both need CARP packets allowed, PFSYNC needs an allow any rule), and I was most of the way to success.
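If you want to sanity-check an addressing plan like this before typing it into two firewalls, Python's standard `ipaddress` module can do it in a few lines. This is only a sketch: every subnet and address below is a placeholder I picked for illustration, not a value from my setup.

```python
import ipaddress

# Hypothetical addressing plan: one subnet per interface, one address per firewall.
PLAN = {
    "WAN":    {"subnet": "172.18.0.0/24",  "fw1": "172.18.0.101", "fw2": "172.18.0.102"},
    "LAN":    {"subnet": "192.168.1.0/24", "fw1": "192.168.1.10", "fw2": "192.168.1.20"},
    "PFSYNC": {"subnet": "10.0.0.0/24",    "fw1": "10.0.0.1",     "fw2": "10.0.0.2"},
}


def check_plan(plan):
    """Return a list of human-readable problems found in the plan."""
    problems = []
    nets = {name: ipaddress.ip_network(cfg["subnet"]) for name, cfg in plan.items()}

    # The three subnets must not overlap each other.
    names = list(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} overlaps {b}")

    # Each firewall's address must sit inside its interface's subnet,
    # and the two firewalls must not share an address.
    for name, cfg in plan.items():
        for fw in ("fw1", "fw2"):
            if ipaddress.ip_address(cfg[fw]) not in nets[name]:
                problems.append(f"{name}: {fw} address {cfg[fw]} is outside {nets[name]}")
        if cfg["fw1"] == cfg["fw2"]:
            problems.append(f"{name}: both firewalls use {cfg['fw1']}")
    return problems


print(check_plan(PLAN) or "plan looks consistent")
```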
The primary firewall gets virtual IPs on both LAN and WAN that are set to enable CARP, both with a long, random password. It's also critical to set up a manual outbound NAT rule for the WAN port with LAN as the source to translate it into the virtual WAN IP. You can skip DHCP if you prefer handing out IP addresses manually, but if you do set it up, it needs a failover peer IP with the IP of the other firewall, and firewall 2 needs a corresponding setting pointing back at the primary firewall IP.
Then the High Availability settings on both firewalls need filling in, using PFSYNC as the sync interface, and whichever services you want synced need selecting from the dropdown. This could be as simple as Dashboard, Firewall Rules, Aliases, NAT, DHCPD, and Virtual IPs, or you might have other services like Unbound or certificates to keep synchronized. Oh, and don't forget to add the CARP status widget to the dashboard before rebooting. I forgot the first time and was left trying to work out whether the settings had worked.
It was real fun yanking out a network cable and seeing what happened on the dashboard of firewall number 2. Within seconds, it realized the primary firewall was down and changed over to the virtual LAN IP to restore connectivity. I'm not sure I would have even noticed if I'd been browsing and not staring at the dashboard, which surprised me. I was expecting a longer period of no networking, like every time I have to reboot things or put a new router in.
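Staring at the dashboard is one way to judge a failover, but you can also put a number on it. The sketch below probes an address with TCP connections and turns the results into outage windows. It assumes the virtual LAN IP answers on some TCP port (the web UI, for instance); the host and port in the usage comment are placeholders, and `watch` and `outage_windows` are names I made up.

```python
import socket
import time


def is_reachable(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples):
    """Turn [(timestamp, ok), ...] into a list of (start, end) outage spans.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open at the end gets end=None.
    """
    windows, start = [], None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def watch(host, port, duration=60, interval=0.5):
    """Probe host:port for roughly `duration` seconds and return outage spans."""
    samples = []
    t0 = time.monotonic()
    while (now := time.monotonic() - t0) < duration:
        samples.append((round(now, 1), is_reachable(host, port)))
        time.sleep(interval)
    return outage_windows(samples)


# Example (placeholder address and port), run while you pull the cable:
# for start, end in watch("192.168.1.1", 443):
#     print(f"down from {start}s to {end}s")
```

Because a failed probe can block for the full timeout, the resolution is about a second at best, which is still enough to tell "a couple of seconds" from "half a minute."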
Don't forget about any external DNS servers you might have
You don't want name resolution to fail as well
I've been using Technitium as my local DNS server, but it is currently missing one feature that Pi-hole makes so easy: high availability. Much like our Proxmox cluster and OPNsense pairing, that needs two Pi-holes on our network: one primary DNS server, and a mirrored secondary DNS server in case of emergency. It's perhaps more accurate to call the second one a spare DNS server, though, because most devices can't (or won't) treat two DNS servers as a strict primary and fallback. Windows is the odd one out here, both letting you set two servers and prioritizing the first one, while anything with systemd will do "parallel probe" resolution of all servers and pick the fastest.
And while that DNS is set in the DHCP server, so every device on the network uses it, what happens if it drops off the network? Well, to start with, having Keepalived running and watching the Pi-holes handles failover nicely, and Nebula Sync keeps the blocklists, DHCP assignments, and other settings synchronized, so both Pi-holes are as close to exact clones as possible.
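Keepalived can run a check script and treat a non-zero exit code as "unhealthy," so one option is to have it ask the local Pi-hole a real DNS question. The following is a minimal sketch using only the standard library. It assumes the resolver listens on 127.0.0.1 port 53 over UDP, and the test name `example.com` and the function names are my own choices, not anything Pi-hole or Keepalived ships.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID so the reply can be matched to the query


def build_query(name, query_id=QUERY_ID):
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Name terminator, then QTYPE=A (1) and QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def dns_alive(server="127.0.0.1", name="example.com", timeout=2.0):
    """True if the server sends a matching response with RCODE 0 (NOERROR)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
    except OSError:  # includes timeouts
        return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == QUERY_ID and is_response and rcode == 0


def main():
    """Exit status for a health check: 0 when healthy, 1 otherwise."""
    return 0 if dns_alive() else 1


# When saved as a standalone check script, finish with:
# raise SystemExit(main())
```

Checking for an actual answer rather than just "is the process running" catches the case where Pi-hole is up but not resolving anything.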
My network failed, but my preparations held fast
It's fun to wargame out what happens if someone pulls a network cable out or a piece of hardware fails. I learned a few lessons doing this, like having a network switch at the edge of my network before my OPNsense routers to enable graceful failover, while also handing out virtual IPs for the setup. That's not something I would have thought of before this exercise.
I also learned that I'm simultaneously risk-averse and in possession of a YOLO attitude to pulling network cables out of a running system. Levels of risk, I guess. Anyway, now that I've modeled this in a mix of virtual and physical hardware, I will build it again with all physical hardware while my rack gets installed. That will give me peace of mind, and anything short of a complete power cut shouldn't take my network down so easily.
