If you’re even remotely into computing, you’ve definitely heard the age-old tenet about creating regular backups of essential data. After all, few things are worse than losing important files when your hardware kicks the bucket out of nowhere. As you go deeper into the tinkering rabbit hole, a backup workflow becomes practically indispensable for your everyday machines, self-hosted tools, and virtual guests.
That said, automated snapshots are just one part of the data redundancy game. Even with a proper 3-2-1 workflow involving multiple nodes, it’s possible for your data to become unusable, which makes periodic restoration and verification tasks crucial for your backups.
Your backups aren’t infallible
Things can go wrong over long periods of time
Let’s start with all the maladies that your backups can contract on their own. And I’m not talking about failed drives, ransomware, accidental deletions, or untoward incidents that can damage the physical NAS rigs. When you hoard files for long periods, it’s possible for the datasets to get corrupted due to a variety of reasons, which range from random shutdowns and firmware bugs to flipped bits caused by physical degradation of DRAM cells, electromagnetic interference, or even something as random as cosmic radiation.
Worse still, these corrupted files can go unnoticed for a long time and even propagate silently to other backup data. Heck, it’s even more problematic for backups, because unlike typical virtual guest files and datasets housing network shares, you probably won’t use them very often. If you’re particularly unlucky, you could end up with backups that appear to be functional at first glance, but become a corrupted mess when you try to restore their data.
You should restore backups every once in a while
You can opt for full restoration or choose files at random
If you’ve got some extra rigs (or even spare system resources on your server nodes), I recommend deploying test environments where you can restore the backed-up files and confirm they’re in tip-top shape. Of course, you don’t need to buy additional hardware just to test your backups.
For example, I’ve got a Proxmox Backup Server node running at the heart of my home lab, and I’ve configured it to sync data with an external PBS system that’s stationed at my family’s home. When I get some spare time, I randomly pick a couple of LXCs or VMs and use the Restore button on my PVE nodes to redeploy them as new virtual guests. Then, I shut down the original instances (if they’re still running) and boot the freshly deployed clones to confirm that everything works as expected. Virtual machines take slightly longer to restore, so I don’t test them all that often, while LXCs are way faster to spin up from snapshots. In fact, I don’t even need to go for a full restoration. PBS supports selective recovery, and I can just download random files from any ol’ snapshot to check whether silent corruption managed to sink its teeth into my virtual guests.
The process is largely the same for my NAS rigs, with the only differences being that they're powered by TrueNAS and that I don’t have enough storage provisions to recover massive datasets just to test their integrity. For datasets smaller than 500 GB, I can attempt a full copy operation, while selective restoration is my only option for massive pools and Rsync/cloud tasks. There’s no need to do this every week, or even on a monthly basis. Just a couple of restoration operations every now and then are more than enough to ensure my backups are healthy – especially since PBS and TrueNAS have other neat tricks to check the integrity of my snapshots and backup files.
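If you’d rather not eyeball restored files by hand, the spot-check idea can be scripted. The snippet below is only a rough sketch, not anything PBS or TrueNAS ships: the `spot_check` function, its parameters, and the folder layout are all made up for illustration. It assumes you’ve already restored some data into a scratch folder and that the original files haven’t changed since the backup was taken (otherwise a mismatch is expected rather than a sign of corruption).

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never have to fit in RAM."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(source_dir, restored_dir, sample_size=20, seed=None):
    """Compare a random sample of files between an original folder and a restored copy.

    Returns the sorted relative paths of sampled files that are missing from
    the restored copy or whose contents differ. Hypothetical helper for
    illustration only.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    files = sorted(p.relative_to(source_dir) for p in source_dir.rglob("*") if p.is_file())
    rng = random.Random(seed)  # pass a seed to make the sample repeatable
    sample = rng.sample(files, min(sample_size, len(files)))

    problems = []
    for rel in sample:
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(source_dir / rel) != sha256_of(restored):
            problems.append(rel.as_posix())
    return sorted(problems)
```

An empty list means every sampled file matched. Anything else is a cue to dig into that snapshot before you actually need it.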
Verification jobs are worth setting up
Especially when you don’t have enough system resources to test backups
Although I’d still advise testing backups with the conventional restoration method, you can also use the data protection options on most modern home server platforms to scan for corrupted backup files. Proxmox Backup Server, for instance, can create verification jobs, which re-read the stored backup data for your VM and LXC snapshots and validate it against the checksums recorded at backup time. In fact, I use these with my primary and secondary PBS rigs to ensure my snapshots remain intact.
TrueNAS, on the other hand, can schedule regular scrub tasks that utilize the checksum facility of the all-powerful ZFS file system. You see, ZFS stores a checksum for every block of data that’s written to my NAS. A scrub task reads each block back, recalculates its checksum, and compares the result with the stored value. Since even a single flipped bit changes the checksum, scrub tasks are a powerful way to scan for broken backups.
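To make the checksum idea concrete, here’s a toy model of a scrub. It’s a simplified sketch, not how ZFS is actually implemented: the 4 KiB block size, the SHA-256 hash, and the `write_blocks` and `scrub` names are all assumptions picked for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative value only; not a ZFS default


def write_blocks(data, block_size=BLOCK_SIZE):
    """Split data into blocks and record a checksum for each one at write time."""
    blocks = [bytearray(data[i:i + block_size]) for i in range(0, len(data), block_size)]
    checksums = [hashlib.sha256(bytes(block)).digest() for block in blocks]
    return blocks, checksums


def scrub(blocks, checksums):
    """Recompute every block's checksum and return the indexes that no longer match."""
    return [
        index
        for index, (block, stored) in enumerate(zip(blocks, checksums))
        if hashlib.sha256(bytes(block)).digest() != stored
    ]
```

If you write some data with `write_blocks`, flip a single bit in one block (say, `blocks[2][10] ^= 0x01`), and then call `scrub`, only that block’s index comes back. A real scrub works on the same principle, except it runs across the entire pool.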
Most NAS platforms that support ZFS or Btrfs have some variation of this facility, and I recommend enabling it even if you manually test your snapshots and backups.
