If you're running a NAS or considering building one, you've likely come across the term ZFS. On paper, it sounds like the perfect file system for a NAS: it promises reliability, self-healing, and smarter data management. Add deduplication to that, and the choice seems obvious. Deduplication sounds like a no-brainer on a NAS. Why waste space storing the same data twice when your NAS can spot duplicates on its own? Enabling the setting feels like unlocking free storage.
But here's the thing. You enable deduplication and feel that you've built an efficient setup. Then, a few days or weeks later, you notice that file transfers have started crawling or the NAS has become unresponsive. Dive into the activity log, and you'll spot that your memory usage has shot up for no apparent reason. The culprit? ZFS deduplication.
How ZFS deduplication really works
High memory use can slow everything down
ZFS deduplication sounds like a fairly simple concept on the surface, but there's a lot of complexity under the hood. The moment you enable it, the file system starts checking every single block of data before writing it to disk. It generates a hash for each block and looks that hash up in a deduplication table, which is effectively an internal database of the hashes of every block already stored. If the block already exists, ZFS skips writing it and just references the existing one. That's how you end up saving space.
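The hash-then-lookup flow described above can be sketched in a few lines of Python. This is a toy model, not how ZFS is implemented: the `DedupStore` class and its method names are made up for illustration, the table is a plain in-memory dictionary, and the 128 KiB block size simply mirrors ZFS's default record size.

```python
import hashlib

# Assumed block size for illustration; it mirrors ZFS's default 128 KiB record size.
BLOCK_SIZE = 128 * 1024


class DedupStore:
    """Toy block store that keeps only one copy of each unique block."""

    def __init__(self):
        self.blocks = {}     # hash -> block bytes (stands in for data on disk)
        self.refcounts = {}  # hash -> reference count (stands in for the dedup table)

    def write(self, data):
        """Split data into blocks, store unseen blocks, and return the block hashes."""
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest in self.refcounts:
                # Block already stored: skip the write and add a reference.
                self.refcounts[digest] += 1
            else:
                # New block: store it and create a table entry.
                self.blocks[digest] = block
                self.refcounts[digest] = 1
            pointers.append(digest)
        return pointers

    def read(self, pointers):
        """Reassemble data from a list of block hashes."""
        return b"".join(self.blocks[digest] for digest in pointers)

    def stored_bytes(self):
        """Return how many bytes are physically stored."""
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
payload = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
store.write(payload)
store.write(payload)  # second copy adds references, not data
print(f"Logical bytes written: {2 * len(payload)}")
print(f"Physical bytes stored: {store.stored_bytes()}")
```

Note that every write touches the table, whether or not it finds a match. That lookup cost is what the rest of this piece is about.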
So, what's the catch? To perform well, that entire table needs to sit in your system memory, not just on your drives. ZFS does store the table in the pool, but it has to be cached in memory so ZFS can perform these comparisons in real time. Every new write requires another lookup in that table. The larger your dataset, the larger the table. ZFS can't predict what data you'll write next, so it tries to keep the entire table in memory, ready to go. Understandably, your RAM usage balloons as a result.
It took me a while to understand what seems to be standard community knowledge: you need a minimum of 1GB of RAM per terabyte of unique data when using deduplication. That doesn't sound like a big deal until you realize that your 4-bay NAS is chock full of 18TB or larger hard drives, and you didn't factor in the cost of RAM. Once you start loading backups or media, you could easily fill up that storage and need tens of gigabytes of memory just to maintain the hash table. Don't forget that most NAS boxes come with a paltry 2–4GB of RAM out of the box.
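To put numbers on that rule of thumb, here is a minimal back-of-the-envelope calculation. It assumes the 1GB-per-terabyte figure quoted above as a floor, counts raw drive capacity with no allowance for redundancy, and uses a hypothetical NAS with four 18TB drives and 4GB of RAM. The function names are mine, not part of any ZFS tooling.

```python
# Rule of thumb from the text: at least 1 GB of RAM per TB of unique data.
# Treat it as a floor, not a guarantee.
GB_RAM_PER_TB_UNIQUE = 1.0


def dedup_ram_estimate_gb(unique_data_tb, gb_per_tb=GB_RAM_PER_TB_UNIQUE):
    """Return the estimated RAM (in GB) needed to hold the dedup table."""
    if unique_data_tb < 0:
        raise ValueError("unique_data_tb must be non-negative")
    return unique_data_tb * gb_per_tb


def ram_shortfall_gb(unique_data_tb, installed_ram_gb):
    """Return how many GB of RAM are missing (0 if the estimate is already met)."""
    return max(0.0, dedup_ram_estimate_gb(unique_data_tb) - installed_ram_gb)


# Hypothetical NAS: four 18 TB drives full of unique data (raw capacity,
# ignoring space lost to redundancy) and 4 GB of RAM installed.
drives, drive_tb, installed_gb = 4, 18, 4
unique_tb = drives * drive_tb

print(f"Estimated dedup table RAM: {dedup_ram_estimate_gb(unique_tb):.0f} GB")
print(f"Shortfall with {installed_gb} GB installed: "
      f"{ram_shortfall_gb(unique_tb, installed_gb):.0f} GB")
```

Even under these generous assumptions, the gap between what the table wants and what a stock NAS ships with is large.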
So why do things start slowing down? Once the deduplication table no longer fits in RAM, ZFS has to fetch entries from disk, and things really start to fall apart. When lookups have to go to disk, writes slow down dramatically. Disk lookups are incredibly slow compared to reading from memory, and this cascades into file transfers and even basic file operations. It's not that deduplication doesn't work; it's that the memory overhead outweighs the gain for most home setups. It's a performance tax that's just not worth it for the average home user.
Where ZFS deduplication makes sense
Compression might be the smarter choice for most
So, where exactly is ZFS deduplication useful? While prosumers are well-versed in its pros and cons, many home users don't realize that deduplication only helps when identical data blocks are stored multiple times. On a typical home or small office NAS that's full of media, photos, and documents, that's almost never the case. Nearly all of the data is unique at the block level. When the underlying data is unique, ZFS ends up doing all that heavy lifting on hash table comparisons for nothing.
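If you want a rough idea of how much block-level duplication your own files contain before committing to anything, a short script can hash them in fixed-size chunks and count repeats. This is only an approximation under stated assumptions: it uses a fixed 128 KiB block size and SHA-256, it ignores how ZFS actually lays out records, and `estimate_dedup_ratio` is a name I've made up for this sketch.

```python
import hashlib
import os

# Assumed fixed block size, mirroring ZFS's default 128 KiB record size.
BLOCK_SIZE = 128 * 1024


def estimate_dedup_ratio(root, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks, ratio) for regular files under root."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    while True:
                        block = handle.read(block_size)
                        if not block:
                            break
                        total += 1
                        seen.add(hashlib.sha256(block).digest())
            except OSError:
                continue  # skip files that cannot be read
    unique = len(seen)
    ratio = total / unique if unique else 1.0
    return total, unique, ratio


if __name__ == "__main__":
    total, unique, ratio = estimate_dedup_ratio(".")
    print(f"{total} blocks scanned, {unique} unique, ratio {ratio:.2f}x")
```

A ratio close to 1.00x means there is almost nothing for deduplication to reclaim. The set of hashes this script builds also grows with the amount of data scanned, which is a small-scale version of the memory problem described earlier. On an existing pool, `zdb -S` followed by the pool name is meant to simulate deduplication without enabling it; check the OpenZFS documentation for your version before relying on it.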
Deduplication shines in environments where data is being replicated. Think virtual machine storage, databases, or enterprise backup systems, where the same data is copied and written over and over. Predictably, ZFS deduplication can save you a lot of storage space in that very specific scenario. But for most personal users, the percentage of duplicated data is low enough that the return on buying more RAM is negligible.
You could, instead, consider ZFS compression. It can give you solid space savings without the memory penalty by compressing data before writing it to disk and decompressing it on the fly when you need it. Moreover, it doesn't significantly increase CPU load, making it usable even on slower NAS boxes. The savings can be significant, too: as much as 30 percent, depending on the data type.
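The "depending on the data type" part matters more than it might seem. The sketch below uses Python's `zlib` module as a stand-in for the compressors ZFS offers, so the exact percentages will not match what a real pool reports. The sample data is also synthetic: repetitive log-style text on one side, random bytes standing in for already-compressed media on the other.

```python
import os
import zlib


def compression_savings(data):
    """Return the fraction of space saved by compressing data.

    The result can be slightly negative for incompressible input,
    because the compressed stream carries its own overhead.
    """
    if not data:
        return 0.0
    compressed = zlib.compress(data)
    return 1.0 - len(compressed) / len(data)


# Synthetic samples: highly repetitive text versus random bytes.
text_like = b"2024-01-01 INFO backup finished without errors\n" * 20000
random_like = os.urandom(len(text_like))

print(f"Repetitive text: {compression_savings(text_like):.1%} saved")
print(f"Random bytes:    {compression_savings(random_like):.1%} saved")
```

Text, logs, and documents tend to shrink well, while photos and video that are already compressed barely move. Unlike deduplication, though, a poor result here costs you little beyond some CPU time.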
Sometimes enterprise tools really don't belong in a home lab
Look, ZFS deduplication isn't a bad feature. It obviously has a place in the right environment. But that environment is mostly enterprise and large businesses. You might be tempted to enable deduplication in a home setup because it's the pro thing to do, but if your data doesn't contain many duplicate blocks, it takes a needless toll on your system for little to no benefit.
