
Most people are familiar with sharing files over a network using something like SMB or NFS. These protocols work well, but they're designed for transferring files, not for treating storage on another machine as if it were directly attached to your own. That's where protocols like iSCSI and more modern adaptations like NVMe-over-TCP come in.

NVMe-over-TCP is a very similar concept to iSCSI, except it's aimed specifically at NVMe drives. It lets you take an NVMe drive in one machine and expose it over a TCP/IP network so another machine can use it as though it were plugged in locally. NVMe SSDs are high-speed and low-latency, and the protocol is designed to preserve as much of that as the network allows.

However, how good is it really? I decided to try it on my home network to see what kind of speeds I could get, and even though the results were pretty impressive, it's clear that this isn't something most people will ever need to set up.

What is NVMe-over-TCP?

iSCSI, but for NVMe drives


NVMe-over-TCP is part of the broader NVMe over Fabrics (NVMe-oF) standard, which allows the NVMe protocol to be sent over different transports. The "fabrics" can be InfiniBand, Fibre Channel, RDMA over Converged Ethernet (RoCE), or, in this case, plain old TCP/IP.

The goal is to let one machine (the target) export an NVMe device over the network, while another machine (the initiator) connects and uses it as a block device. From the initiator's perspective, it looks and behaves like a local NVMe drive, even though all reads and writes are happening across the network.

Because the connection runs over TCP, you don't need to replace anything in your network to use it, which also makes it incredibly cost-effective. It keeps latency low and throughput high, too, just like you would expect from an NVMe SSD.

With that said, just because it's fast doesn't mean that it's useful for most people. You'll need a really, really fast local network to take advantage of that kind of speed. We're talking at least a 2.5 GbE network, though 10 GbE and higher are really where you'll see gains that make it worth using. It's aimed more at server clusters than enthusiasts, and while you could build a home network that could make use of it, it's going to cost you a lot of money (not to mention time) for little benefit.

For a home user, this is merely a "because I can" type of project. Most people don't need to do this, though it can be beneficial in certain scenarios, which we'll get into.

Setting up NVMe-over-TCP

Easy on most distros

In my home setup, the only machine I had that could act as an NVMe-over-TCP target was a Proxmox host, which has an NVMe SSD installed as additional storage and formatted as a ZFS pool. Thankfully, you can create a zvol (a ZFS block volume) on that pool and export it using NVMe-over-TCP, which is exactly what I did. However, if you're just using a regular old Linux server, you don't need to deal with any of that.

The issue with using Proxmox for this particular setup is that the CLI tool needed to set the target, nvmetcli, isn't actually available in the standard repositories that Proxmox enables. Instead, I installed it manually following Nvidia's guide, and then used "modprobe nvmet-tcp" to load the required driver. From there, I could execute "nvmetcli" and begin creating my target device. If you're on another distro, it should be easy to install.

I ran the following commands in sequence once inside of nvmetcli. Note that this will be different for everyone's setup, so I provide this as a reference rather than a step-by-step guide:

create testsubsystem
cd testsubsystem
set attr allow_any_host=1
create namespaces
cd namespaces
create 1
cd 1
set device path=/dev/zvol/ZFSStorage/nvme_test
enable
cd /
cd ports
create 1
cd 1
set addr trtype=tcp traddr=192.168.1.110 trsvcid=4420 adrfam=ipv4
saveconfig
exit

That's all you need to do, and now we can connect to the drive remotely from an initiator device.
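Before moving to the initiator, it can be worth checking what the kernel target is actually exposing. The following is a minimal sketch, assuming nvmetcli wrote its configuration to the usual configfs location (/sys/kernel/config/nvmet) and that the port ID 1 and subsystem name testsubsystem from the listing above are in use. The helper name port_exports is hypothetical, not part of any tool.

```shell
#!/bin/sh
# Assumed configfs root for the Linux nvmet target; override NVMET to point elsewhere.
NVMET="${NVMET:-/sys/kernel/config/nvmet}"

# Hypothetical helper: succeeds if port $1 has subsystem $2 linked under it.
port_exports() {
    [ -e "$NVMET/ports/$1/subsystems/$2" ]
}

if port_exports 1 testsubsystem; then
    echo "port 1 exports testsubsystem"
else
    echo "port 1 does not export testsubsystem yet"
fi
```

If the check fails on your setup, one likely cause is that the subsystem was never linked to the port. In nvmetcli, that is normally done by running create testsubsystem from inside /ports/1/subsystems, though your layout may differ.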

Connecting to an NVMe-over-TCP device and testing performance

It works on both Windows and Linux

If you're on Windows, StarWind NVMe-oF Initiator is the go-to free tool for connecting to an NVMe-over-TCP device. I opted to connect from the Windows Subsystem for Linux instead, which was easier and also let me benchmark the results. All you need to install on a Linux-based machine is nvme-cli, and you can connect with the following commands, replacing the address with your target machine's address.

sudo nvme discover -t tcp -a 192.168.1.110 -s 4420
sudo nvme connect -t tcp -n testsubsystem -a 192.168.1.110 -s 4420

Once the initiator connected to the exported NVMe device, it appeared just like any local /dev/nvme0n1 block device. That meant I could benchmark it directly using fio to see how much throughput I could get, and the results were good, though nothing groundbreaking.
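The steps around the connect commands can be sketched as follows. This assumes the initiator's kernel ships the nvme-tcp module and reuses the subsystem name testsubsystem from earlier. The run helper is a hypothetical dry-run wrapper that only prints each command unless RUN=1 is set, so nothing touches your system by accident.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Before connecting: load the initiator-side NVMe/TCP transport module.
run sudo modprobe nvme-tcp

# After connecting: list NVMe devices; the remote namespace should appear as /dev/nvmeXnY.
run sudo nvme list

# When finished: detach from the target by subsystem name.
run sudo nvme disconnect -n testsubsystem
```

The device name you get depends on how many NVMe drives the initiator already has, so it is worth checking the output of nvme list rather than assuming /dev/nvme0n1.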

To test performance, I ran a sequential read test with fio using 1 MiB block sizes and a queue depth of 1. Results were in line with what I expected, with a throughput of roughly 1.74 Gbps over my 2.5 GbE link. That's short of line rate, but close enough to show that the network is the bottleneck here, not the SSD itself. The remaining gap comes down to ZFS overhead, the block size and queue depth I chose, and a small bit of protocol overhead.
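A test like the one described could look roughly like this. Only the sequential read pattern, the 1 MiB block size, and the queue depth of 1 come from the run above; the I/O engine, direct I/O, the 30-second runtime, the device path, and the run and seq_read_test helper names are assumptions made for illustration. The --readonly flag is there so fio refuses to write to the device.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Sequential read, 1 MiB blocks, queue depth 1, against the block device given as $1.
# libaio, direct I/O, and the 30 s runtime are assumed settings, not the article's.
seq_read_test() {
    run sudo fio --name=seqread --filename="$1" --readonly \
        --rw=read --bs=1M --iodepth=1 --ioengine=libaio --direct=1 \
        --runtime=30 --time_based
}

# Assumed device path; check `nvme list` for the real one on your machine.
seq_read_test "${DEV:-/dev/nvme0n1}"
```

Raising --iodepth or adding --numjobs is the usual way to push more requests in flight if you want to see how close to line rate the link can get.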

Latency-wise, most operations completed in under 5 ms, with occasional spikes higher. Compared to a local NVMe drive, where the same operation might take tens of microseconds, that's a sizable jump, and it shows how much overhead the network adds. Even so, this is consistently better than most iSCSI HDD shares that you'll see.

By increasing the queue depth and running multiple jobs, it should be possible to get closer to 2.5 Gbps, but I'd still be capped by the physical link speed. Even at 2.5 GbE, this already surpasses most HDDs, but the real benefits are found at much higher speeds: on a 10 GbE network, NVMe-over-TCP could deliver upwards of a gigabyte per second of throughput.
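To put those link speeds in more familiar units, here is a small sketch that converts gigabits per second to decimal megabytes per second and works out how much of the link a given result used. The function names are hypothetical, and the ceilings ignore Ethernet and TCP overhead, so real-world figures will come in lower.

```shell
#!/bin/sh
# Convert a rate in Gbit/s to MB/s (decimal units, 8 bits per byte).
gbps_to_mbs() {
    awk -v g="$1" 'BEGIN { printf "%.1f\n", g * 1000 / 8 }'
}

# Percentage of the link rate ($2, Gbit/s) that a measured rate ($1, Gbit/s) represents.
line_rate_pct() {
    awk -v m="$1" -v l="$2" 'BEGIN { printf "%.1f\n", m / l * 100 }'
}

for rate in 1.74 2.5 10; do
    printf '%s Gbit/s = %s MB/s\n' "$rate" "$(gbps_to_mbs "$rate")"
done
printf 'measured share of a 2.5 GbE link: %s%%\n' "$(line_rate_pct 1.74 2.5)"
```

By that arithmetic, the 1.74 Gbps result is about 217.5 MB/s, or roughly 70% of the 312.5 MB/s a 2.5 GbE link can carry at most, while 10 GbE tops out at 1,250 MB/s.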

With all of that said, while it’s tempting to compare these numbers to SMB or NFS, that's not quite fair. Those protocols operate at the file level, while NVMe-over-TCP gives the client raw block access. This means lower latency for small random reads and writes, the ability to format the storage and use it as if it were a local device, and no file-sharing or permission overheads.

Most people don't need NVMe-over-TCP

It's fast, but not worth going through the effort for most people

NVMe-over-TCP is one of those technologies that's simply overkill for most home setups but fascinating to experiment with if you have the right hardware. In my case, the speeds were excellent given the 2.5 GbE limit, and the block-level access worked flawlessly, though I still wouldn't use it.

As for why I wouldn't use it, my NAS has four 4TB HDDs in a RAID 5 configuration, leaving me with 12TB of storage available across my entire network. It handles network transfers just fine, and there comes a point where a small increase in speed doesn't make a huge difference for local file transfers. I transfer large files infrequently, and I would need to uproot my entire storage system to benefit from these speeds, which is just... not worth it.

If you’re just moving files between machines, SMB, NFS, or even plain old rsync is simpler and safer. But if you want to explore the same tech that data centers use to share blazing-fast storage, NVMe-over-TCP can be fun to play with, especially when paired with a 10 GbE network, which should be capable of some truly crazy speeds.

URL: https://www.xda-developers.com/used-nvme-over-tcp-fast-most-people-dont-need/

I used NVMe-over-TCP to transfer files, and it's really fast... though most people probably don't need it