Running AI models locally provides enhanced privacy, reduced latency, and complete control over your infrastructure. In this guide, we’ll walk through setting up DeepSeek models (like DeepSeek-R1) inside a Proxmox virtual machine with GPU passthrough. This configuration allows the AI model to access your NVIDIA GPU directly for significantly improved performance.
In this tutorial you will learn:
How to configure IOMMU and GPU passthrough on your Proxmox host
How to create a properly configured VM for GPU computing
How to install NVIDIA drivers in the VM
How to install and run Ollama with DeepSeek models
How to choose between different DeepSeek model sizes based on your hardware
Software Requirements and Linux Command Line Conventions
Category | Requirements, Conventions or Software Version Used
System | Proxmox VE 7.x or higher with a supported CPU that has IOMMU capabilities
Software | Ollama, NVIDIA GPU driver, Debian/Ubuntu Linux for VM
Other | NVIDIA GPU (tested with RTX series)
Conventions | # – requires given linux commands to be executed with root privileges, either directly as a root user or by use of the sudo command; $ – requires given linux commands to be executed as a regular non-privileged user
Setting Up GPU Passthrough on Proxmox
Before running DeepSeek-R1 on Ollama, we need to properly configure GPU passthrough so that our virtual machine can directly access the GPU. This involves enabling IOMMU, configuring the system to use VFIO drivers, and setting up the VM correctly.
Enable IOMMU and configure GRUB: First, enable IOMMU support on the host system by editing the GRUB defaults file:
# nano /etc/default/grub
Find the line with GRUB_CMDLINE_LINUX_DEFAULT and add the IOMMU parameters:
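The exact parameters depend on the CPU vendor. As a rough sketch, assuming an Intel host needs `intel_iommu=on`, an AMD host has its IOMMU enabled by default, and `iommu=pt` (passthrough mode) is wanted on both, the edit could be scripted as shown here. The function names `iommu_params` and `add_iommu_params` are hypothetical helpers, and the demo deliberately works on a scratch file instead of the real /etc/default/grub.

```shell
#!/bin/bash
# Hypothetical helper: map a CPU vendor string (as in /proc/cpuinfo) to kernel parameters.
iommu_params() {
    case "$1" in
        GenuineIntel) echo "intel_iommu=on iommu=pt" ;;
        AuthenticAMD) echo "iommu=pt" ;;   # assumption: the AMD IOMMU is already on by default
        *) echo "unknown CPU vendor: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: append parameters to GRUB_CMDLINE_LINUX_DEFAULT in the given file.
# Not idempotent: running it twice appends the parameters twice.
add_iommu_params() {
    local file="$1" params="$2"
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${params}\"/" "$file"
}

# Demo on a scratch file so nothing on the host is modified.
demo=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' > "$demo"
add_iommu_params "$demo" "$(iommu_params GenuineIntel)"
cat "$demo"
```

On the real host the same edit would target /etc/default/grub, followed by `update-grub` and a reboot. Afterwards, `dmesg | grep -e DMAR -e IOMMU` should show the IOMMU being initialized.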
In this example, the vendor and device IDs are `10de:2206` (for the GPU) and `10de:1aef` (for its audio component). You need both for proper passthrough.
Use the actual vendor:device IDs from your system as shown in the lspci output. After running update-initramfs, the changes will be applied on the next boot.
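To make the ID lookup concrete, here is a minimal sketch that pulls the `vendor:device` pairs out of `lspci -nn` output and formats them as a vfio-pci option line. The helper name `vfio_ids_line` is hypothetical, the sample text is illustrative rather than captured from a real host, and the target file name /etc/modprobe.d/vfio.conf is a common convention, not something this guide prescribes.

```shell
#!/bin/bash
# Hypothetical helper: read `lspci -nn` lines on stdin and print a modprobe
# option line listing every [vendor:device] pair found.
vfio_ids_line() {
    local ids
    ids=$(grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' | sed 's/^\[//; s/\]$//' | paste -sd, -)
    printf 'options vfio-pci ids=%s\n' "$ids"
}

# Illustrative sample input, shaped like `lspci -nn | grep -i nvidia` output.
sample='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)'

printf '%s\n' "$sample" | vfio_ids_line
```

On a real host you would pipe `lspci -nn | grep -i nvidia` into the function, write the resulting line to the modprobe configuration file, and then run `update-initramfs -u`. The class codes such as `[0300]` are skipped because they contain no colon.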
Method 2: Manual binding (if the automatic method doesn’t work):
If the automatic method doesn’t work after rebooting, you can manually bind the GPU to VFIO drivers using a systemd service:
First, create a script that will handle the binding process:
# nano /usr/local/bin/vfio-bind-gpu.sh
Add the following content:
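#!/bin/bash
# Bind the GPU and its audio function to the vfio-pci driver.
# The full PCI address needs the domain part "0000:" added before the addresses from lspci
# For example, if lspci shows "01:00.0", use "0000:01:00.0" here
GPU_IDS="0000:01:00.0 0000:01:00.1"
modprobe vfio-pci
for dev in $GPU_IDS; do
    # Detach the device from whichever host driver currently owns it
    if [ -e "/sys/bus/pci/devices/$dev/driver/unbind" ]; then
        echo -n "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
    fi
    # Restrict the device to vfio-pci so the bind below is accepted
    echo -n "vfio-pci" > "/sys/bus/pci/devices/$dev/driver_override"
    if [ -e /sys/bus/pci/drivers/vfio-pci/bind ]; then
        echo -n "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
    fi
done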
Important: The lspci command shows addresses like “01:00.0”, but sysfs paths require the domain prefix “0000:”. Always add “0000:” before each PCI address from lspci when using it in this script.
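The prefix rule is easy to get wrong by hand, so a small helper can apply it and report which driver currently owns a device. This is only a sketch: `normalize_pci_addr` and `current_driver` are hypothetical names, and the code assumes the usual sysfs layout in which each device directory has a `driver` symlink while it is bound.

```shell
#!/bin/bash
# Hypothetical helper: turn an lspci-style address into the full sysfs form.
normalize_pci_addr() {
    case "$1" in
        ????:??:??.?) echo "$1" ;;          # already has a domain
        ??:??.?)      echo "0000:$1" ;;     # add the default domain
        *) echo "invalid PCI address: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the driver bound to a device, or "none".
current_driver() {
    local addr link
    addr=$(normalize_pci_addr "$1") || return 1
    link="/sys/bus/pci/devices/${addr}/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

normalize_pci_addr 01:00.0
```

After the binding script has run on the host, `current_driver 01:00.0` and `current_driver 01:00.1` should both report `vfio-pci`; any other driver name means the host still holds the device.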
Now that we have GPU passthrough set up on the host, we need to create a VM that’s properly configured to use the passed-through GPU. We’ll use the Proxmox web interface for this process.
Access the Proxmox web interface: Open your web browser and navigate to your Proxmox host (https://your-proxmox-ip:8006) and log in with your credentials.
Create a new VM: In the Proxmox web interface:
Select your Proxmox node in the server view (left panel)
Click the “Create VM” button at the top right
In the “General” tab:
VM ID: Choose a unique ID (e.g., 9100)
Name: Enter a descriptive name (e.g., “ollama-vm”)
Click Next
Configure OS settings: In the “OS” tab:
Select “Use CD/DVD disc image file (iso)”
Storage: Choose your ISO storage (e.g., “local-disks”)
ISO Image: Select your Linux distribution ISO (e.g., “debian-12.9.0-amd64-netinst.iso”)
Type: Linux
Version: 6.x – 2.6 Kernel
Click Next
Configure system settings: In the “System” tab:
Graphics card: Set to “Default”
Machine: Select “q35” (this is crucial for PCI passthrough)
BIOS: Select “OVMF (UEFI)”
Add EFI Disk: Check this option
EFI Storage: Select your ZFS storage pool (e.g., “zfs_raid1_storage”)
Pre-Enrolled Keys: Uncheck this option (disables secure boot)
Click Next
Configure disk: In the “Disks” tab:
Storage: Select your ZFS storage pool (e.g., “zfs_raid1_storage”)
Disk size: Set to “100” GB
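Format: qcow2 (available on directory-based storage; a native ZFS pool storage only offers raw, in which case keep the default)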
Click Next
Configure CPU: In the “CPU” tab:
Sockets: 1
Cores: 8 (adjust based on your available resources)
Type: host (for best performance)
Click Next
Configure memory: In the “Memory” tab:
Memory: 32768 MiB (32 GiB) for 7B parameter models, or 65536 MiB (64 GiB) for 14B parameter models
Click Next
Configure network: In the “Network” tab:
Bridge: vmbr0
Firewall: Checked (if you want firewall protection)
Click Next
Confirm settings: Review your settings and click “Finish” to create the VM.
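The same VM can also be described on the command line with Proxmox's `qm` tool. The sketch below only prints the command instead of running it, so it can be reviewed first. The helper `build_qm_create` is hypothetical, the values mirror the examples used in the wizard steps, and the ISO volume ID assumes the image lives in the `iso` content directory of the “local-disks” storage.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) a `qm create` command matching the wizard settings.
build_qm_create() {
    local vmid="$1" name="$2" storage="$3" iso="$4" memory_mib="$5"
    local cmd=(
        qm create "$vmid"
        --name "$name"
        --ostype l26                      # Linux 6.x - 2.6 kernel
        --machine q35 --bios ovmf         # q35 + UEFI, needed for PCIe passthrough
        --efidisk0 "${storage}:1,efitype=4m,pre-enrolled-keys=0"
        --scsi0 "${storage}:100"          # 100 GB system disk
        --sockets 1 --cores 8 --cpu host
        --memory "$memory_mib"
        --net0 "virtio,bridge=vmbr0,firewall=1"
        --cdrom "$iso"
    )
    printf '%s ' "${cmd[@]}"
    echo
}

build_qm_create 9100 ollama-vm zfs_raid1_storage \
    "local-disks:iso/debian-12.9.0-amd64-netinst.iso" 32768
```

Printing the command first is a deliberate choice: storage names and ISO paths differ between hosts, and a wrong value is easier to spot in the printed line than after the VM exists.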
Add the GPU to the VM: After creating the VM:
Select your newly created VM in the left panel
Go to the “Hardware” tab
Click “Add” → “PCI Device”
Select your NVIDIA GPU from the dropdown
Check “All Functions” (to include both GPU and audio components)
Check “PCI-Express” (only available with the q35 machine type; recommended for modern GPUs)
Click “Add”
Verify hardware configuration: Your hardware tab should now show:
Memory: Your allocated memory (e.g., 32 or 64 GiB)
Installing and Running Ollama with DeepSeek Models
Now that our VM can access the GPU, we can install Ollama and run DeepSeek models.
Install Ollama: Run the official install script inside the VM.
$ curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
$ ollama --version
Choose and run a DeepSeek model: Download and run your preferred model.
For the 7B parameter model (faster, less resource-intensive):
$ ollama pull deepseek-r1:7b
$ ollama run deepseek-r1:7b
For the 14B parameter model (higher quality, more resource-intensive):
$ ollama pull deepseek-r1:14b
$ ollama run deepseek-r1:14b
This downloads the model, which can take a while depending on your internet connection, and then starts an interactive chat session. (The run command also pulls the model automatically if it is not present yet.) The 7B model offers a good balance of speed and quality, while the 14B model provides better responses but requires more GPU memory and runs slower.
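If you are unsure which size to start with, the choice can be tied to the amount of GPU memory. The sketch below is deliberately conservative, and its 12 GiB cut-off is an assumption made for illustration, not a measured requirement: Ollama can still run a model that does not fully fit in VRAM by keeping part of it on the CPU, only more slowly. `pick_model` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: suggest a model tag from total VRAM in MiB.
# The 12288 MiB threshold is an illustrative assumption, not a benchmark result.
pick_model() {
    local vram_mib="$1"
    if [ "$vram_mib" -ge 12288 ]; then
        echo "deepseek-r1:14b"
    else
        echo "deepseek-r1:7b"
    fi
}

# Example with a 10 GiB card; on a real VM the value could come from:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n1
pick_model 10240
```

The result can be passed straight to Ollama, for example `ollama run "$(pick_model 10240)"`.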
Monitor GPU usage: You can watch GPU usage during inference with nvidia-smi.
$ nvidia-smi -l 1
This will show GPU usage, memory consumption, and temperature in real-time, refreshing every second. This can help you determine if your hardware is sufficient for your chosen model size.
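For logging or scripting, nvidia-smi's CSV query mode is easier to process than the full-screen table. The sketch below condenses one CSV line per GPU into a short summary. `summarize_gpu` is a hypothetical helper, and the sample values are made up for illustration rather than taken from a real run.

```shell
#!/bin/bash
# Hypothetical helper: condense CSV lines of the form
#   utilization %, memory used MiB, memory total MiB, temperature C
# into a one-line summary per GPU.
summarize_gpu() {
    awk -F', *' '{
        printf "util=%d%% mem=%d/%d MiB (%d%%) temp=%dC\n", $1, $2, $3, ($2 * 100) / $3, $4
    }'
}

# Illustrative sample line; on a real VM the input could come from:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu \
#              --format=csv,noheader,nounits
echo "87, 6120, 10240, 64" | summarize_gpu
```

If memory use sits close to the total while a model is loaded, a smaller model size is likely the better fit for the card.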
Troubleshooting
GPU not detected in VM: If the GPU isn’t detected, first check whether the VM sees the PCI device at all:
$ lspci | grep NVIDIA
If you don’t see your GPU listed, check:
1. IOMMU group isolation
2. Whether the GPU was properly unbound from the host
3. VM configuration (should be q35 machine type with UEFI)
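For the first item, IOMMU group membership can be listed on the Proxmox host from sysfs. The sketch below assumes the standard /sys/kernel/iommu_groups layout. `list_iommu_groups` is a hypothetical helper, and its optional argument exists only so the function can be pointed at a different directory.

```shell
#!/bin/bash
# Hypothetical helper: print every device together with its IOMMU group number.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}" dev group
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue         # nothing matched: IOMMU disabled or no groups
        group="${dev#"$root"/}"
        group="${group%%/*}"
        echo "group ${group}: ${dev##*/}"
    done
}

list_iommu_groups
```

Ideally the GPU and its audio function share a group with nothing else apart from PCIe bridges. If the function prints nothing at all, the IOMMU is most likely not enabled on the host.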
NVIDIA driver issues: If the driver doesn’t load properly, inspect the kernel log:
$ dmesg | grep -i nvidia
Look for any error messages. Common issues include:
1. Secure boot interference (ensure it’s disabled)
2. Incompatible driver version
3. GPU not fully passed through (both video and audio functions need passthrough)
Conclusion
Setting up DeepSeek models with GPU passthrough in Proxmox allows you to run powerful AI models locally with optimal performance. This configuration provides better response times, privacy, and control over your AI infrastructure. While the initial setup process may seem complex, the benefits of having a locally-running LLM with full GPU acceleration are substantial.
By following this guide, you’ve created a dedicated AI virtual machine that can be backed up and cloned like any other Proxmox VM, while still maintaining high-performance GPU access. One caveat: a VM with a passed-through PCI device cannot be live-migrated, so it has to be shut down before it is moved to another node with a suitable GPU. The flexibility to choose between different model sizes (7B or 14B) lets you balance performance and quality based on your hardware capabilities and needs.
Our testing shows that on a system with an RTX 3080 GPU, the 7B model provides snappy responses with minimal latency, while the 14B model offers higher quality outputs at the cost of slightly slower response times. You can now explore other models supported by Ollama or customize DeepSeek for your specific use cases.