Inside PewDiePie’s $41,000 AI PC: 424GB of VRAM for Local LLMs
By Allan Witt | Updated: November 12, 2025
PewDiePie’s custom open-frame AI PC build showing 10 GPUs installed on the left and NVIDIA System Management Interface on the right listing eight RTX 4090 48GB cards and two RTX 4000 Ada 20GB cards, totaling 424GB of VRAM.
When one of YouTube’s biggest creators decides to build a personal AI supercomputer, the local LLM scene takes notice. PewDiePie’s journey into AI hardware has produced a multi-GPU, 424GB VRAM workstation that many enthusiasts dream of.
While his budget is far beyond the average builder, his component choices and setup offer a valuable blueprint for anyone serious about high-VRAM local inference. This article breaks down the hardware, cost, and technical design behind his roughly $41,000 desktop machine.
Before we total up the cost, here’s a breakdown of the major components that make up PewDiePie’s 424GB VRAM AI machine:
| Component | Model / Details | Qty | Approx. Price (USD) | Subtotal |
|---|---|---|---|---|
| GPUs | NVIDIA RTX 4090 (48GB modded versions) | 8 | $3,000 | $24,000 |
| GPUs | NVIDIA RTX 4000 Ada Generation (20GB) | 2 | $1,500 | $3,000 |
| CPU | AMD Threadripper PRO 7985WX (64-core, WRX90 platform) | 1 | $7,300 | $7,300 |
| Motherboard | ASUS Pro WS WRX90E-SAGE SE | 1 | $1,200 | $1,200 |
| Memory (RAM) | 512GB DDR5 ECC (8×64GB, 5600 MT/s) | 1 set | $4,800 | $4,800 |
| Power Supply Units | Seasonic PRIME TX-1300 (1300W, Titanium) | 2 | $450 | $900 |
| Estimated Total | | | | ≈ $41,000 USD |
A 424GB Multi-GPU Array
The single most important metric for running large models locally is VRAM, and this build delivers an enormous amount. The system is built around a total of ten GPUs, providing a combined 424 GB of video memory. This is achieved with a mix of eight modified NVIDIA RTX 4090s, each with 48 GB of VRAM, and two NVIDIA RTX 4000 Ada Generation cards with 20 GB each. This configuration is a significant upgrade from his initial setup, which used eight of the 20 GB RTX 4000 cards.
With 424 GB of VRAM, you can run extremely large models that are inaccessible to most users. For example, a 405-billion-parameter model quantized to 4 bits (Q4) would require around 220-240 GB of VRAM, fitting comfortably on this machine with plenty of room for a large context window.
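The arithmetic behind these figures is easy to check. The sketch below is a rough estimate, not a measurement: the function names are made up for illustration, and the 4.35 to 4.75 effective bits per weight is an assumption (Q4-style formats store scaling data alongside the 4-bit values, so they land somewhat above a flat 4 bits).

```python
# GPU inventory from the article: (count, VRAM in GB per card)
GPUS = [(8, 48), (2, 20)]


def total_vram_gb(gpus):
    """Sum VRAM across every card in the pool."""
    return sum(count * vram for count, vram in gpus)


def weight_footprint_gb(params_billions, bits_per_weight):
    """Billions of parameters * bits per weight / 8 gives decimal gigabytes."""
    return params_billions * bits_per_weight / 8


total = total_vram_gb(GPUS)
# Assumed effective bits-per-weight range for a Q4-style quantization
low = weight_footprint_gb(405, 4.35)
high = weight_footprint_gb(405, 4.75)

print(f"Total VRAM: {total} GB")
print(f"405B weights at ~Q4: {low:.0f}-{high:.0f} GB")
print(f"Left for context and buffers: {total - high:.0f}-{total - low:.0f} GB")
```

Under these assumptions the weights take a little over half the pool, which is why the article can describe the remaining space as generous.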
However, the mixed GPU configuration introduces a technical nuance. High-performance inference libraries like vLLM, which are excellent for agentic workflows and tool use, currently struggle to use tensor parallelism across GPUs with different VRAM amounts. This means you can’t split a single model layer across a 48GB card and a 20GB card to maximize speed. You can, however, use pipeline parallelism to assign different layers of the model to different GPUs, or simply use frameworks like llama.cpp and ExLlama which are more flexible in utilizing heterogeneous VRAM pools.
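For the llama.cpp route, one way to describe an uneven pool is a per-GPU split proportional to each card's VRAM. The following is a minimal sketch under that assumption: the helper names are hypothetical, the card order is assumed, and the resulting string is the kind of comma-separated proportion list that llama.cpp's `--tensor-split` option accepts.

```python
# VRAM per card in GB, in an assumed enumeration order (4090s first)
VRAM_PER_GPU = [48] * 8 + [20] * 2


def split_fractions(vram_per_gpu):
    """Fraction of the model each card would hold if split by VRAM size."""
    total = sum(vram_per_gpu)
    return [v / total for v in vram_per_gpu]


def tensor_split_arg(vram_per_gpu):
    """Comma-separated relative proportions, using raw GB values as weights."""
    return ",".join(str(v) for v in vram_per_gpu)


fractions = split_fractions(VRAM_PER_GPU)
split_arg = tensor_split_arg(VRAM_PER_GPU)

for index, (vram, frac) in enumerate(zip(VRAM_PER_GPU, fractions)):
    print(f"GPU {index}: {vram} GB -> {frac:.1%} of the model")
print(f"--tensor-split {split_arg}")
```

In practice you would likely shade these proportions to leave room on each card for context and runtime buffers rather than filling every card to the brim.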
What Fits in 424 GB of VRAM
To put this enormous VRAM pool into perspective, here’s what kind of large-scale models can actually be loaded and run locally on PewDiePie’s build. The following table compares the estimated VRAM requirements (for model weights only) between two common inference frameworks: llama.cpp (Q4_K_XL quantization) and vLLM using AWQ quantization.
| Model | llama.cpp Q4_K_XL | vLLM AWQ |
|---|---|---|
| Qwen3 235B A22B | 134 GB | 124 GB |
| GLM-4.6 355B | 204 GB | 197 GB |
| Qwen3 Coder 480B | 276 GB | 262 GB |
| DeepSeek V3.1 671B | 387 GB | 362 GB |
All of these models fit within the 424 GB VRAM capacity. The three smaller ones leave well over 100 GB for substantial context windows and multi-agent workloads, while DeepSeek V3.1 leaves a much tighter 37 to 62 GB, depending on the framework.
The only frontier model that cannot be fully loaded in VRAM at 4-bit quantization is Kimi K2 1T, which requires around 587 GB in Q4_K_XL format. However, you can still load it in Q2_K_XL quantization, which brings the footprint down to approximately 382 GB, allowing it to run (albeit with reduced precision) within the limits of this setup.
Keep in mind that these figures cover only the base model weights. Real-world usage needs additional VRAM for the KV cache (which grows with context length), system prompts, and runtime buffers.
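To see how much space is left once the weights are loaded, the llama.cpp figures quoted above can be subtracted from the 424 GB pool. This is a simple sketch with hypothetical names. It treats the pool as one contiguous block, which is an optimistic assumption, since a real split across ten cards wastes some space on each.

```python
TOTAL_VRAM_GB = 424

# llama.cpp weight sizes in GB as quoted in the article
WEIGHTS_GB = {
    "Qwen3 235B A22B": 134,
    "GLM-4.6 355B": 204,
    "Qwen3 Coder": 276,
    "DeepSeek V3.1 671B": 387,
    "Kimi K2 1T (Q4_K_XL)": 587,
    "Kimi K2 1T (Q2_K_XL)": 382,
}


def headroom_gb(weights_gb, total_gb=TOTAL_VRAM_GB):
    """VRAM left after loading the weights; negative means it does not fit."""
    return total_gb - weights_gb


HEADROOM = {name: headroom_gb(size) for name, size in WEIGHTS_GB.items()}

for name, spare in HEADROOM.items():
    verdict = "fits" if spare >= 0 else "does not fit"
    print(f"{name:<22} {verdict:<13} {spare:+d} GB")
```

The spread is wide: the smallest model leaves hundreds of gigabytes free, while the largest ones that fit leave only a few tens of gigabytes for everything else.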
Threadripper Pro and the WRX90 Chipset
To connect ten GPUs to a single system, you need an extreme number of PCIe lanes. The solution here is the AMD Threadripper Pro platform. The motherboard is the ASUS Pro WS WRX90E-SAGE SE, a workstation-grade board that costs around $1,200. Its standout feature is its seven physical PCIe 5.0 x16 slots. Fitting ten GPUs on seven slots presumably means some cards share a slot through bifurcation adapters or risers, but the slot count still makes this board a strong base for the multi-GPU setups used to run 200B+ LLMs.
Driving this motherboard is likely a CPU like the AMD Threadripper PRO 7985WX, a processor that can cost upwards of $7,300. The CPU’s primary role in an LLM inference rig isn’t raw processing power but its ability to provide I/O. On the WRX90 chipset, this CPU family unlocks its full 128 usable PCIe 5.0 lanes, ensuring that multiple GPUs can communicate with the system without significant bottlenecks. This is a crucial detail, as the consumer-focused TRX50 chipset would limit the same CPU to fewer lanes.
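A quick lane budget shows why link width becomes a trade-off with ten cards. The sketch below uses only the 128-lane figure and the GPU count from the article. Everything else is an assumption: the function names are illustrative, the link widths are hypothetical (the article does not say how each card is wired), and lanes reserved for storage or networking are ignored.

```python
TOTAL_LANES = 128  # PCIe 5.0 lanes quoted in the article
GPU_COUNT = 10


def lanes_needed(gpu_count, width):
    """Lanes consumed if every GPU gets a link of the given width."""
    return gpu_count * width


def max_wide_gpus(total_lanes, gpu_count, narrow=8, wide=16):
    """Put every GPU on a narrow link, then widen as many as the budget allows."""
    spare = total_lanes - gpu_count * narrow
    if spare < 0:
        return 0
    return min(gpu_count, spare // (wide - narrow))


all_x16 = lanes_needed(GPU_COUNT, 16)
all_x8 = lanes_needed(GPU_COUNT, 8)
wide_count = max_wide_gpus(TOTAL_LANES, GPU_COUNT)

print(f"All x16: {all_x16} lanes (budget {TOTAL_LANES})")
print(f"All x8:  {all_x8} lanes (budget {TOTAL_LANES})")
print(f"Mixed:   up to {wide_count} cards at x16, the rest at x8")
```

Under these assumptions, ten full x16 links would exceed the lane budget, so at least some cards would have to run on narrower links. For inference, where data mostly stays on the GPUs once the model is loaded, that is usually an acceptable compromise.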
512GB of High-Speed DDR5
The system is equipped with a staggering 512 GB of DDR5 system RAM. While the model weights reside in VRAM during inference, having a vast pool of system RAM is essential for loading models, managing large context histories, and maintaining system responsiveness. This was achieved using eight 64 GB modules to populate all eight memory channels supported by the Threadripper Pro platform.
For builders looking to replicate this, memory choice presents a cost-saving opportunity. Officially validated 6000 MT/s kits from brands like V-Color can run close to $5,000 for 512 GB. However, choosing a slightly slower but still very capable 5600 MT/s kit could save a significant amount of money without a major impact on inference performance for most use cases.
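The gap between the two memory speeds can be put in numbers with the standard peak-bandwidth formula: channels × transfers per second × bytes per transfer. The sketch below assumes eight populated channels with a 64-bit (8-byte) data path each. The function name is illustrative, and the results are theoretical peaks rather than measured throughput.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for a multi-channel DDR setup."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


fast = peak_bandwidth_gbs(8, 6000)
slow = peak_bandwidth_gbs(8, 5600)
drop_pct = (1 - slow / fast) * 100

print(f"8-channel DDR5-6000: {fast:.1f} GB/s")
print(f"8-channel DDR5-5600: {slow:.1f} GB/s")
print(f"Difference: {drop_pct:.1f}% lower peak bandwidth")
```

Because the model weights sit in VRAM, system memory bandwidth mainly matters when layers are offloaded to the CPU, so a single-digit percentage difference is unlikely to be noticeable in a GPU-only setup.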
A Two-PSU Undervolting Strategy
Powering eight RTX 4090s and two RTX 4000s presents a major challenge. A stock RTX 4090 can draw 450 watts under full load, so eight of them would need about 3,600 watts for the GPUs alone, necessitating three or even four power supplies. PewDiePie’s solution is both elegant and essential for a build like this: aggressive undervolting.
From his videos, we can see the modified 48GB RTX 4090s are running at around 175 watts under 100% load. This is a massive reduction from the stock 450 watts. By cutting the power draw of each card by more than half, the entire GPU array can be reliably powered by two high-quality 1300-watt power supplies, such as the Seasonic PRIME TX-1300 units seen in the build. This strategy not only solves the power delivery problem but also dramatically reduces heat output, a critical factor in a system with ten GPUs.
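A rough power budget shows how much margin the two power supplies have. The 450 W, 175 W, and 1300 W figures come from the article. The 130 W per RTX 4000 Ada and 350 W for the CPU are assumptions based on the vendors' rated power, not on anything shown in the videos, and the estimate ignores fans, drives, conversion losses, and transient spikes.

```python
PSU_CAPACITY_W = 2 * 1300  # two 1300 W units

STOCK_4090_W = 450    # stock draw per RTX 4090 (from the article)
TUNED_4090_W = 175    # observed draw per modded 4090 (from the article)
RTX_4000_ADA_W = 130  # assumption: rated board power, not from the article
CPU_W = 350           # assumption: rated CPU TDP, not from the article


def gpu_array_w(per_4090_w, count_4090=8, count_4000=2):
    """Combined draw of the GPU array for a given per-4090 power level."""
    return count_4090 * per_4090_w + count_4000 * RTX_4000_ADA_W


stock_4090s_w = 8 * STOCK_4090_W
tuned_array_w = gpu_array_w(TUNED_4090_W)
system_w = tuned_array_w + CPU_W
margin_w = PSU_CAPACITY_W - system_w

print(f"Eight stock 4090s alone: {stock_4090s_w} W")
print(f"Tuned GPU array:         {tuned_array_w} W")
print(f"GPUs plus CPU:           {system_w} W of {PSU_CAPACITY_W} W")
print(f"Remaining margin:        {margin_w} W")
```

With these numbers the tuned system stays well under the combined PSU rating, whereas the eight 4090s at stock power would exceed it on their own.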
A $41,000 Price Tag
When you add up the core components, the cost is substantial. The eight 48GB 4090s alone come to around $24,000, with the two RTX 4000s adding another $3,000. Combined with the $7,300 CPU, $1,200 motherboard, nearly $5,000 in RAM, and $900 for power supplies, the total cost lands around the $41,000 mark, not including storage or cooling.
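For anyone adapting the parts list, the total is easy to recompute. The sketch below simply multiplies out the approximate prices quoted in this article. The variable names are illustrative, and the prices are the article's estimates rather than current market quotes.

```python
# (component, quantity, approximate unit price in USD) as listed in the article
PARTS = [
    ("RTX 4090 48GB (modded)", 8, 3000),
    ("RTX 4000 Ada 20GB", 2, 1500),
    ("Threadripper PRO 7985WX", 1, 7300),
    ("ASUS Pro WS WRX90E-SAGE SE", 1, 1200),
    ("512GB DDR5 ECC kit", 1, 4800),
    ("Seasonic PRIME TX-1300", 2, 450),
]


def subtotal(quantity, unit_price):
    """Cost of one line item."""
    return quantity * unit_price


total_usd = sum(subtotal(qty, price) for _, qty, price in PARTS)
gpu_usd = sum(subtotal(qty, price) for name, qty, price in PARTS if "RTX" in name)

for name, qty, price in PARTS:
    print(f"{name:<28} {qty} x ${price:>5,} = ${subtotal(qty, price):>6,}")
print(f"Total: ${total_usd:,} (GPUs are {gpu_usd / total_usd:.0%} of it)")
```

Swapping in your own quantities and prices gives a quick estimate for a smaller build, and makes it clear that the GPUs dominate the budget.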
Unsurprisingly, a standard case cannot accommodate this hardware. The build is housed in a custom-built open-air mounting bracket. This is a practical necessity for multi-GPU systems, as it provides unrestricted airflow and the physical space needed to mount the cards and manage the complex cabling.
While not for everyone, PewDiePie’s machine is a fascinating look at the upper limits of what is possible for a dedicated local LLM enthusiast today. It serves as a valuable data point on component selection and power management for anyone planning their own, perhaps more modest, multi-GPU build.
