If you want to run larger local models like Qwen3 235B A22B or GLM-4.6 355B fully in VRAM, you quickly run into the problem of scale. Even with 4-bit quantization, Qwen3 235B A22B is about 135 GB and GLM-4.6 355B is roughly 206 GB. On budget-tier GPUs such as the RTX 3090 (24 GB VRAM), that means 6 to 9 cards for the weights alone. Consumer motherboards simply can’t handle that many. To build a practical, reliable setup, you need a workstation or server-grade board and a CPU with enough PCIe lanes and bandwidth to feed multiple GPUs.
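The card counts above are simple division, sketched below. The model sizes and the 24 GB figure come from the text; the function name `gpus_needed` and the optional `reserve_gb_per_gpu` parameter (VRAM held back on each card for KV cache and runtime buffers) are illustrative assumptions, not part of any tool.

```python
import math


def gpus_needed(model_gb: float, vram_per_gpu_gb: float = 24.0,
                reserve_gb_per_gpu: float = 0.0) -> int:
    """Smallest number of cards whose combined usable VRAM holds the weights."""
    usable = vram_per_gpu_gb - reserve_gb_per_gpu
    if usable <= 0:
        raise ValueError("reserve must be smaller than the per-GPU VRAM")
    return math.ceil(model_gb / usable)


if __name__ == "__main__":
    # Sizes are the 4-bit figures quoted in the article (weights only).
    for name, size_gb in [("Qwen3 235B A22B", 135), ("GLM-4.6 355B", 206)]:
        print(f"{name}: {gpus_needed(size_gb)} x 24 GB cards")
```

Setting a non-zero reserve, for example 2 GB per card, pushes both counts up by one, which is why it makes sense to plan for a slot or two of headroom.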

Why Stack GPUs Instead of Offloading to RAM

Right now, RAM prices are climbing, and high-speed DDR5 is not cost-effective. A 3090’s VRAM bandwidth is about ten times that of dual-channel DDR5-6000 memory, and once you factor in the price difference, GPUs win on performance-per-dollar. For local inference, especially when you want everything in VRAM to avoid context swapping delays, stacking GPUs remains the most efficient solution. Offloading to system RAM slows inference and prompt processing to a crawl, especially with larger models and longer context lengths.
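The "about ten times" figure can be checked with peak theoretical numbers. This is a rough sketch: the 936 GB/s value is NVIDIA's published spec for the RTX 3090 and is not stated in this article, the DDR5 figure assumes a standard 64-bit (8-byte) data path per channel, and real-world throughput will be lower on both sides.

```python
def ddr_bandwidth_gb_per_s(mt_per_s: int, channels: int,
                           bytes_per_transfer: int = 8) -> float:
    """Peak theoretical DRAM bandwidth in GB/s (transfers/s x bytes x channels)."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumption: published RTX 3090 spec (384-bit GDDR6X at 19.5 Gbps per pin).
RTX_3090_GB_PER_S = 936.0

if __name__ == "__main__":
    ddr5 = ddr_bandwidth_gb_per_s(6000, channels=2)
    print(f"Dual-channel DDR5-6000: {ddr5:.0f} GB/s")
    print(f"RTX 3090 VRAM:          {RTX_3090_GB_PER_S:.0f} GB/s")
    print(f"Ratio:                  {RTX_3090_GB_PER_S / ddr5:.2f}x")
```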

Running large models on a single GPU, or mixing RAM and VRAM, is fine for smaller models and patient users. But for users who want responsive prompt processing on big context windows or multi-agent setups, everything in VRAM is the only way to go. That’s where multi-GPU motherboards come in.

Understanding the Core Requirements

To run 5 to 10 GPUs, you need:

A CPU with enough PCIe lanes – Threadripper PRO or EPYC class chips are required.
PCIe 4.0 x8 or x16 slots for each GPU – PCIe x1 or x4 slots bottleneck prompt prefill speeds.
A motherboard with bifurcation support – needed once you want more GPUs than the board has PCIe slots.
Strong power delivery and spacing – consumer boards are not designed for six RTX xx90-class cards.
Proper cooling and chassis planning – rack or open-frame builds are a must.
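To put numbers on the slot-width requirement, here is a small sketch of theoretical PCIe link bandwidth and a naive lane-budget check. The per-lane rates and 128b/130b encoding are from the PCIe specification; the helper names are made up for illustration, and the budget check ignores lanes that a real board spends on NVMe, LAN, and other onboard devices.

```python
# Raw signalling rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_gb_per_s(gen: int, width: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * width


def lanes_fit(num_gpus: int, width: int, cpu_lanes: int = 128) -> bool:
    """True if num_gpus links of the given width fit in the CPU lane count."""
    return num_gpus * width <= cpu_lanes


if __name__ == "__main__":
    for width in (1, 4, 8, 16):
        print(f"PCIe 4.0 x{width}: {link_gb_per_s(4, width):.1f} GB/s")
    print("10 GPUs at x8 fit in 128 lanes: ", lanes_fit(10, 8))
    print("10 GPUs at x16 fit in 128 lanes:", lanes_fit(10, 16))
```

An x1 link carries one sixteenth of what an x16 link does, which is the gap behind the prefill bottleneck mentioned in the list.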

Let’s go through three practical motherboard options that enthusiasts are using today.

Gigabyte MC62-G40 – The “Budget” Workstation Board

At around $600, the MC62-G40 is the best entry point into serious multi-GPU territory. It uses the WRX80 chipset and supports Threadripper PRO 3000/5000 WX series CPUs. A 3945WX can be found second-hand for $120 to $150 and already gives you 128 PCIe 4.0 lanes, which is enough for 6 to 10 GPUs with x8 or x16 connectivity.

This board includes 7 PCIe 4.0 slots (5 × x16 + 2 × x8) and 8-channel DDR4 memory support. The lane configuration and chipset stability make it ideal for 3090 or A6000 stacking. Power delivery and BIOS stability are proven, and since it’s still DDR4, you save substantially on RAM cost compared to WRX90 systems.

ASRock WRX90 WS EVO – PCIe 5.0 Future-Proofing

If you want PCIe 5.0 bandwidth and are ready to invest in newer hardware, the ASRock WRX90 WS EVO is the clear upgrade path. It supports Threadripper PRO 9000/7000 WX CPUs, providing 128 PCIe 5.0 lanes and 8-channel DDR5 ECC memory.

You get 7 PCIe 5.0 x16 slots and dual 10 Gb LAN, but the platform cost rises fast. The 7960X non-PRO chip starts around $1200, while the 7945WX PRO goes beyond $1400. DDR5 RDIMMs are also expensive, so full 8-channel setups will easily double your budget compared to WRX80.

This board is best for users planning to run RTX 5090s, which can actually use PCIe 5.0 bandwidth for maximum prefill throughput. RTX 4090s will work as well, but they are PCIe 4.0 cards and gain nothing from the faster slots.

ASRock ROMED8-2T – EPYC Power on a Compact ATX

The ROMED8-2T is a favorite among DIY server builders because it delivers EPYC lane counts in an ATX form factor. It supports EPYC 7002 and 7003 processors, each offering 128 PCIe 4.0 lanes, which is the same as Threadripper PRO but with more flexible bifurcation.

You get 7 PCIe 4.0 x16 slots and two OCuLink ports for x4 connections. With bifurcation, each x16 slot can be split into x8/x8 or x4/x4/x4/x4 modes, allowing up to 14 GPUs if you’re using risers. The board includes dual 10 GbE LAN and IPMI for remote management, which makes it perfect for open-rack or clustered GPU setups.

Paired with a 7232P EPYC (available used for under $100), it offers unmatched performance-per-dollar. Real-world builds have confirmed up to 13 GPUs recognized on stock BIOS, though you’ll need powered risers and careful airflow management.
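The 14-GPU figure follows directly from the bifurcation modes described above. A minimal sketch, with a hypothetical helper name, that counts how many links a set of x16 slots can expose at a given split:

```python
def max_links(slots: int, link_width: int, slot_width: int = 16) -> int:
    """Number of device links available when every slot is split evenly."""
    if link_width <= 0 or slot_width % link_width != 0:
        raise ValueError("link width must evenly divide the slot width")
    return slots * (slot_width // link_width)


if __name__ == "__main__":
    # Seven x16 slots, as on the ROMED8-2T.
    for width in (16, 8, 4):
        print(f"x{width} per GPU: up to {max_links(7, width)} links")
```

Splitting to x4/x4/x4/x4 raises the count further on paper, but each GPU is then left with a quarter of a slot’s bandwidth, and power, risers, and physical space become the real limits long before the lane math does.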

Bifurcation, Risers, and Cooling

For multi-GPU builds, you’ll need bifurcation (enabled in the BIOS) plus riser cards or PCIe-MCIO adapters to make full use of the board’s lanes. Passive risers are fine for short x16 runs, but for more than three GPUs, use PCIe-MCIO risers. These reduce instability and protect both the board and GPUs from over-current.

A PCIe-MCIO x16 to dual x8 bifurcation adapter, commonly used in multi-GPU LLM workstations to maximize PCIe lane utilization and enable additional GPU connections.

Most cards like the RTX 4090 or 3090 are physically too large to sit side-by-side, so spacing every other slot or mounting GPUs vertically is required. Rack-mount frames or open mining-style chassis are the practical options here.

Power and Thermal Planning

A 6 GPU build with RTX 3090s can pull 1800 to 2000 watts at full load. You’ll need multiple PSUs or a high-capacity server supply. Plan for proper circuit support, because a single 15A line won’t cut it. For cooling, use blower-style GPUs or water blocks if you’re going for density.
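A quick way to see why one 15A line falls short is to compare the load against what a circuit can deliver continuously. The sketch below assumes a 120 V North American circuit and the common practice of loading a breaker to no more than 80% for continuous use; both are assumptions to adjust for your own wiring and local electrical code, and the function name is illustrative.

```python
import math


def circuits_needed(load_watts: float, volts: float = 120.0,
                    breaker_amps: float = 15.0,
                    continuous_derate: float = 0.8) -> int:
    """Number of separate circuits needed to carry a sustained load."""
    usable_watts = volts * breaker_amps * continuous_derate
    return math.ceil(load_watts / usable_watts)


if __name__ == "__main__":
    for load in (1800, 2000):
        print(f"{load} W on 120 V / 15 A circuits: {circuits_needed(load)}")
    print("2000 W on a 240 V / 20 A circuit:",
          circuits_needed(2000, volts=240, breaker_amps=20))
```

With multiple PSUs, splitting them across separate circuits is what makes the first case workable.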

Cost and Practical Limits

Even with used parts, these builds are expensive. A realistic 6 × 3090 setup with the MC62-G40 and a 3945WX can still run $4000 to $5000 or more before power or cooling. Going WRX90 or EPYC increases that, but you get higher memory bandwidth and better PCIe layout control.

If you only need partial offload or moderate context windows, a 4 GPU setup (96 GB VRAM with 3090s) can handle 70B to 130B models at 4-bit quantization with good speed. For 200B+ models, full VRAM fitting demands either 8 GPUs or workstation-grade RTX 6000 Ada cards.
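For a rough feel of which models fit in a given VRAM pool, weight size can be estimated from parameter count and bits per weight. This is only a sketch: the 4.5 bits-per-weight default is an assumption (4-bit formats store scales and other metadata alongside the weights, so the effective figure is somewhat above 4 and varies by format), and the helper names and `kv_reserve_gb` parameter are illustrative.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate quantized weight size in GB (1e9 params x bits / 8 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, total_vram_gb: float,
         bits_per_weight: float = 4.5, kv_reserve_gb: float = 0.0) -> bool:
    """True if the weights plus a KV-cache reserve fit in the VRAM pool."""
    return weights_gb(params_billion, bits_per_weight) + kv_reserve_gb <= total_vram_gb


if __name__ == "__main__":
    pool_gb = 4 * 24  # four RTX 3090s
    for size in (70, 130, 235):
        print(f"{size}B: ~{weights_gb(size):.0f} GB, "
              f"fits in {pool_gb} GB: {fits(size, pool_gb)}")
```

Under these assumptions, 70B and 130B models leave room for context in 96 GB, while a 235B model does not fit until you reach the 8-card range.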

Conclusion

Stacking GPUs is still the most cost-effective way to get fast local inference on large LLMs. For enthusiasts targeting Qwen3 235B A22B or GLM-4.6 355B, the ROMED8-2T and MC62-G40 are proven, affordable options. The WRX90 WS EVO is the forward-looking PCIe 5.0 platform, but it comes at a steep premium.

If you’re building today, use PCIe 4.0 boards with enough x8/x16 slots, plan your power and cooling, and get GPUs with high VRAM and NVLink when possible. The cost adds up fast, but for those who want total local control of 200B+ parameter models, there’s still no better option than a custom multi-GPU workstation or rack system.

URL: https://www.hardware-corner.net/multi-gpu-llm-motherboards/

Building a Multi-GPU LLM Workstation: Choosing the Right Motherboard for 6 – 10 GPUs | Hardware Corner