The $4,000 NVIDIA AI mini-PC isn’t $4,000 anymore. The DGX Spark Founders Edition jumped to $4,699 on February 23, 2026, an 18% hike NVIDIA attributed to memory-supply constraints (NVIDIA Developer Forums; Tom’s Hardware). Hardware unchanged. A lot of comparison posts still show the old price.
The bigger question this guide answers isn’t which GB10 box to buy; it’s whether you should buy into the 128GB-unified tier at all. For a lot of readers, the honest answer is no. Below: the current field, the three numbers that actually decide the purchase, and a used-GPU gut-check before you spend $3,000-5,000.
The Three Numbers That Decide
Before any product table: the local-AI hardware decision pivots on three numbers, in this order.
- How big a model do you actually need to fit? For anything that fits in 24GB, a discrete consumer GPU wins on speed. Above 24GB at usable quants (70B FP8, 122B-A10B, 200B at FP4), you need unified memory, and that’s where the GB10 / Strix Halo / Mac Studio tier exists.
- How fast do you need it? Token generation is memory-bandwidth-bound. The RTX 3090 has ~936 GB/s, the Mac Studio M3 Ultra ~819 GB/s, GB10 ~273 GB/s, and Strix Halo unified memory ~256 GB/s. The bigger memory pools come with lower bandwidth, and so slower per-token throughput.
- What software stack do you need? CUDA (TensorRT-LLM, vLLM, PyTorch CUDA-native), Windows-with-AMD (Strix Halo gives you both), or macOS/MLX. None of these tiers is software-agnostic.
The reason “which box should I buy” has no single answer: those three numbers point different ways for different workloads. The rest of this guide pins them to specific products at June 2026 prices.
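The first of the three numbers can be estimated with simple arithmetic: weight size is roughly parameter count times bits per weight, divided by eight. The sketch below is a rough check, not a sizing tool. The function names and the 0.8 headroom factor are illustrative assumptions, and the estimate ignores KV cache, activations, and runtime overhead, so real memory use is higher.

```python
# Rough weight-footprint check: parameters x bits-per-weight / 8.
# Ignores KV cache, activations, and runtime overhead (real usage is higher).

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, memory_gb: float,
         headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the memory pool.

    The 0.8 default is an illustrative assumption, not a measured value.
    """
    return weight_gb(params_billion, bits_per_weight) <= memory_gb * headroom


examples = [("14B at 4-bit", 14, 4), ("70B at FP8", 70, 8), ("200B at FP4", 200, 4)]
for name, params, bits in examples:
    size = weight_gb(params, bits)
    print(f"{name}: ~{size:.0f} GB weights | "
          f"fits 24 GB: {fits(params, bits, 24)} | "
          f"fits 128 GB: {fits(params, bits, 128)}")
```

For MoE models such as 122B-A10B, pass the total parameter count: all experts have to sit in memory even though only some are active per token.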
The Used-3090 Reality Check
Before sorting which 128GB-unified box, the question worth asking first: what does a used RTX 3090 actually do for $900-1,200?
Community-reported benches:
| Workload | Used RTX 3090 (24GB) | GB10 / Strix Halo / Mac (128GB unified) |
|---|---|---|
| Models that fit in 24GB (7B-32B dense, 35B-A3B MoE) | ~50 tok/s on 14B at 16K context (Hardware Corner) | ~30-60 tok/s on same models |
| Llama 70B Q5 / FP8 | Doesn’t fit; needs offload | ~8 tok/s (Strix Halo, community-reported), ~2-3 tok/s (GB10 FP8) |
| Qwen 3.5-122B-A10B (FP4/Q4) | Doesn’t fit | ~9.5 tok/s (Strix Halo, community-reported) |
| Memory bandwidth | ~936 GB/s | ~256-273 GB/s (GB10/Strix Halo) / ~819 GB/s (M3 Ultra) |
| Power draw | ~350W under load | ~100W (GB10), ~120W (Strix Halo), ~150W (M3 Ultra) |
| Needs host PC | Yes | No (standalone) |
The 3090 beats every mini-PC on speed for any model that fits in its 24 GB of VRAM, and gets crushed on anything that doesn’t fit. That’s it. That’s the whole trade. For coding agents, chat with 7B-32B, vision models that fit, image generation — the 3090 is faster and cheaper. For Llama 70B FP8, Qwen 3.5-122B-A10B, anything 100B+, the 3090 can’t load it at all.
If your actual workload sits inside 24 GB at usable quantization, the most honest GB10 advice is: don’t buy one. Buy a used 3090 and put the difference toward more memory in your host system or a second card. The 128GB-unified boxes are for workloads that don’t fit in 24GB. That’s the whole story.
If your workload genuinely does need the 128GB tier, the rest of this guide is for you: the GB10 field, the Strix Halo alternative, and where Mac Studio fits.
The GB10 Field, Current Pricing
Every GB10 box runs the same NVIDIA Grace Blackwell GB10 superchip — 20 ARM v9.2 cores (10 performance + 10 efficiency), Blackwell GPU with 6,144 CUDA cores and 192 5th-gen Tensor cores, 128 GB LPDDR5X unified memory at 273 GB/s, ~1 PFLOP sparse FP4 compute, 140W TDP, ConnectX-7 200GbE networking, DGX OS (Ubuntu-based; no Windows). The chip is the chip.
| Box | Source / SKU | Notable | Current US price |
|---|---|---|---|
| NVIDIA DGX Spark Founders Edition | NVIDIA direct (marketplace) | 4TB SSD, Gen 5 NVMe, reference design | $4,699 (up from $3,999) |
| ASUS Ascent GX10 | Amazon, 1TB SKU | Front power button; only GB10 with one | ~$3,088 (1TB) / ~$4,150 (4TB) (TechRadar, IT Pro) |
| Dell Pro Max (GB10) | Dell direct, 2TB SKU | Magnetic back panel, enterprise warranty | ~$3,699 (2TB) (IT Pro) |
| MSI EdgeXpert MS-C931 | MSI, 1TB SKU | Most labeled ports; plastic chassis | ~$2,999 (1TB direct) / ~$3,999 (Amazon 4TB) |
| Acer Veriton GN100 | Acer direct US | Storage Review reports best thermals of the field | $2,999 (Acer US direct) / £3,999 UK (TechRadar review) |
| Gigabyte AI TOP ATOM | Gigabyte direct | Confirmed in the OEM partner field | Varies by region |
| Lenovo | Announced, availability varies | OEM partner field member | Varies |
The performance differences between these boxes are negligible. They run the same SoC, hit the same software-imposed ~100W GPU power cap (the cap John Carmack noticed when reviewing the Spark; the same cap is on every OEM build), and bench within margin of error of each other on the same models. What you’re paying for is chassis quality, NVMe generation, support/warranty, and storage capacity.
The Spark Founders Edition is the only one with Gen 5 NVMe. That means ~13 GB/s sequential read vs ~7 GB/s on the Gen 4 SSDs in every OEM box, which translates to noticeably faster cold model loads. Per the OEM reviews and existing community testing, the gap is roughly a 25% reduction in cold-load time for the same model: meaningful if you swap models frequently, irrelevant if you load one model and run it all day.
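The disk-read part of a cold load is easy to estimate: model size divided by sequential read speed. A minimal sketch follows, with hypothetical helper names. The `fixed_overhead_s` parameter stands in for load-time work that does not scale with disk speed, and the 8-second value used below is purely illustrative, not a measurement.

```python
# Back-of-envelope cold-load estimate using the sequential read speeds
# quoted above (~13 GB/s Gen 5 vs ~7 GB/s Gen 4).

def read_seconds(model_gb: float, seq_read_gbps: float) -> float:
    """Time to stream the weights off disk at the drive's sequential read rate."""
    return model_gb / seq_read_gbps


def cold_load_reduction(model_gb: float, fast_gbps: float, slow_gbps: float,
                        fixed_overhead_s: float = 0.0) -> float:
    """Fractional reduction in cold-load time from the faster drive.

    fixed_overhead_s is a hypothetical allowance for work that does not
    depend on disk speed; it is an assumption, not a measured value.
    """
    fast = read_seconds(model_gb, fast_gbps) + fixed_overhead_s
    slow = read_seconds(model_gb, slow_gbps) + fixed_overhead_s
    return 1 - fast / slow


model_gb = 70.0  # e.g. a 70B model at FP8
print(f"disk read only: {cold_load_reduction(model_gb, 13, 7):.0%} faster")
print(f"with 8 s fixed overhead (illustrative): "
      f"{cold_load_reduction(model_gb, 13, 7, fixed_overhead_s=8.0):.0%} faster")
```

On raw read speed alone the Gen 5 drive would cut load time by about 46%. The roughly 25% reported in reviews is smaller, which would be consistent with part of the load not being disk-bound.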
The other differentiators (chassis materials, port labeling, thermal behavior under sustained load) come from the published OEM reviews — TechRadar on ASUS, IT Pro on Dell, Storage Review on Acer’s thermals. No GB10 OEM has been independently benched here. Treat the chassis differences as real but small; the performance difference is essentially zero.
The Strix Halo Field — AMD’s GB10 Challenger
This is what shifted in the months since the original DGX Spark launch: AMD’s Ryzen AI Max+ 395 (“Strix Halo”) shipped 128GB unified memory in mini-PCs at prices that undercut every GB10 box, with Windows 11 support that the GB10 lineup doesn’t have. The chip itself is 16 Zen 5 cores at up to 5.1 GHz, Radeon 8060S iGPU (40 compute units), and up to 96 GB of the 128 GB unified pool allocatable as VRAM via AMD’s Variable Graphics Memory.
The GB10’s CUDA premium is real — TensorRT-LLM, vLLM, and PyTorch CUDA-native paths just work on GB10 and don’t on Strix Halo. But for builders who don’t need the CUDA-specific tooling, Strix Halo is in the same memory-capacity category at much lower prices.
| Box | Notable | Current US price |
|---|---|---|
| GMKtec EVO-X2 | Reportedly the cheapest 128GB Strix Halo build | ~$1,499 (Amazon listing) |
| Corsair AI Workstation 300 | Mainstream brand; multiple SSD tiers | $2,699 (1TB) / $3,399 (4TB) (TechPowerUp coverage) |
| AMD Ryzen AI Halo Developer Platform | AMD’s reference build | $3,999 (Micro Center, Phoronix) |
| NextNuc AI395 | Best Buy distribution | $3,999 (Best Buy listing) |
| HP Z2 Mini G1a | Enterprise-warranted Strix Halo box; HP’s first AI-targeted Z Mini | Configuration-dependent; commonly cited with 128GB option |
A few clarifications worth knowing about Strix Halo:
- The HP Z2 Mini G1a is a Strix Halo box, not a GB10 box. Some 2026 comparison posts have miscategorized it. It uses AMD Ryzen AI Max+; HP does not ship a GB10 personal-AI box.
- Strix Halo memory bandwidth is in the ~256 GB/s range for the unified pool, similar to the GB10’s 273 GB/s and far below discrete-GPU bandwidth.
- Community-reported benches show Strix Halo around 8-10 tok/s on 70B-class models (llm-tracker.info, Level1Techs forum thread, strategyarena.io) — usable but not interactive, the same band as GB10 on the same models.
If you want CUDA, this isn’t an answer. If you want 128GB unified memory at a lower price with native Windows support, this is the answer as of mid-2026.
The Mac Studio Comparison
The third box in this category: Mac Studio with 96GB or 128GB unified memory (M3 Ultra and the newer M5 Pro/Max). Memory-capacity-equivalent to GB10 and Strix Halo, but with significantly higher memory bandwidth — the M3 Ultra ships at 819 GB/s, roughly 3× the GB10’s 273 GB/s. That bandwidth difference translates directly to token throughput on large models that fit in unified memory.
The trade is software stack. No CUDA. Inference goes through Apple’s MLX or llama.cpp Metal backend; that’s a different toolchain than Linux GB10 or Windows Strix Halo. For builders inside the Apple ecosystem (or willing to live with MLX / llama.cpp Metal), the M3 Ultra is the bandwidth king of the 128GB-class boxes. For builders who need CUDA workflows, it’s a non-starter.
Current Mac Studio M3 Ultra with 128GB unified runs roughly $5,000-6,000, more than a Spark Founders Edition and more than any GB10 OEM price listed in this guide. The Best Local LLMs for Mac (2026) guide covers the Apple-side picks in detail.
Real-World Performance Caveats (Mid-2026)
The marketing line on GB10 is “200B parameters on your desk.” The real-world picture is more measured. A few patterns worth knowing before deploying any of these:
- Token generation is bandwidth-capped. At ~273 GB/s, a GB10 box generates Llama 70B FP8 at roughly 2-3 tok/s and Llama 70B FP4 (via TensorRT-LLM) at around 5 tok/s. That’s usable for batch processing and agent orchestration; it’s painful for interactive chat. Time-to-first-token on a 90B+ model can hit the multi-minute range.
- Software-imposed 100W power cap. Every GB10 box hits a ~100W GPU power cap, not the 240W power-supply rating. This is by design, not thermal — CPU clocks don’t drop when it engages. NVIDIA could lift it via firmware. Carmack flagged this early; later coverage clarified it’s a software cap, not throttling.
- Thermal headroom varies by chassis. Storage Review’s testing put the Acer Veriton GN100 as the coolest of the field at 76°C peak; the ASUS Ascent GX10 has the only reports of triggered thermal-slowdown events under sustained stress. None of the boxes throttle under normal LLM inference workloads.
- Early-buyer reliability reports. Community forums in the first weeks of GB10 shipping flagged sporadic WiFi reset issues, DGX OS update friction, and one-off thermal management oddities. NVIDIA has shipped firmware updates since (the CES 2026 software stack delivered up to 2.6× improvement on some workloads), so first-batch issues are largely background by mid-June. If you’re buying a GB10 box now, you’re buying into a more stable platform than the launch-month cohort.
- No Windows. Every GB10 box runs DGX OS (Ubuntu-based). For builders whose toolchain is Windows-native, that’s the deal-breaker that pushes the decision to Strix Halo regardless of price.
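The bandwidth cap in the first bullet of this list can be sanity-checked with one division. This is a minimal sketch, assuming a dense model where each generated token reads roughly all the weights once, so bandwidth divided by weight size bounds tokens per second. `decode_ceiling` is a hypothetical helper name; the 273 GB/s figure and the model sizes come from this guide. The bound ignores KV-cache reads and compute, so measured numbers land below it.

```python
# Upper bound on decode speed for a dense model:
# tok/s <= memory bandwidth / bytes of weights read per token.

GB10_BANDWIDTH_GBPS = 273.0  # figure quoted in this guide


def decode_ceiling(bandwidth_gbps: float, weights_gb: float) -> float:
    """Theoretical tokens/second ceiling when decode is bandwidth-bound."""
    return bandwidth_gbps / weights_gb


# Llama 70B: ~1 byte per parameter at FP8, ~0.5 byte per parameter at FP4.
for label, weights_gb in [("Llama 70B FP8", 70.0), ("Llama 70B FP4", 35.0)]:
    ceiling = decode_ceiling(GB10_BANDWIDTH_GBPS, weights_gb)
    print(f"GB10, {label}: at most ~{ceiling:.1f} tok/s")
```

The ceilings come out near 3.9 tok/s at FP8 and 7.8 tok/s at FP4, which sit just above the 2-3 and ~5 tok/s figures reported for GB10.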
Decision Tree
Walk this from the top; first match wins.
- Your workload fits in 24GB at usable quant (anything 7B-32B dense, 35B-A3B MoE)? → Used RTX 3090 ($900-1,200). Faster than any mini-PC on these models. Don’t overspend.
- You need 70B-200B at usable quants AND CUDA-specific tooling (TensorRT-LLM, vLLM, PyTorch CUDA paths)? → GB10 box. Pick DGX Spark Founders ($4,699) for Gen 5 NVMe and the reference design, or one of the OEM twins (ASUS / Dell / MSI / Acer $2,999-$4,150) to save $500-1,700 with negligible performance loss.
- You need 70B-200B at usable quants AND Windows support OR a tighter budget? → Strix Halo (Ryzen AI Max+ 395). $1,499 (GMKtec) to $3,999 (AMD reference / NextNuc). 128GB unified, 96GB allocatable as VRAM, Windows 11. No CUDA.
- You’re in the Apple ecosystem AND have $5K+ to spend? → Mac Studio M3 Ultra 128GB. 3× the memory bandwidth of GB10 / Strix Halo, fastest on big-model token throughput in this category. No CUDA. See Best Local LLMs for Mac 2026.
- You want both speed AND capacity? → Dual used 3090s with a real host PC: roughly $1,800-2,400 for the GPUs at the used prices above, plus a system. Total VRAM 48GB, with each card at ~936 GB/s. Higher ceiling than any mini-PC on models that fit; see GPU Buying Guide for the build.
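The walk-through above can be sketched as a first-match function. The `Workload` field names are hypothetical, chosen only for illustration, and the final fallback string is an addition for the case where no branch matches; the branch order mirrors the list.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All field names are illustrative assumptions, not an established schema.
    fits_in_24gb: bool
    needs_cuda: bool = False
    needs_windows: bool = False
    tight_budget: bool = False
    apple_ecosystem: bool = False
    budget_usd: int = 0
    wants_speed_and_capacity: bool = False


def recommend(w: Workload) -> str:
    """Walk the decision tree from the top; the first matching branch wins."""
    if w.fits_in_24gb:
        return "Used RTX 3090"
    if w.needs_cuda:
        return "GB10 box"
    if w.needs_windows or w.tight_budget:
        return "Strix Halo (Ryzen AI Max+ 395)"
    if w.apple_ecosystem and w.budget_usd >= 5000:
        return "Mac Studio M3 Ultra 128GB"
    if w.wants_speed_and_capacity:
        return "Dual used RTX 3090s"
    return "No match: revisit the three numbers"  # fallback added for this sketch


print(recommend(Workload(fits_in_24gb=True, needs_cuda=True)))
print(recommend(Workload(fits_in_24gb=False, needs_windows=True)))
```

Because the first match wins, a workload that fits in 24GB lands on the used 3090 even when CUDA is also required, which matches the guide's ordering.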
The Bottom Line
Three things changed in the four months since the original DGX Spark launch story stabilized:
- NVIDIA hiked the Founders Edition price 18% to $4,699 for memory-supply reasons. The hardware didn’t change; the value calculation did.
- The field widened. AMD’s Strix Halo ships the same 128GB unified-memory category at $1,499-$3,999 with Windows. That’s an alternative the original DGX Spark conversation didn’t have.
- The used-3090 reality check became the most important question. For anything that fits in 24GB, a discrete GPU at $900-1,200 outruns every box in this guide. The 128GB-unified tier is for workloads that genuinely don’t fit. If yours does, save the money.
If you’ve concluded you need the 128GB-unified tier:
- For CUDA workflows: DGX Spark Founders (Gen 5 NVMe, reference design, premium price) or MSI / Acer / ASUS at $2,999-$3,099 (1TB) for the most cost-effective entry. Performance differences within the GB10 field are negligible; pay for chassis, NVMe, and warranty preferences.
- For Windows or budget-conscious 128GB: Strix Halo (GMKtec, Corsair, NextNuc, AMD reference). $1,499 to $3,999.
- For Apple stack with bandwidth headroom: Mac Studio M3 Ultra 128GB, $5,000-6,000.
For everything else, the Used RTX 3090 guide and the Multi-GPU local AI guide are the better reads.
Related Guides
- GPU Buying Guide for Local AI
- VRAM Requirements for Local LLMs
- Used RTX 3090 Buying Guide
- Best Used GPUs for Local AI (2026)
- Multi-GPU Local AI
- What Can You Run on 24GB VRAM
- Best Local LLMs for Mac (2026)
- Running LLMs on Mac M-Series
- Apple M5 Pro/Max for Local AI
- llama.cpp vs Ollama vs vLLM (2026)
- Local AI Planning Tool — VRAM Calculator