
URL: https://www.hardware-corner.net/black-friday-llm-gpu-discounts/

Best Black Friday 2025 GPU Deals for Local LLM Users | Hardware Corner


Best Black Friday 2025 GPU Deals for Local LLM Users

By Allan Witt | Updated: November 22, 2025

πŸ‘ llm capable gpus on discount on black friday

*Affiliate Disclaimer: Some links in this article are affiliate links. If you make a purchase through these links, we may earn a small commission, which helps support our continued hardware testing.

Black Friday 2025 arrives on November 28, and for anyone running local LLMs, the next few weeks are worth watching closely. GPU pricing hasn’t seen major drops this year: high-end cards are still expensive, and most mid-range 16GB models sit close to their MSRP. Even so, Black Friday and the follow-up Cyber Monday period often bring short-term discounts that can make a real difference for users who need more VRAM without overspending.

For local inference, memory capacity is still the main limiter. A solid 16GB card can handle 7B–14B models comfortably, while 24GB and above opens the door to 30B-plus models without relying on heavy offloading. Since prices have been fairly static, LLM users are keeping a close eye on retailers as Black Friday approaches, hoping for meaningful deals on both consumer and workstation GPUs.

Black Friday 2025 Deals on LLM-Capable GPUs

We’re tracking GPUs that make sense for LLM workloads and monitoring their prices from now through Black Friday 2025. We group them by VRAM, since memory capacity determines which models and context lengths they can run, while bandwidth plays a major role in real-world throughput.
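As a rough illustration of why bandwidth matters: when a dense model generates text, each new token requires reading roughly all of the model's weights from VRAM, so memory bandwidth divided by the model's size in bytes gives an approximate upper bound on token generation speed. The sketch below assumes a hypothetical dense 8B model quantized to about 4.5 bits per weight (a typical figure for 4-bit formats, not a setting from our tests) and ignores KV-cache reads and compute overhead, so real results land well below this ceiling.

```python
def tg_ceiling_tokens_per_s(bandwidth_gb_s: float, params_billion: float,
                            bits_per_weight: float) -> float:
    """Rough upper bound on token generation speed for a dense model.

    Assumes every generated token streams all weights from VRAM once;
    KV-cache reads, kernel overhead and compute limits are ignored.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / weight_bytes


if __name__ == "__main__":
    # Hypothetical 8B dense model at ~4.5 bits/weight (assumption, not from our tests).
    for name, bw in [("RTX 5060 Ti", 448.0), ("RTX 5080", 960.0), ("RTX 5090", 1790.0)]:
        print(f"{name}: <= {tg_ceiling_tokens_per_s(bw, 8, 4.5):.0f} t/s")
```

The measured 8B results in this guide (for example 51.41 t/s on the RTX 5060 Ti) sit well under these ceilings, but the ordering broadly follows bandwidth. Mixture-of-experts models such as gpt-oss read only their active experts for each token, which is likely why the 20B results outpace the 8B ones.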

We’re also setting the lower bar at 16GB, because for anyone buying a new GPU specifically for local LLM inference, going below 16GB simply doesn’t make sense. Smaller cards limit model size, reduce usable context length, and force offloading, which removes most of the performance gains you’d get from upgrading in the first place.

Best Value GPU Models

Through our GPU testing for token generation (TG) and prompt processing (PP), we found a handful of cards that consistently delivered the best performance per dollar for real LLM work. These GPUs hit the sweet spot between memory, bandwidth, and raw throughput, meaning they handle common model sizes (7B–30B and many 32B variants) with generous context windows without forcing you to pay workstation prices. Below we list those best-value models and the Black Friday deals we found for them:

Deals for GPUs with 16GB VRAM

This is our lowest tier, since it doesn’t make sense to buy a new GPU with less than 16GB for LLM workloads. With 16GB of VRAM you can run models up to around 20B in 4-bit, and the usable context length depends on the model size. Our VRAM context length tests show that, when filling 16GB at 4-bit quantization, a 7B model can reach roughly 70k tokens of context, a 14B model stays closer to 45k, and a 20B model such as gpt-oss can push up to about 131k. Bandwidth also matters, and while newer GPUs usually provide more throughput, they aren’t always the best value because their street prices are higher.
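The context figures above come from our own measurements, but the underlying arithmetic can be sketched: VRAM has to hold the quantized weights plus a KV cache that grows linearly with context length. The example below assumes a hypothetical Llama-3-8B-style configuration (32 layers, 8 KV heads, head dimension 128), about 4.5 bits per weight, an FP16 KV cache, and a flat 1.5 GiB allowance for compute buffers. Real runtimes allocate memory differently, so treat the result as a ballpark rather than a reproduction of our tests.

```python
GIB = 2**30


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # One key and one value vector per layer and KV head (hence the factor 2);
    # bytes_per_elem=2 assumes an FP16 KV cache.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(vram_gib: float, params_billion: float, bits_per_weight: float,
                       n_layers: int, n_kv_heads: int, head_dim: int,
                       overhead_gib: float = 1.5) -> int:
    """Estimate how many tokens of KV cache fit after loading the weights.

    overhead_gib is a hypothetical allowance for compute buffers and runtime state.
    Returns 0 if the weights alone do not fit.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    free_bytes = vram_gib * GIB - weight_bytes - overhead_gib * GIB
    if free_bytes <= 0:
        return 0
    return int(free_bytes // kv_bytes_per_token(n_layers, n_kv_heads, head_dim))


if __name__ == "__main__":
    # Hypothetical Llama-3-8B-style config on a 16GB card.
    print(max_context_tokens(16, 8, 4.5, n_layers=32, n_kv_heads=8, head_dim=128))
```

Because the per-token cost scales with layer count and KV heads, deeper models run out of context headroom sooner, on top of needing more room for their weights. A quantized KV cache (a smaller `bytes_per_elem`) stretches the same VRAM further.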

ZOTAC Gaming GeForce RTX 5060 Ti 16GB Twin Edge OC

A strong value option for local LLM work, offering enough VRAM to run models up to around 20B in 4-bit with comfortable context ranges.
LLM performance (@ 16K context): 8B – TG: 51.41 t/s; PP: 1447.92 t/s | 20B – TG: 80.42 t/s; PP: 1737.71 t/s
Specs: VRAM 16GB GDDR7 | 448.0 GB/s Bandwidth | 128-bit Bus | 180W TDP
Deals: Amazon $429 (-10%) | Newegg $439 | eBay (Used) ~$390

Asus PRIME GeForce RTX 5070 Ti 16GB

Users praise its quiet operation, strong efficiency, and solid performance at or near MSRP, making it a dependable 50-series option for LLM workloads.
LLM performance (@ 16K context): 8B – TG: 87.54 t/s; PP: 3653.77 t/s | 20B – TG: 133.05 t/s; PP: 4940.71 t/s
Specs: VRAM 16GB GDDR7 | 896.0 GB/s Bandwidth | 256-bit Bus | 300W TDP
Deals: Amazon $749.99 | Newegg $749.99 | eBay (Used) ~$718.21

PNY OC GeForce RTX 5080 16GB

A quiet, cool-running 16GB GPU with strong performance at MSRP, though the generational uplift is small compared to the 40-series.
LLM performance (@ 16K context): 8B – TG: 94.14 t/s; PP: 4024.38 t/s | 20B – TG: 140.48 t/s; PP: 4932.33 t/s
Specs: VRAM 16GB GDDR7 | 960.0 GB/s Bandwidth | 256-bit Bus | 360W TDP
Deals: Amazon $1111.02 | Walmart $989.00 | Newegg $999.99
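The bandwidth figures in these spec lines follow directly from bus width and memory data rate: bus width in bits divided by 8 gives bytes per transfer, which is then multiplied by the per-pin data rate in Gbps. The data rates used below (28 Gbps GDDR7 on the RTX 5060 Ti and 5070 Ti, 30 Gbps on the RTX 5080) are NVIDIA's published memory speeds rather than figures listed in this guide, so treat them as an outside assumption.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    # Convert bus width from bits to bytes per transfer, then scale by the
    # effective per-pin data rate.
    return bus_width_bits / 8 * data_rate_gbps


# (bus width from the spec lines, assumed GDDR7 data rate in Gbps)
CARDS_16GB = {
    "RTX 5060 Ti": (128, 28.0),
    "RTX 5070 Ti": (256, 28.0),
    "RTX 5080": (256, 30.0),
}

if __name__ == "__main__":
    for name, (bus, rate) in CARDS_16GB.items():
        print(f"{name}: {memory_bandwidth_gb_s(bus, rate):.1f} GB/s")
```

This reproduces the 448.0, 896.0, and 960.0 GB/s figures above, and it shows that the RTX 5060 Ti's 128-bit bus, not its VRAM capacity, is the main reason its token generation trails the RTX 5070 Ti.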

Deals for GPUs with 24GB VRAM

Stepping up to 24GB of VRAM hits the real sweet spot for local LLM work. At this tier you can comfortably run 8B, 14B, 30B MoE, and full 32B models, with typical context lengths of around 131k, 86k, 65k, and 16k respectively, while 20B models like gpt-oss reach 131k. That makes this tier the most practical blend of flexibility, model-size headroom, and future-proofing for serious hobbyists.

EVGA GeForce RTX 3090 FTW3 Ultra Gaming, 24GB

Still a powerful 24GB GPU for LLM tasks, offering strong multi-model capacity and high bandwidth for long-context inference.
LLM performance (@ 16K context): 8B – TG: 87.45 t/s; PP: 2572.49 t/s | 20B – TG: 128.51 t/s; PP: 3243.60 t/s
Specs: VRAM 24GB GDDR6X | 936.2 GB/s Bandwidth | 384-bit Bus | 350W TDP
Deals: Amazon $929.97 | eBay (Used) ~$772.42

NVIDIA Founders Edition GeForce RTX 3090 Ti 24GB

A high-end 24GB GPU with excellent bandwidth and top-tier 30-series performance, still very capable for 20–40B LLMs and long-context work.
LLM performance (@ 16K context): 8B – TG: 93.60 t/s; PP: 2834.15 t/s | 20B – TG: 137.55 t/s; PP: 3440.06 t/s
Specs: VRAM 24GB GDDR6X | 1.01 TB/s Bandwidth | 384-bit Bus | 450W TDP
Deals: Amazon $1668.96 | eBay (Used) ~$837.54

Deals for GPUs with 32GB+ VRAM

This tier covers everything from high-end consumer GPUs to workstation-class cards with 32GB, 48GB, and even 96GB of VRAM. Because comparatively few products are available at these capacities, we list them together rather than splitting them into separate sections. Across the tier, you can run the full range of models, from small 7B architectures up to 123B-scale giants like Mistral Large on the largest cards.
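A quick way to see which of these VRAM sizes a model needs is to estimate its weight footprint: parameter count times bits per weight, divided by 8. The sketch below assumes about 4.5 bits per weight for a 4-bit quantization and reserves a hypothetical 15% of VRAM for the KV cache and runtime buffers; both numbers are illustrative assumptions, not measurements from our tests.

```python
GIB = 2**30
TIERS_GIB = (32, 48, 96)  # VRAM sizes covered in this tier


def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    # Quantized weight footprint: parameters x bits per weight / 8.
    return params_billion * 1e9 * bits_per_weight / 8


def smallest_tier(params_billion: float, bits_per_weight: float = 4.5,
                  headroom: float = 0.15):
    """Return the smallest VRAM tier (GiB) whose usable share holds the weights, or None.

    headroom is a hypothetical fraction of VRAM kept free for KV cache and buffers.
    """
    need = weight_bytes(params_billion, bits_per_weight)
    for tier in TIERS_GIB:
        if need <= tier * GIB * (1 - headroom):
            return tier
    return None


if __name__ == "__main__":
    for size in (32, 70, 123):  # a 32B model, a 70B model, a 123B model such as Mistral Large
        gb = weight_bytes(size, 4.5) / 1e9
        print(f"{size}B @ ~4.5 bpw: {gb:.1f} GB -> {smallest_tier(size)} GiB card")
```

Under these assumptions, 70B-class models become practical at 48GB, while 123B-scale models at 4-bit need the 96GB card, in line with the tier descriptions below.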

Asus TUF GAMING GeForce RTX 5090 32GB

High-end 32GB GPU with extreme bandwidth and top-tier performance for 20B+ LLMs, supporting long context windows with ease.
LLM performance (@ 16K context): 8B – TG: 145.34 t/s; PP: 7587.74 t/s | 20B – TG: 249.19 t/s; PP: 7478.71 t/s
Specs: VRAM 32GB GDDR7 | 1.79 TB/s Bandwidth | 512-bit Bus | 575W TDP
Deals: Best Buy $1999.99 | Newegg $1999.99 | Amazon $2848.00 | eBay (Used) $2,381.68

PNY RTX 6000 Ada Generation 48GB

48GB GPU suitable for large LLMs up to 70B+, balancing high VRAM with solid bandwidth for extended context inference.
LLM performance (@ 16K context): 8B – TG: 98.68 t/s; PP: 4096.35 t/s | 20B – TG: 137.10 t/s; PP: 5350.48 t/s
Specs: VRAM 48GB GDDR6 | 960.0 GB/s Bandwidth | 384-bit Bus | 300W TDP
Deals: Amazon $6899.00 | Newegg $7267.99 | eBay (Used) $4,667.41

PNY NVIDIA RTX Pro 6000 Workstation 96GB

Massive 96GB GPU for the largest LLMs, supporting 70B–120B models with extremely high memory capacity and bandwidth for professional inference workloads.
LLM performance (@ 16K context): 8B – TG: 140.62 t/s; PP: 7587.74 t/s | 20B – TG: 237.92 t/s; PP: 7478.71 t/s
Specs: VRAM 96GB GDDR7 | 1.79 TB/s Bandwidth | 512-bit Bus | 600W TDP
Deals: Amazon $8,499 | Center Computer $7,989.99 | eBay (Used) $7,920.41

GPU Benchmarks

We benchmarked every GPU featured in this Black Friday guide for both prompt processing and token generation performance. The chart below shows results using an 8B model at a 16k context length, giving you a clear, comparable view of how each card performs under a realistic LLM workload. This should help you make a more informed decision when weighing price against performance. If you want deeper, model-by-model analysis, we also offer a dedicated GPU ranking guide with expanded benchmarks and detailed comparisons.

πŸ‘ a graph showing the gpu llm performance with 8b model at 16k context. the gpus are spit into vram sections and show the performance for both prompt processing and token genration. in token generation the winner is rtx 5090 and in promp processinn rtx pro 6000
