NVIDIA

NVIDIA H800 80GB

Hopper Datacenter · Datacenter · Hopper · SXM · CUDA


About this GPU for AI

The NVIDIA H800 is a variant of the H100 built to comply with U.S. export rules for China. It retains Hopper's core AI compute, including 80 GB of HBM3 and the Transformer Engine with FP8, but NVLink bandwidth is cut to approximately 400 GB/s (down from the H100's 900 GB/s) and FP64 performance is capped at 1 TFLOPS. For single-GPU LLM inference, H800 performance is essentially identical to the H100 SXM, making it highly effective for serving 70B models quantized to FP8 or 4-bit (FP16 weights for a 70B model need about 140 GB, more than one card holds). The reduced NVLink bandwidth, which is what kept the part within the export thresholds in force at its launch, penalizes multi-GPU tensor parallelism in large training runs. Like the A800, it was later covered by the October 2023 export controls.
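As a back-of-the-envelope check on which precisions let a 70B model fit on one 80 GB card, the sketch below multiplies parameter count by bytes per parameter. It is a weight-only estimate under stated assumptions: the names `BYTES_PER_PARAM`, `weight_gb`, and `fits` are hypothetical, the 4-bit figure is idealised at 0.5 bytes per parameter, and KV cache, activations, and runtime overhead are ignored.

```python
# Weight-only memory estimate for a dense model (assumption: 1 GB = 1e9 bytes).
# Real deployments also need room for KV cache, activations, and runtime overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "q4": 0.5}  # q4 is an idealised 4-bit format


def weight_gb(params_billion: float, precision: str) -> float:
    """Return the approximate size of the weights in GB."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float = 80.0) -> bool:
    """True if the weights alone fit in the given VRAM."""
    return weight_gb(params_billion, precision) <= vram_gb


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(precision, weight_gb(70, precision), "GB, fits:", fits(70, precision))
```

Under these assumptions a 70B model needs about 140 GB at FP16, 70 GB at FP8, and 35 GB at 4-bit, so only the quantized variants fit on a single 80 GB card.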

Beyond LLMs

AI Capability Matrix

What AI tasks this GPU can handle — from text generation to image and video creation.

Capability	Status	Representative Model	Detail
LLM Chat (7B)	Runs natively	Llama 3.1 8B Q4	—
LLM Coding (30B)	Runs natively	Qwen 3 30B Q4	—
LLM Large (70B)

hbm-memory · massive-vram · export-regulated · high-bandwidth

Specifications

Compute

FP16: 900 TFLOPS

INT8: 1800 TOPS

Architecture: Hopper

Memory

VRAM: 80 GB

Bandwidth: 3000 GB/s

General

Family: Hopper Datacenter

Segment: Datacenter

Interconnect: SXM

Compute Platform: CUDA

MSRP: $30,000

Key Features

80 GB HBM3: 3,000 GB/s bandwidth (near H100 levels)
900 TFLOPS FP16 with sparsity / 1,800 INT8 TOPS
FP8 Transformer Engine: comparable single-GPU inference to H100
Reduced NVLink: ~400 GB/s (vs. H100's 900 GB/s) to meet export thresholds
FP64 capped at 1 TFLOPS (from 60 TFLOPS on H100)
SXM form factor, 700W TDP

For AI Workloads

Strengths

Single-GPU inference performance matches H100 SXM — FP8 Transformer Engine fully enabled
3 TB/s HBM3 bandwidth delivers fast token generation for large models
80 GB holds 70B-class models on a single card when quantized to FP8 or 4-bit (FP16 weights alone need about 140 GB)
Widely used in deployed Chinese AI inference infrastructure
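The link between memory bandwidth and token generation can be sketched with a simple roofline-style bound: during decode, each generated token requires reading the active weights once, so throughput cannot exceed bandwidth divided by bytes read per token. This is a rough ceiling under stated assumptions, not a benchmark. The function name `decode_ceiling_tok_s` is hypothetical, and batching, KV-cache reads, and kernel efficiency are ignored.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, bytes_read_per_token_gb: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    bandwidth_gb_s: peak memory bandwidth in GB/s (3000 for the H800 per this page).
    bytes_read_per_token_gb: weights touched per token in GB. For a dense model
    this is the full weight size; for a mixture-of-experts model it is only the
    active experts, which is an assumption the caller must supply.
    """
    return bandwidth_gb_s / bytes_read_per_token_gb


if __name__ == "__main__":
    # Hypothetical example: a 70B dense model at idealised 4-bit (about 35 GB of weights).
    print(round(decode_ceiling_tok_s(3000, 35), 1), "tok/s ceiling")
```

Measured speeds land below this ceiling for dense models, and the bound tightens as context grows because KV-cache reads add to the bytes moved per token.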

Considerations

Reduced NVLink (~400 GB/s) degrades multi-GPU scaling efficiency for large training runs
Subject to export controls: no longer exportable to China without a license under the October 2023 BIS rules
High cost and niche availability outside China-focused supply chains
Now effectively superseded in Chinese AI infrastructure by H20 (higher VRAM) and domestic alternatives
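To illustrate why the lower NVLink figure matters for multi-GPU training, the sketch below uses the standard bandwidth term of a ring all-reduce, in which each GPU moves 2(N-1)/N times the payload. It is a simplified model under stated assumptions: the function name `ring_allreduce_seconds` is hypothetical, latency and compute overlap are ignored, and the quoted NVLink figures are treated as fully usable per-GPU bandwidth, which real systems do not achieve.

```python
def ring_allreduce_seconds(payload_gb: float, num_gpus: int, link_gb_s: float) -> float:
    """Bandwidth-only time estimate for one ring all-reduce.

    Each GPU sends and receives 2 * (N - 1) / N times the payload in total;
    per-step latency is ignored (assumption).
    """
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_s


if __name__ == "__main__":
    # Hypothetical 1 GB gradient bucket across 8 GPUs.
    h800 = ring_allreduce_seconds(1.0, 8, 400.0)
    h100 = ring_allreduce_seconds(1.0, 8, 900.0)
    print(f"H800 takes {h800 / h100:.2f}x as long as H100 per all-reduce")
```

In this model the communication time scales inversely with link bandwidth, so the slowdown is the ratio 900/400 regardless of payload size. How much of it shows up in end-to-end training time depends on how well communication overlaps with compute.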

Architecture

Hopper

Hopper is NVIDIA's datacenter-focused architecture succeeding Ampere. Built on TSMC 4N, it introduces the Transformer Engine with automatic FP8/FP16 mixed-precision training, HBM3/HBM3e memory, and NVLink 4.0 for multi-GPU scaling. The H100 flagship delivers up to 3x the AI training performance of A100.

AI Relevance

The Transformer Engine automatically manages FP8 precision to speed up training with minimal accuracy loss. With up to 141 GB HBM3e (H200), Hopper GPUs can hold large open-weight models entirely in GPU memory, making them the workhorse of AI datacenters.

Process: TSMC 4N · Platform: CUDA · Tensor Cores: Gen 4 · Precisions: FP64, FP32, TF32, FP16, BF16, FP8, INT8

Recommendations by Workload

Chat

Qwen 3 32B

Qwen 3 32B suits chat workloads and fits this GPU comfortably. It belongs to a recent model generation, runs natively with headroom to spare, and its context window covers the requested workload. Known distribution channels: Hugging Face, Ollama, LM Studio.

Decode 84.9 tok/s · 131K ctx · llama.cpp (est.)

45.1 GB / 80.0 GB VRAM

Coding

Qwen3-Coder-Next

Qwen3-Coder-Next is a specialized fit for coding workloads. It belongs to a recent model generation, runs natively with headroom to spare, and its context window covers the requested workload. Known distribution channels: Hugging Face, Ollama, LM Studio.

Decode 164.1 tok/s · 244K ctx · llama.cpp (est.)

59.2 GB / 80.0 GB VRAM

Agentic Coding

Full Model Compatibility

👁 Alibaba
Qwen3-Coder-Next

S97

80B · 59.2 GB · 164 tok/s · 244K ctx

moe

👁 Alibaba
Qwen 2.5 VL 72B

S96

72B · 57.7 GB · 60 tok/s · 33K ctx

dense

👁 Alibaba
Qwen 3.6 35B A3B

S93

Just out of reach

Models you could run with an upgrade

High-quality models that need a bit more memory

👁 Alibaba
Qwen 3.5 397B A17B

397B · Tier 100 · Needs ~252.5 GB

Runs on AMD Instinct MI350X 288GB (~$8,000)

Also runs on 4× your GPU via NVLink — 116 tok/s

👁 Moonshot AI
Kimi K2.5

1000B · Tier 100 · Needs ~622.6 GB

Also runs on 8× your GPU via NVLink — 72 tok/s

👁 Moonshot AI
Kimi K2.6

1000B · Tier 100 · Needs ~622.6 GB

Also runs on 8× your GPU via NVLink — 72 tok/s

👁 DeepSeek
DeepSeek V4 Pro

1600B · Tier 100 · Needs ~871.8 GB

👁 DeepSeek
DeepSeek V4 Flash

284B · Tier 98 · Needs ~167.6 GB

Runs on AMD Instinct MI350X 288GB (~$8,000)

Also runs on 4× your GPU via NVLink — 184 tok/s
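The multi-GPU counts listed above can be reproduced with a simple capacity calculation. The sketch below is one plausible reading, not the page's confirmed method: it assumes GPU counts are rounded up to a power of two (common for tensor parallelism) and capped at 8 GPUs per NVLink node. The function name `gpus_needed` and both of those rules are assumptions, and per-GPU overhead is ignored.

```python
import math
from typing import Optional


def gpus_needed(required_gb: float, vram_per_gpu_gb: float = 80.0,
                max_gpus: int = 8) -> Optional[int]:
    """Smallest power-of-two GPU count whose pooled VRAM covers the requirement.

    Returns None when the model does not fit within max_gpus (assumed node size).
    """
    minimum = math.ceil(required_gb / vram_per_gpu_gb)
    count = 1
    while count < minimum:
        count *= 2  # assumption: tensor-parallel groups come in powers of two
    return count if count <= max_gpus else None


if __name__ == "__main__":
    for name, need in [("Qwen 3.5 397B A17B", 252.5), ("Kimi K2.5", 622.6),
                       ("DeepSeek V4 Pro", 871.8), ("DeepSeek V4 Flash", 167.6)]:
        print(name, gpus_needed(need))
```

Under these assumptions the 252.5 GB and 167.6 GB models both land on 4 GPUs, the 622.6 GB models need 8, and the 871.8 GB model exceeds a single 8-GPU node, which is consistent with the entries listed on this page.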

Image & Video Generation

Diffusion Model Compatibility

51 of 52 models can generate images or video on your NVIDIA H800 80GB

Model	Max Resolution	Gen Time	Grade
SD Turbo (Image)	512×512	0 ms	S
Stable Diffusion 1.5 (Image)	512×768	100 ms	S
Realistic Vision v5.1 (Image)	512×768	100 ms	S
DreamShaper 8 (Image)	512×768	100 ms	S
LCM DreamShaper v7