URL: https://www.hardware-corner.net/intel-arc-pro-b65-b70-llm-bandwidth/

Faster LLM Inference from Intel: Arc Pro B65 and B70 Raise the Memory Bandwidth Bar

Chavy Levi Jan 26, 2026 at 11:57am PDT

Intel is preparing two new Arc Pro Battlemage GPUs, the Arc Pro B65 and Arc Pro B70, and for local LLM users the key change is not compute but memory configuration. Both cards move to a 256-bit GDDR6 memory bus paired with 32GB of VRAM, a combination that directly targets one of the main bottlenecks in local inference: memory bandwidth.

With a 256-bit bus and modern GDDR6, these cards are expected to reach around 576 GB/s of memory bandwidth, which implies GDDR6 running at about 18 Gbps per pin. For transformer inference, especially on larger quantized models, this matters more than raw core count. Generating each token requires reading essentially all of the model's weights from VRAM, because they are far too large to fit in on-chip cache, so token generation speed on these memory-bound workloads often scales almost linearly with bandwidth.
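
The bandwidth figures follow from simple arithmetic: peak bandwidth is the bus width in bits times the per-pin data rate, divided by 8 to convert bits to bytes. The Python sketch below reproduces them and turns them into a rough upper bound on single-stream token generation, assuming a dense model whose weights are all read once per generated token. The per-pin data rates (18 Gbps for the B65/B70, 19 Gbps for the B60) are assumptions back-calculated from the quoted bandwidth figures, and the 20GB model size is a hypothetical example.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Every pin on the bus transfers data_rate_gbps gigabits per second;
    dividing by 8 converts gigabits to gigabytes.
    """
    return bus_width_bits * data_rate_gbps / 8


def decode_ceiling_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on batch-1 token generation for a dense model.

    Assumes every weight is read from VRAM once per token and ignores
    KV-cache reads, compute time and other overheads, so real speeds are lower.
    """
    return bandwidth_gbs / model_gb


# Assumed per-pin data rates, back-calculated from the quoted bandwidth figures.
b70_bw = peak_bandwidth_gbs(256, 18.0)  # B65/B70 estimate
b60_bw = peak_bandwidth_gbs(192, 19.0)  # B60

model_gb = 20.0  # hypothetical quantized model size in GB
for name, bw in [("B65/B70", b70_bw), ("B60", b60_bw)]:
    ceiling = decode_ceiling_tokens_per_s(bw, model_gb)
    print(f"{name}: {bw:.0f} GB/s, at most {ceiling:.1f} tok/s")
```

Because the ceiling is bandwidth divided by model size, the roughly 26% bandwidth gain (576 / 456 ≈ 1.26) carries straight through to the theoretical token rate for any model that fits on both cards.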

Why the 256-bit bus matters for LLM workloads

The earlier Arc Pro B60 cards were attractive mainly because of VRAM capacity, not speed. The single B60 shipped with 24GB on a 192-bit bus, while the B60 Dual combined two GPUs on one board, each still limited to its own 192-bit interface. Importantly, the memory buses on the dual card do not add together: each GPU remains capped by its own bandwidth ceiling.

Intel Arc Pro B60 Dual (48GB) with its dual-GPU PCB and cooling design. Image: GamersNexus

For local LLM inference, this means the B60 Dual is excellent for loading very large models but less impressive when it comes to tokens per second, especially compared to cards with wider buses. The new B65 and B70 address this directly by widening the bus to 256-bit while also increasing VRAM to 32GB per GPU.
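
One way to see why the dual card's buses do not add up for single-stream generation is to model a layer-by-layer (pipeline) split, where each token passes through the first GPU's layers and then the second's. Per-token time is then the sum of each GPU's read time, so two 456 GB/s GPUs sharing a model run no faster than one imaginary 456 GB/s GPU holding all of it. This is a simplified sketch: the 40GB model and the even split are hypothetical, and it ignores inter-GPU transfer costs as well as tensor-parallel setups, where both GPUs work on every layer at once and behave differently.

```python
from typing import List, Tuple


def layer_split_ceiling(shards: List[Tuple[float, float]]) -> float:
    """Upper bound on tokens/s for a model split layer-by-layer across GPUs.

    shards: (gigabytes of weights on the GPU, that GPU's bandwidth in GB/s).
    The GPUs run one after another for each token, so their read times add.
    Inter-GPU transfers and compute time are ignored.
    """
    seconds_per_token = sum(gb / bw for gb, bw in shards)
    return 1.0 / seconds_per_token


# Hypothetical 40GB model split evenly across the two GPUs of a B60 Dual.
dual = layer_split_ceiling([(20.0, 456.0), (20.0, 456.0)])
# The same model on one imaginary GPU with 456 GB/s and enough VRAM to hold it.
single = layer_split_ceiling([(40.0, 456.0)])
print(f"dual: {dual:.1f} tok/s, single: {single:.1f} tok/s")
```

In this model the second GPU adds capacity, which is what lets the 40GB model load at all, but the token-rate ceiling stays at the single-GPU bandwidth divided by the total model size.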

VRAM capacity and practical model sizing

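Moving from 24GB to 32GB of VRAM changes what can be run comfortably on a single GPU. A 32GB card can hold 30B-class models at 4-bit with more headroom for longer context, KV cache growth, and higher batch sizes. 70B-class models are still a stretch: at 4 bits their weights alone come to roughly 35GB, so fitting one on a single card requires quantizing to around 3 bits per weight or below. The extra capacity also reduces the need for aggressive offloading or multi-GPU sharding for many popular local setups.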
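
A rough estimate of quantized weights plus an FP16 KV cache is a useful check on what fits in 32GB. In the sketch below, the model shape (64 layers, 8 KV heads, head dimension 128) is a hypothetical 32B-class configuration and 16K tokens of context is an arbitrary choice. Real 4-bit formats usually average somewhat more than 4 bits per weight, and runtime buffers are ignored, so treat the results as lower bounds.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB: one K and one V vector per layer, KV head and token."""
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


VRAM_GB = 32.0

# Hypothetical 32B-class model shape with 16K tokens of FP16 KV cache.
mid_model = weights_gb(32, 4) + kv_cache_gb(64, 8, 128, 16_384)
# 70B-class weights alone at 4 bits, before any KV cache.
large_model = weights_gb(70, 4)

print(f"32B @ 4-bit + 16K ctx: {mid_model:.1f} GB, fits: {mid_model <= VRAM_GB}")
print(f"70B @ 4-bit weights:   {large_model:.1f} GB, fits: {large_model <= VRAM_GB}")
```

Lowering the 70B-class case to 3 bits per weight (`weights_gb(70, 3)` gives 26.25GB) brings the weights under 32GB, but leaves little room for the KV cache.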

Compared to the older lineup, Intel is clearly prioritizing a more balanced configuration. The B60 Dual still wins on raw capacity at 48GB total, but it does so by pairing two slower memory subsystems. The B65 and B70 instead aim for fewer compromises per GPU.

Bandwidth and memory comparison for local LLM users

| GPU Model | VRAM | Memory Bus | Peak Bandwidth |
|---|---|---|---|
| Arc Pro B70 | 32GB GDDR6 | 256-bit | ~576 GB/s |
| Arc Pro B65 | 32GB GDDR6 | 256-bit | ~576 GB/s |
| Arc Pro B60 | 24GB GDDR6 | 192-bit | ~456 GB/s |
| Arc Pro B60 Dual | 2 × 24GB GDDR6 (48GB total) | 192-bit per GPU | ~456 GB/s per GPU |

The table highlights the core point for inference workloads: the dual card doubles capacity, not per-GPU bandwidth. The new Battlemage-based Pro cards are the first Intel Arc options that combine more than 24GB of VRAM per GPU with a meaningfully wider bus.

Availability and pricing context

There is currently no pricing information for the Arc Pro B65 or B70. The existing Arc Pro B60 Dual has only just started shipping to regular consumers rather than only to enterprise buyers, with UK pricing landing around £1,400. Until Intel and its partners announce official prices, it is too early to judge performance per pound. For now, the technical direction is clear: wider buses and 32GB of VRAM are finally coming to Intel Arc, and for local LLM enthusiasts, that is the most relevant upgrade in this generation.
