Intel llm-scaler-vllm Beta 1.2 Brings Support For New AI Models On Arc Graphics

Written by Michael Larabel in Intel on 11 December 2025 at 05:29 AM EST. Add A Comment

Following yesterday's release of a new llm-scaler-omni beta there is now a new beta feature release of llm-scaler-vllm that provides the Intel-optimized version of vLLM within a Docker container that is set and ready to go for AI on modern Arc Graphics hardware. With today's llm-scaler-vllm 1.2 beta release there is support for a variety of additional large language models (LLMs) and other improvements.

Going the route of llm-scaler-vllm continues to be Intel's preferred choice for customers to leverage vLLM for AI workloads on their discrete graphics hardware. With this new llm-scaler-vllm 1.2 beta release there is support for new models and other enhancements to benefit the Intel vLLM experience:

- Fix 72-hour hang issue
- MoE-Int4 support for Qwen3-30B-A3B
- Bpe-Qwen tokenizer support
- Enable Qwen3-VL Dense/MoE models
- Enable Qwen3-Omni models
- MinerU 2.5 Support
- Enable whisper transcription models
- Fix minicpmv4.5 OOM issue and output error
- Enable ERNIE-4.5-vl models
- Enable Glyph based GLM-4.1V-9B-Base
- Attention kernel optimizations for decoding phases for all workloads (>10% e2e throughput on 10+ models with all in/out seq length)
- Gpt-oss 20B and 120B support in mxfp4 with optimized performance
- MoE models optimizations, output throughput:Qwen3-30B-A3B 2.6x e2e improvement; DeeSeek-V2-lite 1.5x improvement.
- New models: added 8 multi-modality models, image/video are supported.
- vLLM 0.10.2 with new features: P/D disaggregation(experimental), tooling, reasoning output, structured output,
- fp16/bf16 gemm optimizations for batch size 1-128. obvious improvement for small batch sizes.
- Bug fixes

This work will be especially important for next year's Crescent Island hardware release.

👁 Intel AI Software

More details on the new beta release via GitHub while the llm-scaler-vllm Docker container is available via the Docker Hub container image library.

Add A Comment

Intel Compute Runtime Now Advertises Early Support For Nova Lake, Introduces Experimental "LEO"

Intel Performance Skills: New Open-Source Project Leveraging AI For Linux Performance Optimizations

Intel Ending Development Of BigDL: An Open-Source AI/LLM Effort Getting Axed

Intel Thermald 2.5.12 Released... With Initial Support For ARM

Intel's Open Image Denoise 2.5 Delivers Solid Performance Improvements For GPUs

Intel XPU Manager 2.0 Overhauls Windows & Linux Management For Arc Pro GPUs

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.

Arch Linux Now Believes Malware Incident Under Control: More Than 1,500 Affected Packages

ReactOS "Open-Source Windows" Reaches The Milestone Of Being Able To Run Half-Life

macOS 27 Beta Breaks The Ability To Boot Asahi Linux

Arch Linux's AUR Sees More Than 400 Packages Compromised With Malware

Arch Linux AUR Hit By Another Wave Of Now More Sophisticated Malware Attack

Russian Spam & Profanities Are Now Plaguing The Arch Linux AUR

YSERVER: Modern X11 Server Written In Rust With The Help Of Claude Code

AMD Opens Pre-Orders For The Linux-Friendly Ryzen AI Halo Developer Platform

Support Phoronix
While Having Ad-Free Browsing,
Single-Page Article Viewing

Legal Disclaimer, Privacy Policy, Cookies | | Contact

URL: https://www.phoronix.com/news/Intel-llm-scaler-vllm-1.2-beta

⇱ Intel llm-scaler-vllm Beta 1.2 Brings Support For New AI Models On Arc Graphics - Phoronix

Intel llm-scaler-vllm Beta 1.2 Brings Support For New AI Models On Arc Graphics