VOOZH about

URL: https://www.phoronix.com/news/Intel-llm-scaler-vllm-1.2-beta

⇱ Intel llm-scaler-vllm Beta 1.2 Brings Support For New AI Models On Arc Graphics - Phoronix


👁 Phoronix

Intel llm-scaler-vllm Beta 1.2 Brings Support For New AI Models On Arc Graphics

Written by Michael Larabel in Intel on 11 December 2025 at 05:29 AM EST. Add A Comment
Following yesterday's release of a new llm-scaler-omni beta there is now a new beta feature release of llm-scaler-vllm that provides the Intel-optimized version of vLLM within a Docker container that is set and ready to go for AI on modern Arc Graphics hardware. With today's llm-scaler-vllm 1.2 beta release there is support for a variety of additional large language models (LLMs) and other improvements.

Going the route of llm-scaler-vllm continues to be Intel's preferred choice for customers to leverage vLLM for AI workloads on their discrete graphics hardware. With this new llm-scaler-vllm 1.2 beta release there is support for new models and other enhancements to benefit the Intel vLLM experience:
- Fix 72-hour hang issue
- MoE-Int4 support for Qwen3-30B-A3B
- Bpe-Qwen tokenizer support
- Enable Qwen3-VL Dense/MoE models
- Enable Qwen3-Omni models
- MinerU 2.5 Support
- Enable whisper transcription models
- Fix minicpmv4.5 OOM issue and output error
- Enable ERNIE-4.5-vl models
- Enable Glyph based GLM-4.1V-9B-Base
- Attention kernel optimizations for decoding phases for all workloads (>10% e2e throughput on 10+ models with all in/out seq length)
- Gpt-oss 20B and 120B support in mxfp4 with optimized performance
- MoE models optimizations, output throughput:Qwen3-30B-A3B 2.6x e2e improvement; DeeSeek-V2-lite 1.5x improvement.
- New models: added 8 multi-modality models, image/video are supported.
- vLLM 0.10.2 with new features: P/D disaggregation(experimental), tooling, reasoning output, structured output,
- fp16/bf16 gemm optimizations for batch size 1-128. obvious improvement for small batch sizes.
- Bug fixes

This work will be especially important for next year's Crescent Island hardware release.

👁 Intel AI Software


More details on the new beta release via GitHub while the llm-scaler-vllm Docker container is available via the Docker Hub container image library.

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.