Now in Foundry: Command A+ (W4A4), Chandra OCR 2, and GLM-OCR
What’s trending on Hugging Face, June 10, 2026
We are seeing two distinct trends this week. The first is around how low-bit quantization has developed to the point where large reasoning models can fit on a single accelerator with less quality loss. Second, a new wave of OCR-specialized vision-language models are redefining the accuracy-throughput frontier for document understanding.
This week we are highlighting three Hugging Face models in Microsoft Foundry: Cohere Labs' Command A+ (W4A4), a 218B-parameter Sparse Mixture-of-Experts (MoE) reasoning model optimized for agentic, multilingual, and reasoning-heavy tasks; Datalab's Chandra OCR 2, a 5.3B vision-language model that converts images and PDFs to markdown, HTML, and JSON while preserving layout, with state-of-the-art results on the olmOCR benchmark and 90+ language coverage; and Z.ai's GLM-OCR, a 0.9B compact OCR model—roughly 6× smaller than Chandra OCR 2—built on the GLM-V encoder–decoder architecture that ranks first on OmniDocBench V1.5 while serving at high concurrency.
Models of the week
Cohere Labs: Command A+ (W4A4)
Model Specs
- Parameters / size: 218B total, 25B active per token
- Context length: 128K input, 64K output
- Primary task: Text generation with vision input, reasoning, and tool use
Why it's interesting
- Efficient, low compute deployment: Command A+ is designed to run on relatively minimal hardware for its size while maintaining high performance. It achieves this through advanced quantization and optimization techniques that reduce compute, latency, and cost. However, reasoning models are especially sensitive to quantization, as errors can accumulate over long decoding sequences. To mitigate this, the quantized student model is post-trained against the full-precision teacher’s output distribution, using fake quantization in the forward pass and straight-through estimators during backpropagation. CohereLabs recommends the W4A4 quantization for its strong balance of speed and latency.
- Multilingual, multimodal, and reasoning focused performance gains: Command A+ extends to 48 different languages (previously 23) and is built for complex reasoning and multimodal tasks with measureable improvements across document understanding, math reasoning, and enterprise QA workflows.
Try it
Test this prompt in the CohereLabs Hugging Face Space before deploying the model in Foundry:
Sample prompt: You are Command, a legal AI for multinational contract review with access to CONTRACT_VAULT_QUERY and POLICY_TEMPLATE_RETRIEVAL tools. Analyze the input clause by first detecting language and classifying obligation type, then use CONTRACT_VAULT to find comparable {jurisdiction} clauses and retrieve the relevant policy template. Output structured JSON with obligation classification, comparative findings, risk assessment, and English recommendations with exact document citations. Include confidence scores, similarity metrics, and a reasoning trace showing each analysis step. Handle Polish/Japanese legal terminology accurately, preserve legal precision, and ensure all citations reference actual source documents. Use chain-of-thought reasoning, stay within 128K tokens, and never hallucinate references—state limitations explicitly when tools fail.
Datalab: Chandra OCR 2
Model Specs
- Parameters / size: 5.3B
- Output formats: Markdown, HTML, and JSON
- Primary task: Document OCR (image-text-to-text)
Why it's interesting
- State-of-the-art on the olmOCR benchmark: Chandra OCR 2 recieved 85.9% bench score on the olmOCR Benchmark and a 77.8% multilingual bench score (12% improvement over Chandra 1).
- Support for 90 world languages: Indic script, European languages, and languages that read right to left say substantial improvemtns based on Datalab’s internal benchmarking. View the full list of languages and the benchmark results here: Chandra 2 Language List
- Better complex layout understanding: Handles multi-level tables, nested structures, forms, math, and mixed handwriting with structured outputs (HTML/JSON/Markdown + bounding boxes), removing the need for post-OCR layout reconstruction. Take a look here:
Try it
Build an automated compliance intake pipeline using Chandra OCR 2 for structured extraction across complex, handwritten and form-based documents.
In this scenario, you’re supporting a state election commission processing large volumes of candidate filings submitted as scanned forms or mobile-captured images. These documents often include mixed handwriting quality, checkbox selections, signatures, and structured fields that must be validated for compliance.
Chandra OCR 2 can extract both printed and handwritten fields, identify form structure, and capture key elements such as candidate information, filing details, checkbox states, and signed declarations in a consistent JSON format. This structured output can then be passed into a compliance workflow to validate completeness, detect inconsistencies, and flag filings that require manual review.
This approach helps streamline high-volume intake while improving accuracy and reducing manual processing across complex document types.
Sample prompt: Extract all fields from this filing and return a structured JSON output including form type, candidate name, office sought, district, committee name, treasurer, filing date, checkbox states, and a transcription of the signed declaration. Include bounding boxes for each extracted field.
Z.ai: GLM-OCR
Model Specs
- Parameters / size: 0.9B
- Languages: Chinese, English, French, Spanish, Russian, German, Japanese, Korean
- Primary task: Document OCR (image-text-to-text)
Why it's interesting
- High accuracy at a compact scale: GLM-OCR achieves a score of 94.62 on OmniDocBench V1.5, showing strong performance on tasks such as formula recognition, table extraction, and document parsing—even at sub-1B scale
- Designed for structured document understanding: The model performs well across complex document layouts, enabling extraction of tables, forms, and mixed text-image content
- Optimized training for consistency across tasks: Uses Multi-Token Prediction (MTP) and full-task reinforcement learning to improve stability and accuracy across diverse document types
- Efficient for real-world deployment: Its smaller footprint makes it well suited for scalable OCR pipelines where cost, latency, and throughput matter
Try it
Build a high-throughput document ingestion pipeline using GLM-OCR for structured extraction across diverse document types.
Imagine you are operating a customer onboarding platform that processes identity documents, invoices, and proof-of-income statements across multiple languages. GLM-OCR can be used to extract key fields—such as names, ID numbers, dates, and addresses—and output them in a consistent structured format for downstream systems.
The model’s compact footprint makes it well suited for scaling high-volume OCR workflows, enabling you to process large batches of documents efficiently while maintaining accuracy across layouts like tables, forms, and mixed text-image content.
Sample prompt: Extract the following fields from this document and return a structured JSON output: full name, ID number, date of birth, address, document type, and expiration date. Ensure all fields match the document exactly, including formatting.
Getting started
Whether you are coming straight from the Hugging Face hub or are already in Microsoft Foundry, deploying new open models is getting simpler. You can deploy models on Foundry by browsing the Hugging Face collection in the model catalog or you can choose "Deploy on Microsoft Foundry" on the Hugging Face website, which brings you straight into Foundry with secure, scalable inference already configured. Read the documentation to learn more:
