AIOne-Agent-52B-A36B-it

A sparse Mixture-of-Experts (MoE) multimodal model with 52B total and 36B active parameters, built for Korean reasoning, image understanding, and video understanding.

Model Description

AIOne-Agent-52B-A36B-it is a Korean-tuned multimodal Mixture-of-Experts (MoE) model based on Gemma 4 31B IT. It retains the text, image, and video capabilities of the base Gemma 4 family and adds a Korean-domain MoE branch whose router selects 2 of 8 experts for each token at inference time.

Multimodal. Accepts text, images, and video; produces fluent Korean (and English) responses.
Sparse MoE (top_k=2 of 8 experts) with an always-on dense shared MLP. About 36 B parameters are active per token in the text backbone, out of about 52 B in total.
Long context. 256K tokens, inherited from the base model.
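
The routing layout in the list above can be illustrated with a toy example. This is a minimal sketch, not the model's actual router: the softmax gate, the renormalization of the two selected weights, the function names `route_top_k` and `moe_layer`, and the scalar stand-in "experts" are all assumptions made for illustration. Only the 8-expert, top-2, always-on shared MLP layout comes from this card.

```python
import math

NUM_EXPERTS = 8  # from the model card
TOP_K = 2        # from the model card


def route_top_k(router_logits, k=TOP_K):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Assumption: softmax gate with weights renormalized over the selected
    experts; the real router may differ.
    """
    # Numerically stable softmax over all expert logits.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts and renormalize their weights.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_layer(x, router_logits, experts, shared_mlp):
    """Add the weighted top-k expert outputs to the always-on shared MLP output."""
    out = shared_mlp(x)  # dense branch runs for every token
    for idx, weight in route_top_k(router_logits):
        out += weight * experts[idx](x)  # only the selected experts run
    return out


# Toy usage: a scalar "hidden state" and experts that scale it by (index + 1).
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
shared_mlp = lambda x: 0.5 * x
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]
selected = route_top_k(logits)
output = moe_layer(1.0, logits, experts, shared_mlp)
```

Because only two of the eight experts run per token, the compute cost tracks the active parameter count rather than the total.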

The name follows the Gemma 4 convention (google/gemma-4-26B-A4B-it): the first number is the text backbone parameter count, A{X}B is the per-token active parameter count, and the vision encoder (0.57 B) is reported separately.
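
As a small illustration of this naming convention, the sketch below extracts the two numbers from a model name. The regular expression and the helper name `parse_param_counts` are hypothetical and are not part of any library.

```python
import re

# Matches "<total>B-A<active>B", e.g. "52B-A36B" or "26B-A4B".
NAME_PATTERN = re.compile(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B")


def parse_param_counts(model_name):
    """Extract (total_B, active_B) from a name such as '...-52B-A36B-it'."""
    match = NAME_PATTERN.search(model_name)
    if match is None:
        raise ValueError(f"no '<N>B-A<M>B' pattern in {model_name!r}")
    return float(match.group(1)), float(match.group(2))


total_b, active_b = parse_param_counts("JDONE-Research/AIOne-Agent-52B-A36B-it")
print(total_b, active_b)  # 52.0 36.0
```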

Key Capabilities

Korean reasoning and instruction following.
Image understanding (caption, VQA, document understanding).
Video understanding (frame-by-frame reasoning).
Long-context document QA in Korean.
Bilingual: Korean (primary) + English.

Quick Start

Transformers

import torch
from transformers import AutoProcessor, Gemma4ForConditionalGeneration

MODEL_ID = "JDONE-Research/AIOne-Agent-52B-A36B-it"

# Load the weights in BF16 and spread them across the available devices.
model = Gemma4ForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# One user turn with an image and a Korean prompt
# ("What do you see in this photo? Please answer in Korean.").
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/image.jpg"},
            {"type": "text", "text": "이 사진에 무엇이 보이나요? 한국어로 답해주세요."},
        ],
    },
]

# Apply the chat template and tokenize in one step.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Greedy decoding; print only the newly generated tokens, not the prompt.
generated = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(
    processor.tokenizer.decode(
        generated[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
)

Text-only

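# Korean prompt: "3 apples and 5 pears cost 12,000 won in total. If one apple
# costs 1,500 won, how much is one pear? Please solve it step by step."
# Pass these messages through the same apply_chat_template and generate calls
# as in the Transformers example.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "사과 3개와 배 5개의 가격이 12,000원입니다. "
                "사과 1개가 1,500원이라면 배 1개 가격은? 단계적으로 풀이해주세요.",
            },
        ],
    },
]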

vLLM (recommended for serving)

vllm serve JDONE-Research/AIOne-Agent-52B-A36B-it \
 --dtype bfloat16 \
 --tensor-parallel-size 4 \
 --max-model-len 32768
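
Once the server is running, it can be queried through vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch using only the standard library. The base URL `http://localhost:8000/v1` assumes vLLM's default host and port (no `--host` or `--port` override), and the helper names `build_chat_payload` and `send_chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM default host and port
MODEL_ID = "JDONE-Research/AIOne-Agent-52B-A36B-it"


def build_chat_payload(prompt, image_url=None, max_tokens=256):
    """Build an OpenAI-style chat-completions payload, optionally with one image."""
    content = []
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    content.append({"type": "text", "text": prompt})
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding
    }


def send_chat(payload, base_url=BASE_URL):
    """POST the payload to the server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Korean prompt: "What is the capital of Korea?"
payload = build_chat_payload("한국의 수도는 어디인가요?")
# print(send_chat(payload))  # requires the server started with `vllm serve`
```

Keeping payload construction separate from the network call makes the request easy to inspect or log before it is sent.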

Sample Output

Korean math reasoning (text-only)

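단계별 풀이 과정은 다음과 같습니다.
(English: The step-by-step solution is as follows.)

1단계: 사과 3개의 전체 가격 구하기
사과 1개의 가격이 1,500원이므로, 3개의 가격을 계산합니다.
(English: Step 1: Find the total price of the 3 apples. One apple costs 1,500 won, so compute the price of 3.)

1,500원 × 3개 = 4,500원
(English: 1,500 won × 3 = 4,500 won)

2단계: 배 5개의 전체 가격 구하기
전체 금액(12,000원)에서 사과 3개의 가격(4,500원)을 빼면 배 5개의 전체 가격이 나옵니다.
(English: Step 2: Find the total price of the 5 pears. Subtracting the price of the 3 apples (4,500 won) from the total (12,000 won) gives the total price of the 5 pears.)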
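
The excerpt above stops after step 2. To check a model answer against the expected result, the arithmetic of the prompt can be reproduced directly. This is a reference calculation, not model output, and the function name `pear_price` is hypothetical.

```python
def pear_price(total_won, apples, pears, apple_price_won):
    """Solve apples * apple_price + pears * pear_price = total for pear_price."""
    apples_total = apples * apple_price_won  # step 1: price of all apples
    pears_total = total_won - apples_total   # step 2: price of all pears
    return pears_total / pears               # final step: price of one pear


price = pear_price(total_won=12_000, apples=3, pears=5, apple_price_won=1_500)
print(price)  # 1500.0
```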

Multimodal (image + Korean caption)

다양한 색상의 점들이 섞여 무지개 빛깔의 그라데이션을 이루고 있는 이미지입니다.
(English: The image shows dots of various colors blending into a rainbow-colored gradient.)

Model Specs

Field	Value
Architecture	`Gemma4ForConditionalGeneration`
Base model	`google/gemma-4-31B-it`
Text backbone parameters	51.51 B → 52 B (in name)
Active parameters per token (text)	35.90 B → A36B (in name); attention + always-on dense MLP + top-2 of 8 experts
Vision tower	0.57 B (SigLIP-style, 27 layers)
MM projector	0.01 B
Total weights on disk	52.09 B / ~104 GB (BF16)
MoE config	`num_experts=8`, `top_k=2`, `moe_intermediate_size=2688`
Modality	Text + Image + Video → Text
Precision	`bfloat16`
Context length	256K
Languages	Korean (primary), English
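
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below is a rough estimate under stated assumptions: the eight experts are equal in size, every non-expert parameter (attention, embeddings, shared MLP) is active for every token, tensor parallelism splits weights evenly, and KV cache and activations are ignored. The helper names are hypothetical.

```python
TEXT_TOTAL_B = 51.51   # text backbone parameters (billions), from the table
TEXT_ACTIVE_B = 35.90  # active parameters per token (billions), from the table
DISK_TOTAL_B = 52.09   # all weights, including vision tower and projector
NUM_EXPERTS, TOP_K = 8, 2
BYTES_PER_PARAM = 2    # bfloat16


def weight_gigabytes(params_b, bytes_per_param=BYTES_PER_PARAM):
    """Weight size in GB (1 GB = 1e9 bytes) for a parameter count in billions."""
    return params_b * bytes_per_param


def expert_split(total_b, active_b, num_experts=NUM_EXPERTS, top_k=TOP_K):
    """Solve total = always_on + n*e and active = always_on + k*e for (always_on, e)."""
    per_expert = (total_b - active_b) / (num_experts - top_k)
    always_on = total_b - num_experts * per_expert
    return always_on, per_expert


disk_gb = weight_gigabytes(DISK_TOTAL_B)  # about 104 GB, matching the table
per_gpu_gb = disk_gb / 4                  # weights per GPU at tensor-parallel size 4
always_on_b, per_expert_b = expert_split(TEXT_TOTAL_B, TEXT_ACTIVE_B)
active_fraction = TEXT_ACTIVE_B / TEXT_TOTAL_B

print(f"{disk_gb:.1f} GB total, {per_gpu_gb:.1f} GB per GPU")
print(f"always-on {always_on_b:.2f} B, per expert {per_expert_b:.2f} B")
print(f"active fraction {active_fraction:.1%}")
```

Under these assumptions the always-on portion comes out at roughly 30.7 B and each expert at roughly 2.6 B across all layers. The card does not state this breakdown, so treat it as an estimate.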

Intended Use

Korean enterprise agent backend (long-context tool use, RAG, multi-turn reasoning).
Image and video understanding with Korean output.
Document QA in Korean.

Out-of-Scope Use

Sole-source decision-making with legal consequences.
Automated use of force or coercive control based purely on this model's output.
Any media analysis that infringes on personal privacy, image rights, or applicable data-protection laws.

License

This model is released under the Apache License 2.0.

Commercial use, redistribution, and modification are permitted with attribution.
Provided "as is" without warranties or conditions of any kind.

Citation

@misc{aione_agent_52b_a36b_it,
 title = {AIOne-Agent-52B-A36B-it: A Korean Sparse-MoE Multimodal Model},
 author = {JDONE Research},
 year = {2026},
 howpublished = {\url{https://huggingface.co/JDONE-Research/AIOne-Agent-52B-A36B-it}}
}
