📑 Model Card: LFM2.5-Queen-Opus-4.7-8B-A1B (Opus-4.7-Distilled Experimental)
GGUF
- https://huggingface.co/mradermacher/LFM2.5-Queen-Opus-4.7-8B-A1B-i1-GGUF
- https://huggingface.co/mradermacher/LFM2.5-Queen-Opus-4.7-8B-A1B-GGUF
1. Model Overview
- Base Model: Liquid AI LFM2.5-8B-A1B (sparse Mixture-of-Experts (MoE), ~8.3B total parameters, ~1.5B active parameters per token)
- Finetuning Method: LoRA (Rank 16, Alpha 32)
- Training Dataset: lordx64-claude-opus-4.7-max-cleaned (an SFT reasoning and roleplay dataset distilled from Claude Opus 4.7, released in 2026)
- Experimental Purpose: This is an exploratory study model designed to analyze the cognitive modeling limits of a lightweight (~1.5B active parameters) MoE architecture when trained on advanced reasoning and adaptive-thinking data distilled from Claude Opus 4.7. The key focus is to observe behavior and limitations under Native (English) and Cross-Language (Non-English/Chinese) task pipelines.
The spirit is willing, but the flesh is weak.
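To put the overview figures in proportion, the short sketch below works out the share of parameters active per token and the LoRA scaling factor. It assumes the standard LoRA formulation, where the update is scaled by alpha / rank; the card does not say whether a variant such as rank-stabilized LoRA was used. The helper `lora_trainable_params` and the 2048-wide layer are hypothetical illustrations, not details of this model.

```python
# Figures quoted in the Model Overview.
TOTAL_PARAMS_B = 8.3    # approximate total parameters, in billions
ACTIVE_PARAMS_B = 1.5   # approximate active parameters per token, in billions
LORA_RANK = 16
LORA_ALPHA = 32

# Share of the network that participates in each forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Assumption: standard LoRA scaling (alpha / rank), not a rank-stabilized variant.
lora_scaling = LORA_ALPHA / LORA_RANK


def lora_trainable_params(d_in: int, d_out: int, rank: int = LORA_RANK) -> int:
    """Parameters added by one LoRA adapter (matrices A and B) on a d_in x d_out layer."""
    return rank * (d_in + d_out)


print(f"active fraction: {active_fraction:.1%}")
print(f"LoRA scaling factor: {lora_scaling}")
# Hypothetical 2048 x 2048 projection, for scale only.
print(f"adapter params on a 2048x2048 layer: {lora_trainable_params(2048, 2048)}")
```

Roughly one parameter in five is active per token, which is the capacity budget the rest of this card refers to.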
2. Experimental Findings: Native English vs. Non-English (Chinese) Contrast
The evaluation revealed a stark performance divergence between native English and Chinese cross-language tasks:
| Evaluation Dimension | 🇬🇧 Native English Performance | 🇨🇳 Non-English (Chinese) Performance |
|---|---|---|
| Linguistic Fidelity | Excellent. Highly idiomatic, elegant, and atmospheric. Uses rich, contextual vocabulary (e.g., jeweled garter, icy, measured tone). | Suboptimal. Prone to severe "translation bleeding," where standard English roleplay tropes are translated literally and awkwardly (e.g., jaw clench -> 颔面抽掏; petticoat -> 衬裙的背心). |
| Role & Persona Control | Very Stable. Pronoun alignment and monarch/subject dynamics remain completely consistent. | Unstable. Suffers from role/gender reversal and pronoun confusion (e.g., a Queen referring to herself using subordinate pronouns like 臣). |
| Code Syntax Tree | Highly Precise. Strictly preserves the specific programming language constraints (e.g., Python/Go). | Syntactically Mixed. When generating complex code (e.g., Java), it frequently bleeds structures from other languages (e.g., Go’s := or Kotlin’s as). |
| EOS Termination Control | Accurate. Successfully predicts the End-Of-Sequence (EOS) token and terminates cleanly upon completing a prompt. | Flawed. Prone to falling into "circular echoing" repetition loops, where it copies its first output sentence at the very end. |
| Observed Artifacts | Minor "perspective bleeding" (e.g., shifting from 3rd-person She to 1st-person I inside the same bracketed action sequence). | Severe cumulative hallucinations involving grammar mixing, pronoun chaos, literal translation glitches, and looping. |
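The "circular echoing" failure in the table (the first output sentence copied at the very end) is simple enough to check for mechanically. The following is a minimal sketch of such a check, not part of the original evaluation: the function names are hypothetical, the sentence splitter is naive, and only an exact repeat (after whitespace and case normalization) is detected.

```python
import re

# Sentence-ending punctuation, ASCII and CJK.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: a run of non-terminators followed by any terminators."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def _normalize(sentence: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not matter."""
    return " ".join(sentence.split()).casefold()


def has_circular_echo(text: str, min_chars: int = 4) -> bool:
    """Return True when the last sentence repeats the first one verbatim."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return False
    first, last = _normalize(sentences[0]), _normalize(sentences[-1])
    # min_chars avoids flagging very short sentences such as "Yes."
    return len(first) >= min_chars and first == last


looping = "本宫今日心情甚好。你可以退下了。本宫今日心情甚好。"
clean = "The Queen rises. She studies the envoy. Then she smiles."
print(has_circular_echo(looping), has_circular_echo(clean))
```

A check like this could be used to trim or regenerate a response; paraphrased echoes would need a fuzzier comparison.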
3. Domain-Specific Technical Evaluation
3.1 Cognitive & Thinking Space (CoT)
- Strengths: In both English and Chinese, the model shows a notable ability to structure its `<think>` space, reflecting Opus 4.7's adaptive reasoning. It actively attempts multi-step planning, drafts state transition tables, and identifies deep race conditions (e.g., lock orders in dual data structures).
- Bottlenecks: While the reasoning intent is highly developed, the physical limits of a 1.5B active parameter MoE mean that the model occasionally suffers from logical "step-overs" or hallucinated pseudo-steps, especially in Chinese where translation overhead consumes cognitive capacity.
3.2 Hard Sciences & Coding (Strict Logic)
- Native English Coding: Code structures are syntactically sound and logical flow is well-preserved.
- Chinese Coding: Highly prone to failure. The cognitive load of concurrent cross-language concept mapping and syntax tree construction degrades output quality. Multi-language bleeding (syntactic conflation) is common. Users must simplify prompts or switch entirely to English to ensure compiler compatibility.
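Because the bleeding described here produces tokens that are never valid Java, a cheap pre-compile scan can catch the two examples this card names (Go's `:=` and Kotlin's `as` cast). The sketch below is a heuristic under stated assumptions: the pattern names and functions are hypothetical, comments and string literals are masked with a simple regex that ignores character literals and text blocks, and a real pipeline would still rely on the compiler.

```python
import re

# Mask line comments, block comments, and double-quoted strings so that
# prose such as "treat as String" inside a comment is not flagged.
_MASK = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"', re.DOTALL)

# Hypothetical patterns for the two bleed examples named in this card.
FOREIGN_PATTERNS = {
    "go_short_declaration": re.compile(r"\b[A-Za-z_]\w*\s*:="),
    "kotlin_as_cast": re.compile(r"\bas\??\s+[A-Z]\w*"),
}


def _mask(source: str) -> str:
    """Replace masked spans with spaces, keeping newlines so line numbers hold."""
    return _MASK.sub(lambda m: re.sub(r"[^\n]", " ", m.group()), source)


def find_foreign_syntax(java_source: str) -> list[tuple[str, int]]:
    """Return (pattern_name, line_number) for each suspicious line."""
    hits = []
    for line_no, line in enumerate(_mask(java_source).splitlines(), start=1):
        for name, pattern in FOREIGN_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line_no))
    return hits


sample = 'int a = 1; // a := 2 is Go\ncount := 0;\nString s = obj as String;\n'
print(find_foreign_syntax(sample))
```

An empty result does not prove the output is valid Java; it only rules out these two specific patterns.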
3.3 Humanities & Roleplay (Persona & Creativity)
- Native English RP: The model's strongest use case. It accurately mimics Opus 4.7's cold, calculating, and majestic narrative style. Bracketed stage directions and physical cues are highly evocative, with sharp and politically charged dialogue.
- Chinese RP: Vulnerable to multi-task instruction overload. However, when guided by highly simplified prompts and strong pronoun constraints, it can still deliver acceptable, dramatic output, despite minor translation artifacts.
4. Key Technical Bottlenecks Identified
- Translation Overhead in Small Models: In cross-language tasks, a substantial share of the capacity of the ~1.5B active parameters appears to be spent on "concept translation and alignment." That leaves less capacity for maintaining strict syntax boundaries and character constraints, resulting in low-level logic bugs.
- MoE Gating / Routing Noise: During creative generation, the MoE gating mechanism (Router) occasionally routes tokens to sub-optimal experts (e.g., literal translation engines or mixed-language coding memories), introducing stylistic or syntactic noise.
- Quantization Degradation (Q8_0): While Q8_0 has negligible impact on high-probability English tokens, it noticeably degrades performance on low-probability structured tokens, such as exact code punctuation and non-native language constraints.
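For readers unfamiliar with Q8_0, the sketch below shows the block-wise scheme used by GGML-style Q8_0: weights are grouped into blocks of 32, each block stores one scale, and every weight becomes a signed 8-bit integer. It is a pure-Python illustration with hypothetical function names; it keeps the scale as a Python float rather than fp16, uses Python's `round` (ties to even) where the reference implementation rounds half away from zero, and shows only where weight rounding error comes from, not the token-level effect reported above.

```python
import math

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def q8_0_quantize_block(values: list[float]) -> tuple[float, list[int]]:
    """Quantize one block to (scale, int8 values in [-127, 127])."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * len(values)
    return scale, [round(v / scale) for v in values]


def q8_0_dequantize_block(scale: float, quants: list[int]) -> list[float]:
    """Reconstruct approximate weights from the stored scale and integers."""
    return [scale * q for q in quants]


# Synthetic weights, for illustration only.
block = [0.05 * math.sin(i) for i in range(BLOCK_SIZE)]
scale, quants = q8_0_quantize_block(block)
restored = q8_0_dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.6f}, max abs error={max_error:.6f}")
```

The per-weight error is bounded by half the block scale, so blocks dominated by one large weight lose the most precision on their small weights.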
5. Inference & Optimization Guidelines
5.1 Native English Configuration (Recommended)
- Task Profile: Best suited for English Roleplay, Creative Writing, and single-language Coding.
- Sampler Settings:
  - Logical / Coding tasks: Temperature: 0.1 ~ 0.2 (for strict syntactic output)
  - Roleplay / Creative tasks: Temperature: 0.5 ~ 0.6 (provides the best balance of creative prose and semantic stability)
  - Repetition Penalty: 1.1 (no major looping issues observed in English)
5.2 Non-English / Chinese Configuration
- 1. Keep Prompts Simple: Limit instructions to a single-layer task. Do not mix complex formatting, cross-language translation, and nested logic in a single prompt.
- 2. Enforce Anti-Translation Constraints: Explicitly define vocabulary boundaries in the prompt.
  - Example: "Act as Queen Eleanor. Address yourself as '本宫' and never '臣'. Use formal, classical Chinese nouns for attire. Do not translate English RP idioms literally."
- 3. Restrict Sampler Temperature:
  - Logical / Coding tasks: Temperature: 0.1 (critical to suppress multi-language code bleeding)
  - Roleplay / Creative tasks: Temperature: 0.3 ~ 0.4 (lowering the temperature effectively mitigates literal translation hallucinations like 颔面抽掏)
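The recommendations in sections 5.1 and 5.2 can be collected into a small lookup plus a prompt builder, so that a client picks consistent settings per language and task. This is a sketch under stated assumptions: the names `SAMPLER_PRESETS`, `pick_settings`, and `build_persona_prompt` are hypothetical, the low end of each temperature range is chosen arbitrarily, and no repetition penalty is set for Chinese because the card does not recommend one.

```python
# Temperature ranges (low, high) and repetition penalty as listed in 5.1 and 5.2.
SAMPLER_PRESETS = {
    ("en", "coding"): {"temperature": (0.1, 0.2), "repetition_penalty": 1.1},
    ("en", "roleplay"): {"temperature": (0.5, 0.6), "repetition_penalty": 1.1},
    ("zh", "coding"): {"temperature": (0.1, 0.1)},
    ("zh", "roleplay"): {"temperature": (0.3, 0.4)},
}


def pick_settings(language: str, task: str) -> dict:
    """Return concrete sampler settings; uses the low end of the range (an assumption)."""
    preset = SAMPLER_PRESETS[(language, task)]
    settings = {"temperature": preset["temperature"][0]}
    if "repetition_penalty" in preset:
        settings["repetition_penalty"] = preset["repetition_penalty"]
    return settings


def build_persona_prompt(persona: str, self_reference: str,
                         forbidden: list[str], extra_rules: list[str]) -> str:
    """Assemble a single-layer persona prompt with explicit vocabulary boundaries."""
    banned = " or ".join(f"'{word}'" for word in forbidden)
    parts = [
        f"Act as {persona}.",
        f"Address yourself as '{self_reference}' and never {banned}.",
    ]
    parts.extend(extra_rules)
    return " ".join(parts)


prompt = build_persona_prompt(
    "Queen Eleanor", "本宫", ["臣"],
    ["Use formal, classical Chinese nouns for attire.",
     "Do not translate English RP idioms literally."],
)
print(pick_settings("zh", "roleplay"))
print(prompt)
```

The builder reproduces the example prompt from step 2; keeping the rules as short, separate sentences follows the "single-layer task" advice in step 1.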
Drafted on: June 2, 2026
Safetensors · Model size: 8B params · Tensor type: F32, BF16
Model tree for aifeifei798/LFM2.5-Queen-Opus-4.7-8B-A1B
- Base model: LiquidAI/LFM2.5-8B-A1B-Base
- Finetuned: LiquidAI/LFM2.5-8B-A1B
- Finetuned: aifeifei798/LFM2.5-Queen-8B-A1B
- Quantizations: 2 models