Qwen-3.5-4B-AiO-LR-LoRA
- Developed by: leeminwaan
- Base Architecture: unsloth/Qwen3.5-4B
- Objective: Mitigation of autoregressive semantic divergence via Sparse Latent Regularization (S-LR).
- Optimization Strategy: Log-Sum-Exp (LSE) contextual anchoring with ReLU-hinge constraints.
Technical Methodology: Sparse Latent Regularization
The AiO framework implements a non-linear auxiliary penalty designed to enforce contextual invariance within a subset of the generated latent trajectory. This approach contrasts with global alignment strategies by permitting high local variance in reasoning-dense tokens while ensuring the existence of high-fidelity contextual milestones.
Objective Function Specification
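The model optimizes a joint objective \( \mathcal{L}_{CE} + \lambda \mathcal{L}_{reg} \) that combines the primary cross-entropy loss with a weighted regularization term \( \mathcal{L}_{reg} \), defined as a soft-maximum of the regional cosine distance.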
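A form consistent with the parameter definitions that follow can be written as below. This is a reconstruction under stated assumptions, not necessarily the author's exact formula: the generated hidden states \( h \), the window index \( k \), the window count \( K \), and the use of cosine similarity (rather than distance) inside the soft-maximum are all illustrative choices.

```latex
\mathcal{L} = \mathcal{L}_{CE} + \lambda \, \mathcal{L}_{reg},
\qquad
\mathcal{L}_{reg} = \operatorname{ReLU}\!\left( \tau - \frac{1}{\beta} \log \sum_{k=1}^{K} \exp\big( \beta \, s_k \big) \right),
\qquad
s_k = \cos\big( \Phi(h)_k,\; z_{ref} \big)
```

Under this form, the Log-Sum-Exp term upper-bounds \( \max_k s_k \) by at most \( \log K / \beta \), so a single window that reaches the margin \( \tau \) is enough to drive the hinge to zero.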
Parameter Definitions:
- \( z_{ref} \): The contextual reference vector, derived via global mean-pooling of the penultimate layer hidden states of the input prompt.
- \( \Phi \): A 1D-temporal average pooling filter (Window Size: 16) applied to the generated sequence to stabilize semantic signals against syntactic fluctuations.
- \( \beta \): The sharpness parameter (\( \beta=50 \)), determining the degree of sparsity in the anchor selection.
- \( \tau \): The similarity margin threshold (\( \tau=0.999 \)).
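The parameter definitions above can be sketched as a small, framework-free penalty function. This is a minimal illustration, not the author's training code: the function names, the non-overlapping stride for the pooling filter, and the use of cosine similarity inside the hinge are assumptions, and a real implementation would operate on tensors from the penultimate layer.

```python
import math

def mean_pool(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def window_pool(vectors, window=16):
    """Average-pool along the sequence axis.
    Assumption: non-overlapping windows (stride == window); a trailing
    partial window is averaged over the states it contains."""
    return [mean_pool(vectors[i:i + window]) for i in range(0, len(vectors), window)]

def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def soft_max_lse(values, beta=50.0):
    """Numerically stable (1/beta) * log(sum(exp(beta * v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

def slr_penalty(prompt_states, generated_states, beta=50.0, tau=0.999, window=16):
    """Hinge penalty: zero once the soft-maximum window similarity reaches tau."""
    z_ref = mean_pool(prompt_states)                  # contextual reference vector
    pooled = window_pool(generated_states, window)    # smoothed generated trajectory
    sims = [cosine(p, z_ref) for p in pooled]         # one similarity per window
    return max(0.0, tau - soft_max_lse(sims, beta)), sims

# Toy 2-D states: one window aligned with the prompt, one orthogonal to it.
prompt = [[1.0, 0.0]] * 2
generated = [[1.0, 0.0]] * 16 + [[0.0, 1.0]] * 16
penalty, sims = slr_penalty(prompt, generated)
print(penalty, sims)
```

In this toy case the single aligned window satisfies the margin on its own, so the penalty vanishes even though the second window is orthogonal to the reference.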
Empirical Analysis of Run 4
Training dynamics during the final 225-step iteration (Batch Size 8) demonstrated a successful transition from contextual saturation to latent sparsity.
Decoupling of Maximum and Average Similarity
Analysis of the latent manifolds revealed a critical divergence between the Maximum Similarity (MAX) and Average Similarity (AVG) metrics:
- Asymptote Convergence: The MAX Similarity metric converged to a stable asymptote, indicating the successful formation of high-density anchor states.
- Latent Bandwidth Recovery: Concurrently, the AVG Similarity metric underwent a planned regression from its peak to a stable floor.
- Interpretation: This decoupling indicates that the model satisfied the regularization objective through sparse, high-fidelity representations rather than global distribution shifts. This preserves the model's capacity for high-variance reasoning steps while maintaining global contextual anchoring.
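The distinction between the two metrics can be illustrated with a short sketch. The similarity values below are hypothetical and chosen only for illustration; they are not measurements from Run 4, and the helper name `similarity_summary` is likewise an assumption.

```python
def similarity_summary(sims):
    """Return (MAX, AVG) over a list of per-window similarities."""
    return max(sims), sum(sims) / len(sims)

# Hypothetical "saturated" trajectory: every window hugs the reference vector.
saturated = [0.999] * 16

# Hypothetical "sparse" trajectory: one anchor window, the rest free to vary.
sparse = [0.999] + [0.30] * 15

sat_max, sat_avg = similarity_summary(saturated)
sp_max, sp_avg = similarity_summary(sparse)

print(f"saturated: MAX={sat_max:.3f} AVG={sat_avg:.3f}")
print(f"sparse:    MAX={sp_max:.3f} AVG={sp_avg:.3f}")
```

Both trajectories reach the same MAX value, so a max-based constraint treats them as equally compliant; only the AVG metric reveals that the sparse trajectory leaves most windows unconstrained.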
Optimization Efficiency
- Cross-Entropy Optimization: The primary loss achieved a localized optimum of ~2.8, a significant improvement over mean-pooling alignment strategies (>3.0).
- Gradient Masking: The ReLU-hinge constraint reduced the weighted auxiliary penalty (\(\mathcal{L}_{reg} \cdot \lambda\)) to a negligible 0.02, confirming that the model has internalized the anchoring requirement and transitioned to pure sequence modeling.
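The masking effect of a ReLU hinge over a Log-Sum-Exp soft-maximum can be sketched by computing the penalty's gradient with respect to each window similarity. This is an illustration under assumptions: it takes the penalty to be ReLU(τ − LSE_β(s)), which the card does not state explicitly, and the function name and sample similarities are hypothetical.

```python
import math

def penalty_grad(sims, beta=50.0, tau=0.999):
    """Gradient of ReLU(tau - LSE_beta(sims)) with respect to each similarity.

    While the hinge is active, the gradient is minus the softmax of
    beta * sims, so it concentrates on the best-matching window.
    Once the soft-maximum reaches tau, every gradient entry is zero.
    """
    m = max(sims)
    exps = [math.exp(beta * (s - m)) for s in sims]
    total = sum(exps)
    soft_max = m + math.log(total) / beta
    if soft_max >= tau:
        return [0.0] * len(sims)          # hinge satisfied: gradient masked
    return [-e / total for e in exps]     # hinge active: sparse, softmax-weighted

# Hinge active: nearly all of the gradient lands on the closest window.
active = penalty_grad([0.90, 0.50, 0.20])

# Hinge satisfied: the auxiliary term contributes no gradient at all.
masked = penalty_grad([0.9995, 0.20])

print(active)
print(masked)
```

With a large β, the active-case gradient is close to one-hot, which matches the description of β as controlling sparsity in anchor selection; in the satisfied case only the cross-entropy term would drive the update.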
Performance Characteristics
- Semantic Milestone Generation: The model generates tokens with high-density contextual information that serve as reference points for subsequent attention layers.
- Mitigation of Sequential Drift: By maintaining a similarity spike, the model stabilizes the attention mechanism against the cumulative entropy characteristic of long-form autoregressive generation.
- Preservation of Local Entropy: The reduction in average similarity confirms that the model retains the ability to navigate complex logical branches without the "mode collapse" associated with excessive latent regularization.
Usage
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "leeminwaan/qwen3.5_4B_AiO_LR_lora_kimi_k2.5_distilation",
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Input formal reasoning prompt here."}],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=2048)
