MiniCPM-V-4.6-Thinking-abliterated-MAX
MiniCPM-V-4.6-Thinking-abliterated-MAX is an optimized release built on top of huihui-ai/Huihui-MiniCPM-V-4.6-Thinking-abliterated. This version updates the shard sizes, optimizes the repository layout, and improves compatibility with the latest Transformers releases, while preserving the reasoning capabilities of the original model. The result is a compact multimodal reasoning model for image, video, and text understanding, with streamlined deployment and inference workflows.
This model is intended for research and learning purposes only. Any content generated by it is used at the user's own risk. The authors and hosting page disclaim any liability for outputs produced by this model. Users are responsible for ensuring safe, ethical, and lawful usage.
Evals
.eval_results: harm_bench_score.yaml
The evaluation used 2,000 randomly sampled harmful test prompts to measure the model's refusal behavior. The self-reported evaluations provided here are intended only as an overview of the model; scores may vary depending on the benchmark and the evaluation strategy used.
Key Highlights
Latest Transformers Compatibility: Re-sharded and optimized for compatibility with recent Transformers releases.
Optimized Model Sharding: Updated shard sizes for easier repository handling, downloading, and deployment.
Streamlined Inference Experience: Optimized packaging and repository structure for smoother loading and inference workflows.
Efficient Multimodal Architecture: Built on openbmb/MiniCPM-V-4.6-Thinking, which pairs a SigLIP2-400M vision encoder with a Qwen3.5-0.8B language backbone for compact multimodal reasoning.
Thinking-Enabled Multimodal Reasoning: Supports structured reasoning across text, image, and video inputs with improved instruction adherence.
Image & Video Understanding: Supports multimodal reasoning across visual and textual inputs, with efficient deployment on edge and mobile-class hardware.
262K Long Context Support: Handles long multimodal contexts of up to 262K tokens across text, image, and video inputs.
Research-Friendly Distribution: Designed to simplify experimentation, evaluation, and local deployment workflows.
High-Efficiency Deployment: Suitable for local inference, lightweight multimodal applications, and research experimentation on consumer-grade GPUs.
Base Model Signatures:
This model was re-sharded from the base model, https://huggingface.co/huihui-ai/Huihui-MiniCPM-V-4.6-Thinking-abliterated, and optimized for the latest Transformers release.
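Sharded checkpoints on the Hub are usually described by a `model.safetensors.index.json` file whose `weight_map` maps each tensor name to the shard file that stores it; shard sizes themselves are typically chosen with the `max_shard_size` argument of `save_pretrained`. The card does not say which shard size was used, so this minimal sketch only inspects an index. It assumes the conventional index filename and schema (`metadata.total_size` plus `weight_map`) and uses a made-up toy index, not this repository's real one.

```python
import json
from collections import Counter


def summarize_shards(index: dict) -> dict:
    """Summarize a Hugging Face safetensors index (model.safetensors.index.json)."""
    weight_map = index["weight_map"]  # tensor name -> shard filename
    per_shard = Counter(weight_map.values())
    return {
        "total_size_bytes": index.get("metadata", {}).get("total_size"),
        "num_shards": len(per_shard),
        "tensors_per_shard": dict(sorted(per_shard.items())),
    }


if __name__ == "__main__":
    # Toy index for illustration only; real tensor names and sizes will differ.
    toy_index = json.loads("""
    {
      "metadata": {"total_size": 1024},
      "weight_map": {
        "vision.embed.weight": "model-00001-of-00002.safetensors",
        "llm.layers.0.weight": "model-00001-of-00002.safetensors",
        "llm.lm_head.weight": "model-00002-of-00002.safetensors"
      }
    }
    """)
    print(summarize_shards(toy_index))
```

To inspect the real repository, one option is to fetch the index with `huggingface_hub.hf_hub_download(repo_id=..., filename="model.safetensors.index.json")` and pass the result of `json.load` to `summarize_shards`; this assumes the repository ships a sharded safetensors checkpoint under that conventional name.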
Quick Start with Transformers
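```shell
pip install transformers==5.8.0 gradio==6.14.0
# The script below also imports PyTorch; install the build that matches your CUDA setup.
pip install torch
```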
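Because the quick start pins exact versions, it can help to confirm what is actually installed before loading the model. This is a small stdlib-only check using `importlib.metadata`. The `PINNED` mapping mirrors the pip command above and is the only assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip command of this quick start.
PINNED = {"transformers": "5.8.0", "gradio": "6.14.0"}


def check_pins(pins: dict) -> dict:
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    print("All pins satisfied." if not problems else f"Mismatched packages: {problems}")
```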
```python
import gc
import time
from threading import Thread

import gradio as gr
import torch
from PIL import Image
from transformers import (
    MiniCPMV4_6ForConditionalGeneration,
    AutoProcessor,
    TextIteratorStreamer,
)

# Upper bound for the "Max New Tokens" slider, and its default value.
MAX_MAX_NEW_TOKENS = 4096
DEFAULT_MAX_NEW_TOKENS = 1024

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

MODEL_ID = "prithivMLmods/MiniCPM-V-4.6-Thinking-abliterated-MAX"

# Load the processor (tokenizer + image preprocessing) and the model in bfloat16.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = MiniCPMV4_6ForConditionalGeneration.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    dtype=torch.bfloat16,  # `dtype` replaces the deprecated `torch_dtype` argument
).to(device).eval()


def generate(
    image: Image.Image,
    text: str,
    max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
    temperature: float = 0.6,
    top_p: float = 0.9,
    top_k: int = 50,
    repetition_penalty: float = 1.2,
):
    """Stream the model's answer for one image + instruction, yielding the growing text."""
    if image is None:
        yield "[ERROR] Please upload an image."
        return
    if not text or not text.strip():
        yield "[ERROR] Please enter your instruction."
        return

    # Single-turn chat: an image placeholder followed by the user's instruction.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": text},
            ],
        }
    ]
    prompt_full = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[prompt_full],
        images=[image],
        return_tensors="pt",
        padding=True,
    ).to(device)

    # The streamer yields decoded text as model.generate produces tokens.
    streamer = TextIteratorStreamer(
        processor.tokenizer if hasattr(processor, "tokenizer") else processor,
        skip_prompt=True,
        skip_special_tokens=True,
    )
    generation_error = {"error": None}
    generation_kwargs = {
        **inputs,
        "streamer": streamer,
        "max_new_tokens": int(max_new_tokens),
        "do_sample": True,
        "temperature": float(temperature),
        "top_p": float(top_p),
        "top_k": int(top_k),
        "repetition_penalty": float(repetition_penalty),
    }

    def _run():
        # Generate in a background thread. On failure, record the error and end
        # the streamer so the consuming loop below does not block forever.
        try:
            model.generate(**generation_kwargs)
        except Exception as e:
            generation_error["error"] = e
            try:
                streamer.end()
            except Exception:
                pass

    thread = Thread(target=_run, daemon=True)
    thread.start()

    buffer = ""
    try:
        for new_text in streamer:
            buffer += new_text
            time.sleep(0.01)
            yield buffer
        thread.join(timeout=1.0)

        if generation_error["error"] is not None:
            err = f"[ERROR] {generation_error['error']}"
            yield (buffer + "\n\n" + err) if buffer.strip() else err
            return
        if not buffer.strip():
            yield "[ERROR] No output was generated."
    finally:
        # Free memory whether generation succeeded, failed, or was cancelled.
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()


def run_inference(
    image,
    text,
    max_new_tokens,
    temperature,
    top_p,
    top_k,
    repetition_penalty,
):
    # Thin wrapper so Gradio streams each partial output from generate().
    yield from generate(
        image=image,
        text=text,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
    )


with gr.Blocks(title="MiniCPM-V-4.6-Thinking-abliterated-MAX") as demo:
    gr.Markdown(
        "# MiniCPM-V-4.6-Thinking-abliterated-MAX\n"
        "Upload an image and enter your instruction to run multimodal inference."
    )
    with gr.Row():
        # Left column: inputs and sampling controls.
        with gr.Column(scale=1):
            image_input = gr.Image(type="pil", label="Input Image")
            text_input = gr.Textbox(
                label="Instruction",
                placeholder="e.g., Describe the image, perform OCR, solve the problem...",
                lines=4,
            )
            run_btn = gr.Button("Run Inference", variant="primary")
            with gr.Accordion("Advanced Settings", open=False):
                max_new_tokens = gr.Slider(
                    minimum=1,
                    maximum=MAX_MAX_NEW_TOKENS,
                    step=1,
                    value=DEFAULT_MAX_NEW_TOKENS,
                    label="Max New Tokens",
                )
                temperature = gr.Slider(
                    minimum=0.1,
                    maximum=4.0,
                    step=0.1,
                    value=0.6,
                    label="Temperature",
                )
                top_p = gr.Slider(
                    minimum=0.05,
                    maximum=1.0,
                    step=0.05,
                    value=0.9,
                    label="Top-p",
                )
                top_k = gr.Slider(
                    minimum=1,
                    maximum=1000,
                    step=1,
                    value=50,
                    label="Top-k",
                )
                repetition_penalty = gr.Slider(
                    minimum=1.0,
                    maximum=2.0,
                    step=0.05,
                    value=1.2,
                    label="Repetition Penalty",
                )
        # Right column: streamed model output.
        with gr.Column(scale=1):
            output = gr.Textbox(
                label="Output",
                lines=20,
                placeholder="Output will appear here...",
            )
    run_btn.click(
        fn=run_inference,
        inputs=[
            image_input,
            text_input,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
            repetition_penalty,
        ],
        outputs=[output],
    )

if __name__ == "__main__":
    demo.queue(max_size=10).launch(show_error=True)
```
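The `generate` function above streams output with a producer/consumer pattern: `model.generate` runs in a background thread and pushes decoded text into a `TextIteratorStreamer`, while the Gradio handler iterates over the streamer and yields the growing buffer. The stdlib-only sketch below reproduces that pattern so it can run without the model. `QueueStreamer` and `fake_generate` are hypothetical stand-ins, not Transformers APIs.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream


class QueueStreamer:
    """Hypothetical stand-in for TextIteratorStreamer: a thread-safe iterator of text chunks."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, text):
        self._q.put(text)

    def end(self):
        self._q.put(_STOP)

    def __iter__(self):
        while True:
            item = self._q.get()  # blocks until the producer sends something
            if item is _STOP:
                return
            yield item


def fake_generate(streamer, chunks, fail_after=None):
    """Hypothetical stand-in for model.generate(streamer=...)."""
    for n, chunk in enumerate(chunks):
        if fail_after is not None and n == fail_after:
            raise RuntimeError("simulated generation failure")
        streamer.put(chunk)
    streamer.end()


def stream(chunks, fail_after=None):
    """Yield the accumulated text, then an error line if the producer failed."""
    streamer = QueueStreamer()
    error = {"error": None}

    def _run():
        try:
            fake_generate(streamer, chunks, fail_after)
        except Exception as e:
            error["error"] = e
            streamer.end()  # unblock the consumer loop below

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    buffer = ""
    for piece in streamer:
        buffer += piece
        yield buffer
    thread.join(timeout=1.0)
    if error["error"] is not None:
        yield (f"{buffer}\n\n[ERROR] {error['error']}" if buffer else f"[ERROR] {error['error']}")


if __name__ == "__main__":
    for partial in stream(["The ", "image ", "shows a cat."]):
        print(partial)
```

The key design point is that the error path must also end the streamer; otherwise the consuming `for` loop would wait forever for a stop signal that never arrives. The real `TextIteratorStreamer` also accepts a `timeout` argument as a second safeguard.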
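The sampling controls under Advanced Settings (temperature, top-k, top-p, repetition penalty) each reshape the next-token distribution before a token is drawn. The pure-Python sketch below shows the standard forms of these techniques on a toy four-token vocabulary. It applies them in the order repetition penalty, temperature, top-k, top-p, which is a common convention. It is not Transformers' implementation, so check the generation documentation for the exact order and edge-case handling in your version.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    """CTRL-style penalty: shrink positive logits and push negative ones lower."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k likeliest tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass  # mass renormalized within the top-k set
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_next(logits, seen_ids, temperature=0.6, top_k=50, top_p=0.9,
                repetition_penalty=1.2, rng=None):
    """Draw one token id using the demo's default slider values."""
    rng = rng or random.Random()
    penalized = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    probs = softmax(penalized, temperature)
    candidates = filter_top_k_top_p(probs, top_k, top_p)
    ids, weights = zip(*candidates.items())
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]
    print(sample_next(toy_logits, seen_ids=[0], rng=random.Random(0)))
```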
Base Model Information
openbmb/MiniCPM-V-4.6-Thinking is the reasoning-enabled variant of OpenBMB’s 1.3B-parameter MiniCPM-V-4.6 series. It uses SigLIP2-400M as the vision encoder and Qwen3.5-0.8B as the language backbone, supporting multimodal reasoning across text, image, and video inputs with up to 262K context length, and enabling explicit step-by-step reasoning for complex tasks.
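For a rough sense of the hardware this implies, weight memory is about parameter count × bytes per parameter. This is a minimal sketch assuming the stated ~1.3B total parameters and the bfloat16 dtype used in the quick start. It counts weights only and ignores activations, the KV cache, vision tokens from high-resolution inputs, and framework overhead, so real usage will be higher.

```python
# Bytes needed to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Weights-only memory estimate in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: ~{weight_memory_gb(1.3e9, dtype):.1f} GB")
```

Under these assumptions the weights alone take about 2.6 GB in bfloat16 and 5.2 GB in float32.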
Intended Use
Multimodal Reasoning Research: Studying reasoning behaviors across text, image, and video inputs.
Evaluation & Benchmarking: Testing multimodal model performance across structured reasoning tasks.
Edge & Local Deployment: Running compact multimodal systems efficiently on consumer hardware.
Research Prototyping: Experimenting with multimodal reasoning architectures and workflows.
Limitations & Risks
Important Note: This model inherits the behavior and characteristics of its base model.
Reasoning Hallucinations: Step-by-step reasoning may occasionally include incorrect or fabricated steps.
Output Reliability: Results should be verified before use in critical applications.
Multimodal Constraints: Performance depends on input quality, context length, and task complexity.
Deployment Considerations: High-resolution multimodal inference may require significant compute resources.
