Qwen3-VL-8B-Thinking-Unredacted-MAX

Qwen3-VL-8B-Thinking-Unredacted-MAX is an optimized release built on top of huihui-ai/Huihui-Qwen3-VL-8B-Thinking-abliterated. This version focuses on stable inference behavior, improved packaging consistency, and updated Transformers compatibility, while preserving the strong multimodal reasoning and “thinking” capabilities of the base architecture. The result is a capable 8B vision-language model designed for structured reasoning, captioning, and research-oriented multimodal workflows.

Key Highlights

Optimized Release Structure: Improved repository organization for smoother deployment and reproducible loading.
Modern Transformers Compatibility: Updated to work reliably with recent Hugging Face Transformers and multimodal processing pipelines.
8B Thinking Vision-Language Architecture: Built on Qwen3-VL-8B-Thinking, enabling stronger step-by-step visual reasoning compared to standard instruct variants.
Stable Multimodal Reasoning: Improved consistency for image interpretation, captioning, and structured output generation.
High-Fidelity Caption Generation: Produces detailed, structured descriptions suitable for dataset creation, annotation, and accessibility use cases.
Dynamic Resolution Support: Retains native support for varying image resolutions and aspect ratios.
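
To give a feel for what dynamic resolution means in practice, the sketch below estimates how many visual tokens an image of a given size would contribute to the prompt. It assumes a 16-pixel patch size and 2x2 patch merging (one token per 32x32 pixel block), which is our understanding of the Qwen3-VL family and is not read from this repository's config. The helper name `estimate_visual_tokens` is hypothetical, and the estimate ignores any rescaling the processor applies to stay within its pixel budget.

```python
def estimate_visual_tokens(height: int, width: int,
                           patch_size: int = 16, merge_size: int = 2) -> int:
    """Rough visual-token count for one image.

    Assumption: each token covers a (patch_size * merge_size)-pixel square,
    and the image is resized to the nearest multiple of that block size.
    """
    block = patch_size * merge_size
    rows = max(1, round(height / block))
    cols = max(1, round(width / block))
    return rows * cols


# A 640x480 image maps to a 20 x 15 grid of 32-pixel blocks.
print(estimate_visual_tokens(height=480, width=640))
```

Larger or higher-resolution images therefore cost more tokens, which is worth keeping in mind when budgeting context length and memory.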

Base Model Signatures

This model is a re-sharded repackaging of the base model below, updated for compatibility with recent Transformers releases: https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-8B-Thinking-abliterated
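
One way to inspect how a sharded checkpoint is laid out is to read its safetensors index file, which maps each tensor name to the shard file that stores it. The sketch below is illustrative only: the helper `summarize_shards` is hypothetical, and the example index uses made-up tensor names, shard names, and sizes, not the actual contents of this repository.

```python
from collections import Counter


def summarize_shards(index: dict) -> dict:
    """Summarize a parsed `model.safetensors.index.json` dictionary."""
    weight_map = index["weight_map"]          # tensor name -> shard file
    per_shard = Counter(weight_map.values())  # tensors stored in each shard
    return {
        "num_shards": len(per_shard),
        "num_tensors": len(weight_map),
        "total_size_bytes": index.get("metadata", {}).get("total_size"),
        "tensors_per_shard": dict(sorted(per_shard.items())),
    }


# Made-up example index (placeholder names and sizes, for illustration only).
example_index = {
    "metadata": {"total_size": 1000},
    "weight_map": {
        "layer.0.weight": "model-00001-of-00002.safetensors",
        "layer.1.weight": "model-00001-of-00002.safetensors",
        "layer.2.weight": "model-00002-of-00002.safetensors",
    },
}
summary = summarize_shards(example_index)
print(summary)
```

For a real checkpoint, load the index with `json.load` from a local copy of the repository and pass the resulting dictionary to the helper.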

Quick Start with Transformers

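from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model; device_map="auto" places the weights on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX",
    torch_dtype="auto",
    device_map="auto"
)

# The processor bundles the tokenizer and the image/video preprocessing
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX"
)

# A single user turn containing one image and one text instruction
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption for this image."},
        ],
    }
]

# Render the chat template to a prompt string
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Fetch and preprocess the images/videos referenced in the messages
image_inputs, video_inputs = process_vision_info(messages)

# Move inputs to the model's device rather than hard-coding "cuda"
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Thinking models emit reasoning before the answer; raise max_new_tokens if the output is cut off
generated_ids = model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens so only the newly generated text is decoded
output_text = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)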
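
Because this is a "thinking" variant, the decoded text usually contains the model's reasoning followed by the final answer. The sketch below separates the two. It assumes the reasoning ends with a `</think>` marker that survives decoding, which is the usual Qwen3 convention but is not confirmed by this model card. The helper name `split_thinking` is hypothetical.

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumption: reasoning is terminated by `end_tag`. If the tag is missing
    (for example, generation hit max_new_tokens mid-reasoning), the whole
    text is treated as reasoning and the answer is empty.
    """
    reasoning, sep, answer = decoded.partition(end_tag)
    # The opening tag may or may not be present, depending on the chat template.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    if not sep:
        return reasoning, ""
    return reasoning, answer.strip()


reasoning, answer = split_thinking(
    "The image shows a beach.\n</think>\n\nA sunny beach."
)
print(answer)
```

With the Quick Start code, this would be applied to each string in `output_text`. An empty answer is a hint that `max_new_tokens` was too small for the reasoning to finish.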

Intended Use

Multimodal reasoning research and evaluation
Image captioning and dataset annotation pipelines
Vision-language model benchmarking and robustness testing
Creative visual storytelling and structured description generation
Prototyping AI systems that combine reasoning with image understanding

Limitations & Risks

Important Note: This model inherits behaviors from its base architecture and multimodal training setup.

Performance depends heavily on image quality and prompt clarity
May produce incomplete or inconsistent reasoning in complex scenes
Requires sufficient GPU memory for stable inference
Output quality varies across domains such as scientific, artistic, or real-world imagery
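
As a rough guide to the GPU memory requirement, the weights alone can be estimated from the parameter count and tensor type listed on the model page (9B params, BF16). The sketch below is back-of-the-envelope arithmetic, not a measurement. The helper name is hypothetical, and activations, the KV cache, and image tokens all add to the figure.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in GiB.

    BF16 stores each parameter in 2 bytes; runtime overhead is not included.
    """
    return num_params * bytes_per_param / 1024**3


# About 9 billion parameters in BF16: roughly 16.8 GiB for weights only.
print(f"{estimate_weight_memory_gib(9e9):.1f} GiB")
```

In practice, a GPU with noticeably more memory than this estimate, or a quantized variant, is the safer choice for long reasoning traces and high-resolution images.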

Model size: 9B params (Safetensors)
Tensor type: BF16

Model tree: base model Qwen/Qwen3-VL-8B-Thinking; this model is listed as a fine-tune; 4 quantized variants are available.


URL: https://huggingface.co/prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX
