Qwen3-VL-8B-Instruct-Unredacted-MAX

Qwen3-VL-8B-Instruct-Unredacted-MAX is an optimized release built on top of huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated. This version focuses on packaging improvements, inference stability, and modern Transformers compatibility, while preserving the strong multimodal reasoning capabilities of the base architecture. The result is a powerful 8B vision-language model designed for efficient research, structured captioning, and multimodal experimentation at scale.

Key Highlights

Optimized Release Pipeline: Improved repository structure and loading consistency for smoother deployment and inference.
Modern Transformers Integration: Updated compatibility for recent Hugging Face Transformers versions and vision-language utilities.
8B Vision-Language Architecture: Built on Qwen3-VL-8B-Instruct, offering strong reasoning ability across image-text tasks with balanced compute requirements.
Stable Multimodal Inference: Improved consistency for caption generation, visual reasoning, and structured outputs.
High-Quality Caption Generation: Produces detailed, structured descriptions suitable for dataset creation, annotation workflows, and accessibility applications.
Dynamic Resolution Handling: Maintains native support for variable image resolutions and aspect ratios.
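
Because resolution is variable, the number of visual tokens an image consumes grows with its size. The sketch below gives a rough estimate. It assumes a 16-pixel patch size and a 2x2 spatial merge, so one token covers about 32x32 pixels. Those values are assumptions based on the upstream Qwen3-VL configuration and should be checked against this model's config. The function name is hypothetical, and the estimate ignores any minimum or maximum pixel budget the processor applies.

```python
PATCH_SIZE = 16  # assumed ViT patch size; verify in the model's config
MERGE_SIZE = 2   # assumed spatial merge factor (2x2 patches -> 1 token)


def estimate_visual_tokens(width: int, height: int) -> int:
    """Rough count of visual tokens for one image at the given resolution."""
    unit = PATCH_SIZE * MERGE_SIZE  # pixels covered by one token along each side
    # Snap each side to a multiple of `unit`, keeping at least one unit.
    w = max(unit, round(width / unit) * unit)
    h = max(unit, round(height / unit) * unit)
    return (w // unit) * (h // unit)


print(estimate_visual_tokens(640, 480))  # 20 x 15 token grid under these assumptions
```

Larger images cost proportionally more context and memory, so downscaling before inference is one way to trade detail for speed.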

Base Model Signatures

This model was re-sharded from the base model and optimized for the latest Transformers version. Base model: https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated

Quick Start with Transformers

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model; device_map="auto" places the weights on the available device(s).
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX",
    torch_dtype="auto",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX"
)

# A single user turn containing one image and one text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption for this image."},
        ],
    }
]

# Render the chat template to a prompt string; tokenization happens in the processor call.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Fetch and preprocess the images and videos referenced in the messages.
image_inputs, video_inputs = process_vision_info(messages)

# Move inputs to the model's device instead of hard-coding "cuda".
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens so that only the newly generated text is decoded.
output_text = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

print(output_text)
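
When captioning more than one image, the message structure from the Quick Start can be factored into a small helper. This is a minimal sketch. The name build_caption_messages is hypothetical and not part of Transformers or qwen_vl_utils, and the default prompt is copied from the example above.

```python
def build_caption_messages(image, prompt="Provide a detailed caption for this image."):
    """Return a single-turn chat message list for one image and one prompt.

    `image` is whatever the vision utilities accept for the "image" field,
    for example a URL or a local file path.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_caption_messages("path/to/local_image.jpg")  # placeholder path
```

The returned list can be passed to processor.apply_chat_template and process_vision_info in the same way as the hand-written messages.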

Intended Use

Multimodal research and vision-language evaluation
Image captioning and dataset generation pipelines
Red-teaming and robustness testing of VLMs
Creative and descriptive visual storytelling tasks
AI system prototyping with image-text reasoning components
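
For the dataset generation use case, one possible shape for a captioning pipeline is a loop that writes one JSON record per image. The sketch below is an illustration under stated assumptions. write_captions_jsonl and the caption_fn callable are hypothetical names, and caption_fn stands in for whatever function wraps the model call from the Quick Start.

```python
import json
from typing import Callable, Iterable


def write_captions_jsonl(
    image_refs: Iterable[str],
    caption_fn: Callable[[str], str],
    out_path: str,
) -> int:
    """Caption each image reference and write one JSON object per line.

    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for ref in image_refs:
            record = {"image": ref, "caption": caption_fn(ref)}
            # ensure_ascii=False keeps non-ASCII captions readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Keeping the model call behind a plain callable makes the loop easy to test with a stub before running it on a GPU.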

Limitations & Risks

Important Note: This model inherits its behavior from the base model (huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated) and that model's fine-tuning process.

Performance depends on image quality, prompt clarity, and decoding settings
May produce incomplete or inconsistent reasoning in complex visual scenes
Requires moderate to high VRAM for stable inference depending on resolution
Output quality varies across domains such as medical, artistic, or technical imagery
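
As a rough guide to the VRAM requirement, the memory taken by the weights alone can be estimated from the parameter count and tensor type listed for this model (9B params, BF16). This is a back-of-the-envelope sketch. The function name is hypothetical, and the figure excludes activations, the KV cache, and image tokens, all of which add to the total and grow with resolution.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# BF16 stores each parameter in 2 bytes.
print(weight_memory_gb(9e9))  # 18.0
```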

Model size: 9B params (Safetensors)
Tensor type: BF16

Model tree for prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX

Base model: Qwen/Qwen3-VL-8B-Instruct

URL: https://huggingface.co/prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
