✂️ Interlace-Qwen3-VL-4B-20pc

CVPR 2026

This model was produced by INTERLACE, a layer-pruning framework for Vision-Language Models. INTERLACE removed 7 of the 36 transformer layers (about 20%) from Qwen/Qwen3-VL-4B-Instruct, selecting them by triplet-based similarity analysis. The pruned model was then fine-tuned for a single epoch on 1% of FineVision.

88.0% average relative performance retained | 20% of layers dropped (7 of 36) | 29 layers remaining

📋 Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-VL-4B-Instruct |
| Pruning Method | INTERLACE (triplet-based interleaved pruning) |
| Pruning Ratio | 20% (7 of 36 layers removed) |
| Remaining Layers | 29 |
| Hidden Size | 2560 |
| Fine-tuning Data | 1% of FineVision (~240K samples) |
| Fine-tuning Epochs | 1 |
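
The figures above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch using the numbers stated in this card; the variable names are illustrative, and the implied FineVision size is derived from the card's "1% is about 240K samples" figure rather than taken from the dataset itself.

```python
# Layer counts and fine-tuning figures as stated in the model card.
TOTAL_LAYERS = 36
REMOVED_LAYERS = 7
FINETUNE_SAMPLES = 240_000   # "~240K samples", an approximate figure
FINETUNE_PERCENT = 1         # 1% of FineVision

remaining_layers = TOTAL_LAYERS - REMOVED_LAYERS
exact_ratio = REMOVED_LAYERS / TOTAL_LAYERS

# Dataset size implied by the card's own figures (approximate).
implied_dataset_size = FINETUNE_SAMPLES * 100 // FINETUNE_PERCENT

print(f"remaining layers: {remaining_layers}")        # 29
print(f"exact pruning ratio: {exact_ratio:.1%}")      # 19.4%
print(f"implied dataset size: {implied_dataset_size:,}")
```

The nominal 20% ratio is therefore a rounded value: removing 7 of 36 layers is about 19.4%.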

🚀 Usage

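```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Load the pruned checkpoint. flash_attention_2 requires the flash-attn
# package; drop that argument to use the default attention backend.
model = AutoModelForImageTextToText.from_pretrained(
    "pmadinei/Interlace-Qwen3-VL-4B-20pc",
    dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2",
)
# The processor is loaded from the base model.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

# Build the chat prompt and move the tensors to the model's device.
inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```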
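
To confirm that a downloaded checkpoint has the expected depth, the layer count can be read from the model config. The sketch below is a hedged illustration, not part of the INTERLACE release: `count_decoder_layers` and `load_layer_count` are hypothetical helpers, and it assumes the 36 and 29 layer figures refer to the language-model stack exposed as `config.text_config.num_hidden_layers`.

```python
from types import SimpleNamespace


def count_decoder_layers(config):
    """Return the language-model layer count from a config-like object.

    Assumes the count lives at config.text_config.num_hidden_layers,
    falling back to config.num_hidden_layers for flat configs.
    """
    text_config = getattr(config, "text_config", config)
    return text_config.num_hidden_layers


def load_layer_count(repo_id):
    """Fetch a config from the Hugging Face Hub and count its layers."""
    # Imported lazily because this path needs transformers and network access.
    from transformers import AutoConfig

    return count_decoder_layers(AutoConfig.from_pretrained(repo_id))


# Offline illustration with a stand-in config holding the card's figure.
stand_in = SimpleNamespace(text_config=SimpleNamespace(num_hidden_layers=29))
print(count_decoder_layers(stand_in))
```

If the assumption holds, `load_layer_count("pmadinei/Interlace-Qwen3-VL-4B-20pc")` should report 29 and `load_layer_count("Qwen/Qwen3-VL-4B-Instruct")` should report 36.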

📊 Performance

Performance relative to the unpruned baseline, reported as a percentage of the baseline score with Chain-of-Thought enabled:

| Category | Benchmark | Relative Perf. |
|---|---|---|
| Text/Chart | AI2D | 85.0% |
| Text/Chart | ChartQA | 89.3% |
| Text/Chart | OCRBench | 85.6% |
| Text/Chart | TextVQA | 92.3% |
| General VQA | MMBench | 83.5% |
| General VQA | POPE | 98.7% |
| General VQA | RealWorldQA | 90.2% |
| Perception | HRBench4K | 88.8% |
| Perception | HRBench8K | 87.3% |
| Perception | V-Star | 81.7% |
| Instruction & Science | MIABench | 87.0% |
| Instruction & Science | ScienceQA | 86.2% |
| Overall Average | | 88.0% |
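
The overall average can be reproduced from the per-benchmark rows. The sketch below assumes the average is an unweighted mean over the 12 benchmarks; the card does not state the weighting, but this assumption matches the reported 88.0%. The `relative_performance` helper is illustrative and simply encodes "% of baseline score".

```python
def relative_performance(pruned_score, baseline_score):
    """Pruned score expressed as a percentage of the baseline score."""
    return 100.0 * pruned_score / baseline_score


# Relative performance per benchmark (% of baseline), copied from the table.
relative_perf = {
    "AI2D": 85.0,
    "ChartQA": 89.3,
    "OCRBench": 85.6,
    "TextVQA": 92.3,
    "MMBench": 83.5,
    "POPE": 98.7,
    "RealWorldQA": 90.2,
    "HRBench4K": 88.8,
    "HRBench8K": 87.3,
    "V-Star": 81.7,
    "MIABench": 87.0,
    "ScienceQA": 86.2,
}

# Unweighted mean across all benchmarks (an assumption about the averaging).
average = sum(relative_perf.values()) / len(relative_perf)
weakest = min(relative_perf, key=relative_perf.get)
strongest = max(relative_perf, key=relative_perf.get)

print(f"average: {average:.1f}%")
print(f"weakest: {weakest}, strongest: {strongest}")
```

On these figures the largest drop is on V-Star and the smallest on POPE.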

🤗 All INTERLACE Models

| Model | Drop % | Rel. Perf. |
|---|---|---|
| Interlace-Qwen3-VL-8B-10pc | 10% | 94.0% |
| Interlace-Qwen3-VL-8B-15pc | 15% | 92.1% |
| Interlace-Qwen3-VL-8B-20pc | 20% | 86.9% |
| Interlace-Qwen3-VL-8B-25pc | 25% | 86.1% |
| Interlace-Qwen3-VL-4B-10pc | 10% | 93.9% |
| Interlace-Qwen3-VL-4B-15pc | 15% | 91.9% |
| Interlace-Qwen3-VL-4B-20pc | 20% | 88.0% |
| Interlace-Qwen3-VL-4B-25pc | 25% | 81.7% |
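
One way to use this table is to pick the most heavily pruned variant that still meets a target relative performance. The following is a small illustrative sketch; `most_pruned_above` is a hypothetical helper, not part of any INTERLACE tooling, and it relies only on the numbers listed above.

```python
# (model name, layers dropped in %, average relative performance in %),
# copied from the table above.
MODELS = [
    ("Interlace-Qwen3-VL-8B-10pc", 10, 94.0),
    ("Interlace-Qwen3-VL-8B-15pc", 15, 92.1),
    ("Interlace-Qwen3-VL-8B-20pc", 20, 86.9),
    ("Interlace-Qwen3-VL-8B-25pc", 25, 86.1),
    ("Interlace-Qwen3-VL-4B-10pc", 10, 93.9),
    ("Interlace-Qwen3-VL-4B-15pc", 15, 91.9),
    ("Interlace-Qwen3-VL-4B-20pc", 20, 88.0),
    ("Interlace-Qwen3-VL-4B-25pc", 25, 81.7),
]


def most_pruned_above(size, min_rel_perf):
    """Hypothetical helper: return the most heavily pruned model of the
    given size ("4B" or "8B") whose relative performance is at least
    min_rel_perf, or None if no model qualifies."""
    candidates = [
        (drop, name)
        for name, drop, perf in MODELS
        if f"-{size}-" in name and perf >= min_rel_perf
    ]
    return max(candidates)[1] if candidates else None


print(most_pruned_above("4B", 90.0))
```

With a 90% floor, for example, the 4B family yields the 15pc variant; a floor that no variant meets returns `None`.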

📝 Citation

@inproceedings{madinei2026interlace,
 title={Interlace: Interleaved layer pruning and efficient adaptation in large vision-language models},
 author={Madinei, Parsa and Solgi, Ryan and Wen, Ziqi and Skaza, Jonathan and Eckstein, Miguel and Pedarsani, Ramtin},
 booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
 pages={2947--2956},
 year={2026}
}

URL: https://huggingface.co/pmadinei/Interlace-Qwen3-VL-4B-20pc
