Gemma4-BLIP3o-Captioner-5B

Gemma4-BLIP3o-Captioner-5B is a fine-tuned image captioning model built on top of Gemma-4-E2B-it, designed to mimic and replicate the BLIP3o (Bootstrapped Language-Image Pretraining) Captioning System through targeted fine-tuning on ~3K samples from the BLIP3o-Pretrain-Long-Caption dataset. The model features a modified chat template with a hardcoded expert system prompt engineered for dense, detail-rich image captioning — covering a wide range of image categories including scenery, natural environments, portraits, objects, and more — with thinking mode disabled by default to prioritize low-latency captioning outputs. It supports sequential video frame captioning, making it suitable for temporally ordered visual description tasks. As a captioning-specialized variant, the model may produce artifacts or degraded outputs when used for general-purpose conversational or instruction-following tasks outside its captioning scope. Ideal for applications requiring structured, verbose, and contextually accurate image descriptions in privacy-focused or local inference environments, leveraging the efficient multimodal backbone of Gemma-4-E2B-it.

Note: This model is experimental and not an all-purpose one.

Model Files

File Name	Quant Type	File Size	File Link
Gemma4-BLIP3o-Captioner-5B.BF16.gguf	BF16	9.31 GB	Download
Gemma4-BLIP3o-Captioner-5B.F16.gguf	F16	9.31 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q2_K.gguf	Q2_K	2.99 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q3_K_L.gguf	Q3_K_L	3.28 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q3_K_M.gguf	Q3_K_M	3.2 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q3_K_S.gguf	Q3_K_S	3.11 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q4_0.gguf	Q4_0	3.36 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q4_K_M.gguf	Q4_K_M	3.43 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q4_K_S.gguf	Q4_K_S	3.37 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q5_0.gguf	Q5_0	3.6 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q5_K_M.gguf	Q5_K_M	3.63 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q5_K_S.gguf	Q5_K_S	3.6 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q6_K.gguf	Q6_K	3.85 GB	Download
Gemma4-BLIP3o-Captioner-5B.Q8_0.gguf	Q8_0	4.95 GB	Download
Gemma4-BLIP3o-Captioner-5B.mmproj-bf16.gguf	mmproj-bf16	987 MB	Download
Gemma4-BLIP3o-Captioner-5B.mmproj-f16.gguf	mmproj-f16	987 MB	Download
Gemma4-BLIP3o-Captioner-5B.mmproj-q8_0.gguf	mmproj-q8_0	557 MB	Download

llama.cpp

LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp

Downloads last month: 1,390

GGUF

Model size

5B params

Architecture

gemma4

Hardware compatibility

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Model tree for prithivMLmods/Gemma4-BLIP3o-Captioner-5B

Base model

google/gemma-4-E2B

Finetuned

google/gemma-4-E2B-it

Quantized

(237)

this model

Collection including prithivMLmods/Gemma4-BLIP3o-Captioner-5B

Collection of Multimodal Models for Captioning -> Experimental • 4 items • Updated 1 day ago • 1

URL: https://huggingface.co/prithivMLmods/Gemma4-BLIP3o-Captioner-5B

⇱ prithivMLmods/Gemma4-BLIP3o-Captioner-5B · Hugging Face

Gemma4-BLIP3o-Captioner-5B

Model Files

llama.cpp

Model tree for prithivMLmods/Gemma4-BLIP3o-Captioner-5B

Collection including prithivMLmods/Gemma4-BLIP3o-Captioner-5B