Gemma4-BLIP3o-Captioner-5B
Gemma4-BLIP3o-Captioner-5B is a fine-tuned image captioning model built on Gemma-4-E2B-it. It was fine-tuned on ~3K samples from the BLIP3o-Pretrain-Long-Caption dataset to replicate the captioning style of the BLIP3o (Bootstrapped Language-Image Pretraining) captioning system. The model ships with a modified chat template that contains a hardcoded expert system prompt for dense, detail-rich captions across a wide range of image categories, including scenery, natural environments, portraits, and objects. Thinking mode is disabled by default to keep captioning latency low. The model also supports sequential video frame captioning, which makes it suitable for temporally ordered visual description tasks. Because it is specialized for captioning, it may produce artifacts or degraded outputs on general-purpose conversational or instruction-following tasks. It is intended for applications that need structured, verbose, and contextually accurate image descriptions in privacy-focused or local inference environments, and it relies on the efficient multimodal backbone of Gemma-4-E2B-it.
Note: This model is experimental and is not intended for general-purpose use.
Model Files
| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Gemma4-BLIP3o-Captioner-5B.BF16.gguf | BF16 | 9.31 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.F16.gguf | F16 | 9.31 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q2_K.gguf | Q2_K | 2.99 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q3_K_L.gguf | Q3_K_L | 3.28 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q3_K_M.gguf | Q3_K_M | 3.2 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q3_K_S.gguf | Q3_K_S | 3.11 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q4_0.gguf | Q4_0 | 3.36 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q4_K_M.gguf | Q4_K_M | 3.43 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q4_K_S.gguf | Q4_K_S | 3.37 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q5_0.gguf | Q5_0 | 3.6 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q5_K_M.gguf | Q5_K_M | 3.63 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q5_K_S.gguf | Q5_K_S | 3.6 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q6_K.gguf | Q6_K | 3.85 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.Q8_0.gguf | Q8_0 | 4.95 GB | Download |
| Gemma4-BLIP3o-Captioner-5B.mmproj-bf16.gguf | mmproj-bf16 | 987 MB | Download |
| Gemma4-BLIP3o-Captioner-5B.mmproj-f16.gguf | mmproj-f16 | 987 MB | Download |
| Gemma4-BLIP3o-Captioner-5B.mmproj-q8_0.gguf | mmproj-q8_0 | 557 MB | Download |
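
As a rough aid for choosing among the files in the table above, the following sketch picks the largest quantization whose file size, together with one projector (mmproj) file, fits a given size budget. The sizes are copied from the table; the function name `pick_quant` and the budget logic are illustrative assumptions, not part of the model release. The budget covers file sizes only, so actual runtime memory use (context, KV cache, compute buffers) will be higher.

```python
# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 2.99, "Q3_K_S": 3.11, "Q3_K_M": 3.2, "Q3_K_L": 3.28,
    "Q4_0": 3.36, "Q4_K_S": 3.37, "Q4_K_M": 3.43,
    "Q5_0": 3.6, "Q5_K_S": 3.6, "Q5_K_M": 3.63,
    "Q6_K": 3.85, "Q8_0": 4.95, "F16": 9.31, "BF16": 9.31,
}
# Projector files (listed in MB in the table, converted to GB here).
MMPROJ_SIZES_GB = {"mmproj-q8_0": 0.557, "mmproj-f16": 0.987, "mmproj-bf16": 0.987}


def pick_quant(budget_gb, mmproj="mmproj-f16"):
    """Hypothetical helper: return the largest quant type whose file,
    plus the chosen projector file, fits within budget_gb, or None."""
    remaining = budget_gb - MMPROJ_SIZES_GB[mmproj]
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= remaining]
    if not fitting:
        return None
    # max() compares by size first; equal sizes fall back to name order.
    return max(fitting)[1]
```

For example, `pick_quant(5.0)` leaves about 4.01 GB for the model file after the f16 projector, so it selects `Q6_K`.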
llama.cpp
LLM inference in C/C++ — https://github.com/ggml-org/llama.cpp
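
A minimal sketch of how sequential frame captioning could be driven with llama.cpp's multimodal CLI, assuming a build that provides `llama-mtmd-cli` and supports this model's architecture. The helper name `build_caption_commands`, the frame file names, and the user prompt are hypothetical; because the chat template already carries a hardcoded system prompt, a short user prompt is assumed to be enough. The function only builds the command lines, one per frame, in lexicographic order, so zero-padded frame names are assumed.

```python
def build_caption_commands(frames, model, mmproj,
                           prompt="Describe this image.",
                           binary="llama-mtmd-cli"):
    """Hypothetical helper: return one llama-mtmd-cli command (as an
    argument list) per frame, ordered by file name."""
    commands = []
    for frame in sorted(frames):  # assumes zero-padded names sort in time order
        commands.append([
            binary,
            "-m", model,          # quantized text model (GGUF)
            "--mmproj", mmproj,   # vision projector (GGUF)
            "--image", frame,     # single frame to caption
            "-p", prompt,         # assumed user prompt, not from the model card
        ])
    return commands


cmds = build_caption_commands(
    ["frame_0002.png", "frame_0001.png"],
    model="Gemma4-BLIP3o-Captioner-5B.Q4_K_M.gguf",
    mmproj="Gemma4-BLIP3o-Captioner-5B.mmproj-f16.gguf",
)
```

Each entry in `cmds` can then be passed to `subprocess.run(cmd, capture_output=True, text=True)` to collect one caption per frame. Launching a separate process per frame reloads the model each time, so for longer clips a persistent server process may be the better choice.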
