
URL: https://huggingface.co/KRAFTON/Raon-VisionEncoder

KRAFTON/Raon-VisionEncoder · Hugging Face


๐Ÿ‘ Homepage

๐Ÿ‘ Hugging Face
๐Ÿ‘ X

๐Ÿ‘ License

Raon-VisionEncoder is a 1.14B-parameter vision-language foundation model by KRAFTON for image and text feature extraction. It supports zero-shot image classification, image-text retrieval, and native-aspect-ratio inference via NaFlex. It is built on OpenCLIP with a LocCa (Localized CoCa) architecture and a ViT-SO400M vision encoder.

Pretrained Models

| Model | Params (Inference) | Vision | Text | Patch Size | NaFlex Default Patches |
| --- | --- | --- | --- | --- | --- |
| LocCa ViT-SO400M-16-SigLIP2 | 1.14B | 0.43B | 0.71B | 16x16 | 256 |
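
The table lists a 16x16 patch size and a default NaFlex budget of 256 patches. As a rough illustration of what a patch budget means for native-aspect-ratio input, the sketch below picks a patch grid that approximately preserves an image's aspect ratio while staying within the budget. The helper name `fit_patch_grid` and its scaling rule are assumptions made for illustration; the processor's actual resizing logic may differ.

```python
import math

def fit_patch_grid(height, width, patch_size=16, max_patches=256):
    """Hypothetical helper (not part of the model's API).

    Returns (rows, cols) of patches that roughly preserves the aspect
    ratio of a height x width image with rows * cols <= max_patches.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # Scale so the resized image area is about max_patches patches.
    scale = math.sqrt(max_patches * patch_size ** 2 / (height * width))
    rows = max(1, math.floor(height * scale / patch_size))
    cols = max(1, math.floor(width * scale / patch_size))
    # Flooring keeps the product within budget; the extra clamps handle
    # extreme aspect ratios where one side was raised to a single patch.
    rows = min(rows, max_patches)
    cols = min(cols, max_patches // rows)
    return rows, cols

square = fit_patch_grid(256, 256)     # square image
landscape = fit_patch_grid(480, 640)  # 4:3 landscape image
print(square, landscape)
```

Under these assumptions, a square image uses the full 16 x 16 grid, while a 4:3 image gets a wider grid with fewer rows; multiplying each side by the patch size gives the resized resolution in pixels.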

Requirements

pip install torch torchvision timm transformers huggingface-hub safetensors ftfy

Quick Start

import torch
from transformers import AutoModel
from PIL import Image

# Load model + processor
model = AutoModel.from_pretrained("KRAFTON/Raon-VisionEncoder", trust_remote_code=True)
model = model.to(dtype=torch.bfloat16).eval()
processor = model.get_processor("KRAFTON/Raon-VisionEncoder")

# Encode image and text
img_inputs = processor(images=Image.open("assets/photo.jpg"))
txt_inputs = processor(text=["a cat", "a dog"])

with torch.no_grad():
    img_feat = model.encode_image(**img_inputs)
    txt_feat = model.encode_text(**txt_inputs)

    # Compute similarity with learned scale and bias
    logits = model.logit_scale.exp() * (img_feat @ txt_feat.T) + model.logit_bias
    probs = logits.softmax(dim=-1)
    print(probs)
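
To make the arithmetic in the last three lines concrete, here is a minimal standard-library sketch that applies the same formula to toy 3-dimensional vectors. The scale and bias values are made-up placeholders, not the model's learned parameters, and the sigmoid variant is included only as a possible alternative for scoring each image-text pair independently; the Quick Start itself uses softmax.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy stand-ins for one image feature and two text features.
img_feat = normalize([1.0, 0.2, 0.0])
txt_feats = [normalize([0.9, 0.1, 0.1]), normalize([0.0, 1.0, 0.3])]

# Placeholder values, NOT the model's learned parameters.
logit_scale_exp = 10.0  # stands in for model.logit_scale.exp()
logit_bias = -5.0       # stands in for model.logit_bias

logits = [logit_scale_exp * dot(img_feat, t) + logit_bias for t in txt_feats]
softmax_probs = softmax(logits)               # relative ranking across prompts
sigmoid_probs = [sigmoid(l) for l in logits]  # independent per-pair scores
print(softmax_probs, sigmoid_probs)
```

One consequence worth noting: if the bias is a single scalar, adding it to every logit leaves the softmax output unchanged, so it only affects the result when pairs are scored independently (for example with a sigmoid).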

API Reference

| Method | Input | Output |
| --- | --- | --- |
| model.encode_image(**inputs) | Processor output (image) | [B, 1152] normalized image features |
| model.encode_text(**inputs) | Processor output (text) | [B, 1152] normalized text features |
| model.logit_scale | - | Learned temperature parameter |
| model.logit_bias | - | Learned bias parameter |
| model.get_processor(repo_id) | HuggingFace repo ID | Processor instance |
| processor(images=img) | PIL Image | Preprocessed image dict |
| processor(text=["a cat"]) | list of strings | Tokenized text dict |
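
Because both encoders return normalized features, a dot product between an image feature and a text feature is their cosine similarity, so image-text retrieval reduces to ranking candidates by that score. The sketch below shows the ranking step on plain Python lists; `rank_by_similarity` is a hypothetical helper, and the 2-dimensional vectors stand in for rows of the [B, 1152] outputs.

```python
def rank_by_similarity(query_feat, candidate_feats, top_k=3):
    """Hypothetical helper (not part of the model's API).

    Returns (index, score) pairs for the top_k candidates, ordered by
    dot product with the query. Inputs are assumed to be unit-length.
    """
    scores = [
        (i, sum(q * c for q, c in zip(query_feat, cand)))
        for i, cand in enumerate(candidate_feats)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]

# Toy unit-length vectors standing in for encoder outputs.
text_query = [1.0, 0.0]
image_feats = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

top = rank_by_similarity(text_query, image_feats, top_k=2)
print(top)
```

The same function works in either direction: pass a text feature as the query to retrieve images, or an image feature to retrieve captions.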

License

This repository is licensed under the Apache License 2.0. Third-party notices are listed in the NOTICE file.

© 2026 KRAFTON
