
URL: https://github.com/topics/smolvlm

smolvlm · GitHub Topics · GitHub


smolvlm

Here are 24 public repositories matching this topic...

jamjamjon / usls

A Rust library integrated with ONNX Runtime, providing a collection of Computer Vision and Vision-Language models such as YOLO, FastVLM, and more.

ocr cuda sam yolo tensorrt imshow onnx onnxruntime sam3 yolov8 grounding-dino florence2 rust-yolo yolo-rust yolo-rs yolo11 yolov11 smolvlm

Rust

lucasjinreal / Namo-R1

A real-time VLM for CPU with 500M parameters. Surpasses Moondream2 and SmolVLM. Easy to train from scratch.

llm mllm vllm moondream vllms smolvlm

Python

kiranbaby14 / TalkMateAI

🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync

websocket nextjs vlm fastapi huggingface whisper-ai flash-attention-2 multimodal-ai kokoro-tts smolvlm

TypeScript

yakhyo / smolvlm-realtime-webcam-vllm

Real-time webcam demo using SmolVLM with vLLM backend

real-time webcam-capture smolvlm

HTML

stlin256 / VLM4Classification

Fine-tune a VLM for image classification in specific domains.

python image image-classification vlm finetune smolvlm

Python

mvish7 / AlignVLM

This repository contains an implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment.

multimodality huggingface-transformers vision-language-pretraining vision-language-model smolvlm vision-language-alignment

Python

iBz-04 / reeltek

A small VLM that sees everything

python ocr gpu-acceleration scene-understanding real-time-detection local-models tts-js huggingface vision-models vision-language-model llamacpp llm-inference vlms smolvlm

HTML

Qengineering / SmolVLM2-256M-NPU

SmolVLM2 on the RK3588 NPU

ai image-to-text vlm visi npu rknn rk3588 rock-5c rkllm smolvlm smolvlm2

C++

snnclsr / chatgpt-from-scratch

A full-stack ChatGPT-like application built (almost) from scratch

multi-modal llms chatgpt qwen2-5 smolvlm gemma3

Python

Qengineering / SmolVLM2-2B-NPU

SmolVLM2 on the RK3588 NPU

ai vision image-to-text vlm npu rknn rk3588 rock-5c rkllm smolvlm smolvlm2

C++

gabrielSantosLima / vlm_garbage_classification

⭐ Comparing VLMs with CNNs for garbage classification

ai tensorflow keras cnn artificial-intelligence vlm efficientnetv2 llm smolvlm

Jupyter Notebook

stlin256 / SmolVLM_with_LLM

Scripts for combining SmolVLM with an LLM

image-recognition video-classification vlm llm smolvlm qwen3

Python

Qengineering / SmolVLM2-500M-NPU

SmolVLM2 on the RK3588 NPU

ai vision image-to-text vlm npu rknn rk3588 rock-5c rkllm smolvlm smolvlm2

C++

souradipp76 / PixRec

A tool for multimodal recommendation

recommender-system llm smolvlm paligemma2

Python

mrgehlot / object_detection_using_vllm

Real-time vision demo using SmolVLM with llama.cpp backend

open-source real-time computer-vision artificial-intelligence image-to-text llamacpp vision-language-models multimodal-ai smolvlm

HTML

CasualEngineerZombie / smolvlm-realtime-face

A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.

face-recognition webcam llm-inference llama-server smolvlm

JavaScript

Kumaran-Elumalai / nextgen-multimodal-generative-vlm-evaluation-suite

A benchmark suite for lightweight generative multimodal Vision-Language Models, comparing ViLT and SmolVLM under resource-constrained inference environments. Demonstrates CPU-only deployment, model evaluation, and multimodal reasoning with images and text, highlighting practical GenAI engineering for real-world applications.

python ai ml vqa gradio multimodal-deep-learning huggingface-transformers vilt generativeai visionlanguagemodel smolvlm

Python

LiteObject / smol-gui-agent

Demo project for Smol2Operator: turn a vision-language model into a GUI agent that can see your screen and control it. Two-phase training teaches the model to locate UI elements and execute actions.

computer-vision transformers pytorch gui-automation fine-tuning multimodal huggingface ai-agent vision-language-model smolvlm


kucingcoder / miramo

A Flask-based web app for managing multimodal (text and image) datasets with CRUD operations via SQLite, and exporting them as structured Parquet datasets to the Hugging Face Hub.

llama datasets bert vlm multimodal huggingface llm llm-training smolvlm

HTML

omkarsoak / VLM-Receipt-OCR

Receipt OCR using fine-tuned VLMs

fine-tuning vision-language-models smolvlm

Jupyter Notebook
