smolvlm
Here are 24 public repositories matching this topic...
🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync
- TypeScript
This repository contains the implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment.
- Python
A small VLM that sees everything
- HTML
⭐ Comparing VLMs with CNNs for garbage classification
- Jupyter Notebook
Scripts for combining SmolVLM with an LLM
- Python
Real-time vision demo using SmolVLM with llama.cpp backend
- HTML
A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.
- JavaScript
A benchmark suite for lightweight generative multimodal vision-language models, comparing ViLT and SmolVLM in resource-constrained inference environments. It demonstrates CPU-only deployment, model evaluation, and multimodal reasoning over images and text, highlighting practical GenAI engineering for real-world applications.
- Python
Demo project for Smol2Operator: turn a vision-language model into a GUI agent that can see your screen and control it. Two-phase training teaches the model to locate UI elements and execute actions.
A Flask-based web app for managing multimodal datasets (text and images) with CRUD operations via SQLite, and for exporting them as a structured Parquet dataset to the Hugging Face Hub.
- HTML
Receipt OCR using fine-tuned VLMs
- Jupyter Notebook
