openai-vision

A lightweight Vision-Language-Action (VLA) baseline for MetaWorld robot-arm tasks using a pretrained CLIP-ViT vision transformer(openai/clip-vit-base-patch32), a small text transformer, and robot-state fusion

transformers pytorch vlm metaworld clip-vit openai-vision clip-vit-b-32-vision

Updated
Python

RaheesAhmed / video_description_generator

Star

This Python script processes a video file, generates a compelling description, creates a voiceover script in the style of David Attenborough, and synthesizes the voiceover using OpenAI's Text-to-Speech API.

opencv openai openai-api openai-chatgpt openai-tts openai-vision

Updated
Python

eli6 / web-voyager

Star

🤖 🌐 Personal AI assistant that browses the web independently

agent ai assistant-chat-bots openai-vision langgraph langgraph-python

Updated
HTML

alxlvgit / ai-expense-tracker

Star

The AI Expense Tracker is an expense management tool designed to simplify the process of tracking expenses by automating the extraction of necessary data from receipt images.

typescript nextjs aws-s3 postgresql openai-vision

Updated
TypeScript

AyatKhraisat / Retail-Chatbot-with-Azure-OpenAI-LLmaIndex-and-Weviate

Star

azure openai streamlit openai-chatgpt llma-index openai-vision

Updated
Python

sumitsahoo / screenshot-to-testcase

Sponsor

Star

Generate K6 test case from a web page screenshot

automation openai k6 testcase-generator llm openai-vision

Updated
Python

Kimosabey / vision-pulse

Star

Multimodal Video Reasoning and Frame Analysis Platform for real-time intelligent monitoring.

distributed-systems computer-vision cognitive-systems video-analytics video-analysis vision-models openai-vision agentic-ai multimodal-ai kimo-2026-roadmap senior-engineer-portfolio senior-engineer-signal

Updated

aniketbhoy / document-topic-mapping

Star

An intelligent document navigation system that extracts topics, maps relationships, detects anomalies, and creates visual navigation tools for complex documents using LangGraph and OpenAI.

ocr graph mapping orchestration anomaly-detection navigation-architecture openai-vision langgraph

Updated
Python

mrunmayimahajan12 / hoo-hacks-screen-reader

Star

Smart Screen Reader- A Screen reader chrome extension for visually-impaired people. This is a Chrome extension that enhances web accessibility by: Generating image descriptions using AI, Summarizing entire pages, Allowing users to ask questions and automatically scroll to the relevant sections.

chrome-extension text-to-speech accessibility web-speech-api image-descriptions openai-api manifest-v3 conversational-assistant visually-impaired-people openai-vision ask-a-question voice-based-webpage-scrolling

Updated
JavaScript

lucereal / MenuParser

Star

google-cloud openai asp-net-core google-custom-search-api openai-vision azure-document-intelligence

Updated
C#

ketwong / groceries_io

Star

Web-based AI image recognition app.

python flask swagger image-processing webapp openai image-recognition openai-vision

Updated
Python

Kimosabey / vision-query

Star

Enterprise vision-query Technical Architecture focusing on Scalability and High Performance.

distributed-systems ffmpeg scalability artificial-intelligence openai cognitive-systems video-search multimodal vision-ai backend-engineering openai-vision agentic-ai kimo-2026-roadmap senior-engineer-portfolio

Updated

NO-Product / frontend-support

Star

AI-powered cross-browser UI/UX testing tool that captures screenshots across devices and uses OpenAI vision models to analyze layout, responsiveness, and compatibility issues.

python test-automation screenshot-testing responsive-design visual-testing ui-testing browser-automation visual-regression frontend-testing cli-tool web-testing cross-browser-testing accessibility-testing playwright playwright-python ai-testing gpt-vision openai-vision ux-analysis ui-analysis

Updated
Python

sanatren / Azazel

Star

Supercharge Legacy models with RAG, image analysis (GPT-4o vision), code execution, and web search. Transform it into a lean, multimodal powerhouse—smarter, faster, and more versatile, without the heavyweight cost.

openai whisper faiss huggingface-transformers supabase langchain-python openai-vision rag-memory

Updated
Python

motorcykle / openaivisiontest

Star

Just testing Open AI Vision

javascript typescript ai nextjs openai imagerecognition tailwindcss openai-api nextjs13 shadcn-ui nextjs14 openai-vision

Updated
TypeScript

Improve this page

Add a description, image, and links to the openai-vision topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the openai-vision topic, visit your repo's landing page and select "manage topics."

Learn more

URL: https://github.com/topics/openai-vision

⇱ openai-vision · GitHub Topics · GitHub

openai-vision

Here are 21 public repositories matching this topic...

olliethedev / gpt4-vision-chrome-extension

nooqta / saiku

bsorrentino / PlantUML4iPad

vishwakneelamegam / openai-video-narrator

sumitsahoo / erd-to-ddl

taherfattahi / MetaWorld-VLA-openai-clip-vit

RaheesAhmed / video_description_generator

eli6 / web-voyager

alxlvgit / ai-expense-tracker

AyatKhraisat / Retail-Chatbot-with-Azure-OpenAI-LLmaIndex-and-Weviate

sumitsahoo / screenshot-to-testcase

Kimosabey / vision-pulse

aniketbhoy / document-topic-mapping

mrunmayimahajan12 / hoo-hacks-screen-reader

lucereal / MenuParser

ketwong / groceries_io

Kimosabey / vision-query

NO-Product / frontend-support

sanatren / Azazel

motorcykle / openaivisiontest

Improve this page

Add this topic to your repo