The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
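The project behind this blurb isn't named in the listing, so as a generic illustration of the "model inference API" pattern it describes, here is a minimal sketch using FastAPI and scikit-learn (both are assumptions, not this project's own stack):

```python
# Minimal model-inference API sketch. FastAPI + scikit-learn are illustrative
# assumptions, not the unnamed project's API.
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = FastAPI()

# Train a tiny model at import time so the example is self-contained.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

class Features(BaseModel):
    values: list[float]  # the four iris features

@app.post("/predict")
def predict(features: Features) -> dict:
    label = model.predict([features.values])[0]
    return {"class": int(label)}
```

Run with `uvicorn app:app` and POST JSON to `/predict`; a job queue or multi-model pipeline layers the same idea behind a broker instead of a synchronous endpoint.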
Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
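Assuming this refers to Intel's scikit-learn-intelex, the speedup comes from patching scikit-learn before importing any estimators; existing code is otherwise unchanged:

```python
# Swap in the accelerated implementations, then use scikit-learn as usual.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=0)
labels = KMeans(n_clusters=5, random_state=0).fit_predict(X)
print(labels[:10])
```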
oneAPI Data Analytics Library (oneDAL)
The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.
Cross-platform C++ SDK & model hub for AI inference. Ready-to-deploy models including Segment Anything 3, Depth Anything 2, and Gemma.
High-performance AI-native web server built in C & Assembly for ultra-fast AI inference and streaming.
A distributed system for Agentic AI
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
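The repo above targets the llama.cpp CLI; a rough Python-side equivalent uses the llama-cpp-python bindings. The model filename and context size below are assumptions; offloading all layers (`n_gpu_layers=-1`) with a 4-bit quantization is what typically makes such a model fit on a 16 GB GPU:

```python
from llama_cpp import Llama

# Hypothetical GGUF path; choose a quantization that fits in 16 GB of VRAM.
llm = Llama(
    model_path="qwen-q4_k_m.gguf",  # assumed filename
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=8192,        # context window; tune to available memory
)

out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```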
MOTO - Autonomous ASI deep-research harness by Intrafere: a creative, novelty-seeking mathematics researcher for STEM users. Press start once and it runs for days at a time, no interaction needed. MOTO uses simultaneous agents working in parallel, backed by a local LM Studio host, OpenRouter, or both. No internet required! Star us, more to come soon!
Client library to interact with various APIs used within Philips in a simple and uniform way
GPU-aware inference mesh for large-scale AI serving
Unity TTS plugin: Piper neural synthesis + pure C# G2P (6 languages: ja/en/zh/es/fr/pt) + Unity Inference Engine. Windows/Mac/Linux/Android/iOS/WebGL ready. High-quality voices for games & apps.
Hands-on course materials for deploying and optimizing generative AI on Arm processors: Raspberry Pi, AWS Graviton, SIMD, quantization (educational)
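Quantization, one of the course topics above, maps float weights to small integers plus a scale factor. A minimal symmetric int8 sketch with NumPy (illustrative, not the course's own code):

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ~ q * scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```

Storing `q` instead of `w` cuts memory 4x versus float32, which is why it matters on Raspberry Pi and Graviton-class hardware.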
A framework for performing AI model inference on encrypted data.
Save 50% off GenAI costs in two lines of code
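The project isn't named here, but "two lines of code" GenAI cost-savings integrations typically work by pointing an existing OpenAI-compatible client at a caching or routing proxy. A sketch under that assumption; the proxy URL and savings mechanism are hypothetical, not this project's documented API:

```python
from openai import OpenAI

# Hypothetical: route requests through a caching/routing proxy instead of the
# upstream API -- the only change to existing code is the base_url and key.
client = OpenAI(base_url="https://proxy.example.com/v1", api_key="PROXY_KEY")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize FHE in one sentence."}],
)
print(resp.choices[0].message.content)
```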
A development framework for Fully Homomorphic Encryption (FHE)
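Both FHE entries above rest on the same idea: arithmetic performed on ciphertexts decrypts to arithmetic on the plaintexts, so a server can compute on data it cannot read. A toy additively homomorphic (Paillier-style) sketch with deliberately insecure parameters, not either framework's API:

```python
from math import gcd
import random

# Toy Paillier keypair. Insecure demo parameters; real deployments use
# ~2048-bit primes.
p, q = 1789, 2003
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)        # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)          # L(g^lam mod n^2)^-1 mod n

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
a, b = encrypt(20), encrypt(22)
print("enc(20) * enc(22) decrypts to", decrypt(a * b % n2))  # -> 42
```

Fully homomorphic schemes extend this to multiplication as well, which is what makes encrypted model inference possible.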
World's first L1 blockchain with deterministic on-chain AI inference verified through multi-node consensus. Bitwise identical outputs across every chip, every architecture.
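"Bitwise identical outputs across every chip" generally implies avoiding floating point, whose results can differ across hardware and summation orders. A minimal sketch of that underlying idea, fixed-point inference in pure integer arithmetic (illustrative only, not this chain's actual protocol):

```python
# Deterministic fixed-point "inference": every operation is exact integer
# arithmetic, so any conforming machine produces bit-identical results.
SCALE = 2**16  # Q16.16 fixed point

def to_fixed(x: float) -> int:
    return int(round(x * SCALE))

def matvec(w: list[list[int]], x: list[int]) -> list[int]:
    # Integer dot products, rescaled once per output row; integer // is
    # exact and order-independent, unlike float summation.
    return [sum(wi * xi for wi, xi in zip(row, x)) // SCALE for row in w]

w = [[to_fixed(v) for v in row] for row in [[0.5, -1.25], [2.0, 0.75]]]
x = [to_fixed(v) for v in [1.0, 2.0]]
print(matvec(w, x))  # fixed-point results, identical on every platform
```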
Customized version of Google's tflite-micro
KaiROS AI: Intelligence, Precisely When It Matters.
A powerful, fast, scalable full-stack boilerplate for AI inference using Node.js, Python, Redis, and Docker
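In a stack like this, Redis typically serves as the job queue decoupling the web tier from inference workers. A generic sketch with the redis-py client; the queue name and payload shape are assumptions, not this boilerplate's schema:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Producer: the web tier enqueues an inference job.
r.rpush("inference:jobs", json.dumps({"id": 1, "prompt": "hello"}))

# Worker: block until a job arrives, then process it.
_, raw = r.blpop("inference:jobs")
job = json.loads(raw)
print("processing job", job["id"], "->", job["prompt"].upper())  # stand-in for a model call
```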