The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
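The project behind this blurb isn't named in the listing, so as a generic illustration of the "model inference API" pattern it describes, here is a minimal sketch using FastAPI and scikit-learn (both are assumptions, not this project's own stack):

```python
# Minimal model-inference API sketch. FastAPI + scikit-learn are illustrative
# assumptions, not the unnamed project's API.
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = FastAPI()

# Train a tiny model at import time so the example is self-contained.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

class Features(BaseModel):
    values: list[float]  # the four iris features

@app.post("/predict")
def predict(features: Features) -> dict:
    label = model.predict([features.values])[0]
    return {"class": int(label)}
```

Run with `uvicorn app:app` and POST JSON to `/predict`; a job queue or multi-model pipeline layers the same idea behind a broker instead of a synchronous endpoint.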
Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
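Assuming this refers to Intel's scikit-learn-intelex, the speedup comes from patching scikit-learn before importing any estimators; existing code is otherwise unchanged:

```python
# Swap in the accelerated implementations, then use scikit-learn as usual.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=0)
labels = KMeans(n_clusters=5, random_state=0).fit_predict(X)
print(labels[:10])
```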
oneAPI Data Analytics Library (oneDAL)
The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.
Cross-platform C++ SDK & model hub for AI inference. Ready-to-deploy models including Segment Anything 3, Depth Anything 2, and Gemma.
High-performance AI-native web server built in C & Assembly for ultra-fast AI inference and streaming.
A distributed system for Agentic AI
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
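The repo above targets the llama.cpp CLI; a rough Python-side equivalent uses the llama-cpp-python bindings. The model filename and context size below are assumptions; offloading all layers (`n_gpu_layers=-1`) with a 4-bit quantization is what typically makes such a model fit on a 16 GB GPU:

```python
from llama_cpp import Llama

# Hypothetical GGUF path; choose a quantization that fits in 16 GB of VRAM.
llm = Llama(
    model_path="qwen-q4_k_m.gguf",  # assumed filename
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=8192,        # context window; tune to available memory
)

out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```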
MOTO - Autonomous ASI deep-research harness by Intrafere: a creative, novelty-seeking mathematics researcher for STEM users. Press start once and it runs for days at a time, no interaction needed. MOTO uses simultaneous agents working in parallel, backed by a local LM Studio host, OpenRouter, or both. No internet required! Star us, more to come soon!
Client library to interact with various APIs used within Philips in a simple and uniform way
GPU-aware inference mesh for large-scale AI serving
Unity TTS plugin: Piper neural synthesis + pure C# G2P (6 languages: ja/en/zh/es/fr/pt) + Unity Inference Engine. Windows/Mac/Linux/Android/iOS/WebGL ready. High-quality voices for games & apps.
Hands-on course materials for deploying and optimizing generative AI on Arm processors: Raspberry Pi, AWS Graviton, SIMD, quantization (educational)
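Quantization, one of the course topics above, maps float weights to small integers plus a scale factor. A minimal symmetric int8 sketch with NumPy (illustrative, not the course's own code):

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ~ q * scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```

Storing `q` instead of `w` cuts memory 4x versus float32, which is why it matters on Raspberry Pi and Graviton-class hardware.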
A framework for performing AI model inference on encrypted data.
Save 50% off GenAI costs in two lines of code
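The project isn't named here, but "two lines of code" GenAI cost-savings integrations typically work by pointing an existing OpenAI-compatible client at a caching or routing proxy. A sketch under that assumption; the proxy URL and savings mechanism are hypothetical, not this project's documented API:

```python
from openai import OpenAI

# Hypothetical: route requests through a caching/routing proxy instead of the
# upstream API -- the only change to existing code is the base_url and key.
client = OpenAI(base_url="https://proxy.example.com/v1", api_key="PROXY_KEY")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize FHE in one sentence."}],
)
print(resp.choices[0].message.content)
```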
A development framework for Fully Homomorphic Encryption (FHE)
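Both FHE entries above rest on the same idea: arithmetic performed on ciphertexts decrypts to arithmetic on the plaintexts, so a server can compute on data it cannot read. A toy additively homomorphic (Paillier-style) sketch with deliberately insecure parameters, not either framework's API:

```python
from math import gcd
import random

# Toy Paillier keypair. Insecure demo parameters; real deployments use
# ~2048-bit primes.
p, q = 1789, 2003
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)        # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)          # L(g^lam mod n^2)^-1 mod n

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
a, b = encrypt(20), encrypt(22)
print("enc(20) * enc(22) decrypts to", decrypt(a * b % n2))  # -> 42
```

Fully homomorphic schemes extend this to multiplication as well, which is what makes encrypted model inference possible.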
World's first L1 blockchain with deterministic on-chain AI inference verified through multi-node consensus. Bitwise identical outputs across every chip, every architecture.
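"Bitwise identical outputs across every chip" generally implies avoiding floating point, whose results can differ across hardware and summation orders. A minimal sketch of that underlying idea, fixed-point inference in pure integer arithmetic (illustrative only, not this chain's actual protocol):

```python
# Deterministic fixed-point "inference": every operation is exact integer
# arithmetic, so any conforming machine produces bit-identical results.
SCALE = 2**16  # Q16.16 fixed point

def to_fixed(x: float) -> int:
    return int(round(x * SCALE))

def matvec(w: list[list[int]], x: list[int]) -> list[int]:
    # Integer dot products, rescaled once per output row; integer // is
    # exact and order-independent, unlike float summation.
    return [sum(wi * xi for wi, xi in zip(row, x)) // SCALE for row in w]

w = [[to_fixed(v) for v in row] for row in [[0.5, -1.25], [2.0, 0.75]]]
x = [to_fixed(v) for v in [1.0, 2.0]]
print(matvec(w, x))  # fixed-point results, identical on every platform
```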
Customized version of Google's tflite-micro
KaiROS AI: Intelligence, Precisely When It Matters.
A powerful, fast, scalable full-stack boilerplate for AI inference using Node.js, Python, Redis, and Docker
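In a stack like this, Redis typically serves as the job queue decoupling the web tier from inference workers. A generic sketch with the redis-py client; the queue name and payload shape are assumptions, not this boilerplate's schema:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Producer: the web tier enqueues an inference job.
r.rpush("inference:jobs", json.dumps({"id": 1, "prompt": "hello"}))

# Worker: block until a job arrives, then process it.
_, raw = r.blpop("inference:jobs")
job = json.loads(raw)
print("processing job", job["id"], "->", job["prompt"].upper())  # stand-in for a model call
```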