Krasis is a hybrid LLM runtime focused on efficiently running larger models on consumer-grade, VRAM-limited hardware.
An Android inference engine running 20B+-parameter LLMs on devices with 4-8 GB of RAM. Features proprietary Layer-by-Layer (LBL) streaming, zero-copy mmap loading, and a native C++/Kotlin architecture.
Convert and quantize LLM models.
A simple Gradio app for local translation using the GGUF versions of MADLAD-400
Privacy-first Local RAG Server: Chat with PDF & DOCX using GGUF models via llama.cpp and Qdrant. A lightweight, standalone FastAPI server with a clean HTML UI. High-performance, fully offline document intelligence. No Ollama, no cloud, no API keys.
Splinter is an atomic, lock-free, persistable shared-memory KV and vector store that serves LLM inference without socket, mutex, or memcpy() overhead; it ingests, stores, and optionally persists large amounts of data with minimal latency. Splinter fits in the size of most modern CPU instruction caches (766 ELOC) and ships with a CLI, tools, and tests.
Emotica AI is a compassionate and therapeutic virtual assistant designed to provide empathetic and supportive conversations. It integrates a local LLaMA model for text generation, a vision model for image captioning, a RAG system for information retrieval, and emotion detection to tailor its responses.
Containerized LLM for any use case, big or small.
Nectar-X-Studio is a powerful local AI inference application that lets users download and create agents and run large language models on their own machine. With no internet connection required, Nectar ensures privacy-first, high-performance inference using cutting-edge open-source models from Hugging Face, Ollama, and beyond.
AI tool to help users research using local LLMs and automated web search.
GGUF file format for dotnet