[Showcase] Running Gemma-4-26B-A4B-it on 8GB RAM Smartphone
#43
by InfiniteVoid - opened
Day 3 of building a custom Vulkan external-MoE inference path on top of llama.cpp/GGUF.
Gemma 4 26B A4B Q4_0 running locally on a phone.
Phone: Poco X4 GT / Xiaomi 22041216G / xaga, MT6895, Android 14, 8 GB RAM.
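For readers wondering why an external-MoE path is needed at all, here is a rough back-of-the-envelope sketch. It is not the author's measurement. It assumes the nominal 26B parameter count from the model name and a uniform Q4_0 layout (blocks of 32 weights stored as 16 bytes of 4-bit values plus a 2-byte fp16 scale); real GGUF files usually keep some tensors in other types, so the actual file size will differ.

```python
# Rough estimate of Q4_0 weight size versus phone RAM.
# Assumptions (not from the post): every weight is stored as Q4_0,
# and the parameter count is the nominal 26e9 from the model name.

Q4_0_BLOCK_WEIGHTS = 32      # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 16 + 2    # 32 x 4-bit values + one fp16 scale

def q4_0_bytes(n_params: float) -> float:
    """Bytes needed to store n_params weights in Q4_0 blocks."""
    return n_params * Q4_0_BLOCK_BYTES / Q4_0_BLOCK_WEIGHTS

bits_per_weight = Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS
weights_gb = q4_0_bytes(26e9) / 1e9
phone_ram_gb = 8

print(f"Q4_0 bits per weight: {bits_per_weight}")
print(f"estimated weights: {weights_gb:.1f} GB vs {phone_ram_gb} GB RAM")
print(f"fits in RAM: {weights_gb < phone_ram_gb}")
```

Under these assumptions the full weights are well above the phone's 8 GB, so they cannot all be resident in memory at once.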
Run settings and results:
context: 512 tokens
thinking: on, budget 16
prompt processing: 0.81 tok/s
generation: 0.38 tok/s
test run: 412 generated tokens
Fully local, no cloud, running through llama.cpp.
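To put the throughput numbers in wall-clock terms, here is a small sketch that converts the reported rates into per-token latency and total generation time for the 412-token test run. The prompt length is not stated, so only the generation phase is totalled; the helper name is just for illustration.

```python
# Convert the reported throughput into wall-clock time.
# Inputs are the figures from the post; prompt length is not given,
# so total prompt-processing time is deliberately not computed.

def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Time in seconds to process `tokens` at a steady `tok_per_s`."""
    return tokens / tok_per_s

PROMPT_TOK_PER_S = 0.81
GEN_TOK_PER_S = 0.38
GENERATED_TOKENS = 412

gen_seconds = seconds_for(GENERATED_TOKENS, GEN_TOK_PER_S)
gen_minutes = gen_seconds / 60
sec_per_gen_token = 1 / GEN_TOK_PER_S
sec_per_prompt_token = 1 / PROMPT_TOK_PER_S

print(f"generation: {gen_seconds:.0f} s (~{gen_minutes:.0f} min)")
print(f"per generated token: {sec_per_gen_token:.2f} s")
print(f"per prompt token: {sec_per_prompt_token:.2f} s")
```

So the test run works out to roughly 18 minutes of generation, assuming the rate stayed steady throughout.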
