
[Showcase] Running Gemma-4-26B-A4B-it on 8GB RAM Smartphone

#43

by InfiniteVoid - opened May 18


Day 3 of building a custom Vulkan external-MoE inference path on top of llama.cpp/GGUF.

Gemma 4 26B A4B Q4_0 running locally on a phone.

Phone: Poco X4 GT / Xiaomi 22041216G / xaga, MT6895, Android 14, 8 GB RAM.
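A rough back-of-the-envelope sketch of why 8 GB of RAM is tight for this model. It assumes the nominal parameter counts from the model name (26B total, about 4B active per token) and that every weight is stored in llama.cpp's Q4_0 block format (32 weights per 18-byte block, i.e. 4.5 bits per weight). Real GGUF files mix tensor types and carry extra metadata, so treat the numbers as approximate, not as measurements from this run.

```python
# Approximate Q4_0 weight footprint.
# ASSUMPTIONS (not stated in the post): nominal 26B total / 4B active
# parameter counts, and all weights stored as Q4_0.

Q4_0_BLOCK_WEIGHTS = 32   # weights per Q4_0 block
Q4_0_BLOCK_BYTES = 18     # 16 bytes of packed 4-bit values + 2-byte fp16 scale
BITS_PER_WEIGHT = Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS  # 4.5 bits


def q4_0_gib(n_params: float) -> float:
    """Approximate storage for n_params Q4_0 weights, in GiB."""
    return n_params * BITS_PER_WEIGHT / 8 / 2**30


total = q4_0_gib(26e9)   # all weights, including every expert
active = q4_0_gib(4e9)   # weights touched for a single token

print(f"all weights:    ~{total:.1f} GiB")
print(f"active weights: ~{active:.1f} GiB per token")
```

Under these assumptions the full weight set comes out well above the phone's 8 GB, while the per-token active subset is only a couple of GiB, which is the gap an MoE-aware inference path has to work around.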

Specs:

context: 512 tokens
thinking: on, budget 16
prompt processing: 0.81 tok/s
generation: 0.38 tok/s
test run: 412 generated tokens
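To put the reported speeds in wall-clock terms, here is a small sketch that converts token counts into time. It assumes the generation rate stayed roughly constant over the whole run; the prompt length is not given in the post, so only the generation phase is estimated.

```python
# Wall-clock estimate for the test run, using the figures reported above.
# ASSUMPTION: generation speed was roughly constant for the whole run.

GEN_TOKENS = 412       # generated tokens in the test run
GEN_TOK_PER_S = 0.38   # reported generation speed


def seconds_for(tokens: int, tok_per_s: float) -> float:
    """Time needed to produce `tokens` at a steady `tok_per_s` rate."""
    return tokens / tok_per_s


gen_s = seconds_for(GEN_TOKENS, GEN_TOK_PER_S)
print(f"generation: ~{gen_s:.0f} s (~{gen_s / 60:.1f} min)")
```

At 0.38 tok/s, the 412-token test run corresponds to roughly 18 minutes of generation alone.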

Fully local, no cloud, running through llama.cpp.
