Slonik-7B-GRPO — GGUF
GGUF quantizations of Phani-labs/Slonik-7B-GRPO, built to run locally through llama.cpp, Ollama, LM Studio, Jan, and other GGUF-compatible runtimes.
Why I built this
I wanted a small text-to-SQL model that could run locally but still handle real PostgreSQL and SQLite questions. Most strong SQL models today are either much larger, cloud-only, or awkward to integrate into local workflows. This project was an experiment to see how far a 7B coding model could go with focused supervised fine-tuning followed by execution-based reinforcement learning.
The surprising part: on BIRD-PG, the 7B model scored 38.20% against GPT-4o's 34.44%, while still being small enough to run on a laptop.
Results
Numbers are from the 500-example BIRD Mini-Dev set. BIRD-PG results were evaluated against the BIRD PostgreSQL dump loaded into a local Postgres + pgvector instance.
| Model | BIRD-PG | BIRD-SQLite | Size / type |
|---|---|---|---|
| o3-mini | 47.78% | — | reasoning |
| Claude 3.7 Sonnet | 39.26% | — | proprietary |
| Slonik-7B-GRPO (this) | 38.20% | 45.20% | 7B |
| GPT-4o | 34.44% | — | proprietary |
| Qwen2.5-Coder-32B | 22.96% | — | 32B |
| Codestral 22B | 21.11% | — | 22B |
| Qwen2.5-Coder-7B (base) | 12.22% | — | 7B |
By difficulty
| Tier | BIRD-PG | BIRD-SQLite |
|---|---|---|
| Simple | 56.1% | 66.2% |
| Moderate | 33.6% | 38.0% |
| Challenging | 23.5% | 32.4% |
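As a sanity check, the per-tier numbers reproduce the headline scores. The sketch below assumes the Mini-Dev difficulty split is 148 simple / 250 moderate / 102 challenging questions (an assumption, not stated in this card). Under that split, each tier accuracy rounds to a whole number of correct questions, and the question-weighted average lands on the overall figures in the results table.

```python
# Assumed BIRD Mini-Dev difficulty split (not stated in this card).
TIER_SIZES = {"simple": 148, "moderate": 250, "challenging": 102}

# Per-tier accuracy from the table above, as fractions.
TIER_ACC = {
    "BIRD-PG": {"simple": 0.561, "moderate": 0.336, "challenging": 0.235},
    "BIRD-SQLite": {"simple": 0.662, "moderate": 0.380, "challenging": 0.324},
}


def overall_accuracy(tier_acc, tier_sizes):
    """Question-weighted average: recover correct counts per tier, then divide by the total."""
    correct = sum(round(tier_acc[t] * n) for t, n in tier_sizes.items())
    total = sum(tier_sizes.values())
    return correct, total, correct / total


for bench, acc in TIER_ACC.items():
    correct, total, frac = overall_accuracy(acc, TIER_SIZES)
    print(f"{bench}: {correct}/{total} = {frac:.1%}")
# BIRD-PG: 191/500 = 38.2%
# BIRD-SQLite: 226/500 = 45.2%
```

The overall score is not the plain mean of the three tiers (that would give about 37.7% for BIRD-PG), because moderate questions make up half of the set.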
Available quantizations
| File | Quant | Size | Notes |
|---|---|---|---|
| Slonik-7B-GRPO.Q4_K_M.gguf | Q4_K_M | 4.4 GB | Best quality-to-size tradeoff. Runs on 8 GB VRAM or CPU. |
| Slonik-7B-GRPO.Q5_K_M.gguf | Q5_K_M | 5.1 GB | Slightly better quality if you have the memory. Runs on 8 GB VRAM. |
| Slonik-7B-GRPO.Q8_0.gguf | Q8_0 | 7.6 GB | Near-lossless. Best if you have 12 GB VRAM or enough system RAM. |
Most people should start with Q4_K_M. It's the easiest to run and gives the best quality-to-size balance. Use Q5_K_M if you have memory to spare, or Q8_0 if you want results closest to the original model.
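If you want to script the choice, the rule of thumb above amounts to "take the largest file that still fits, with some room left over". The sketch below encodes that with the file sizes from the table; pick_quant is a hypothetical helper, and the 1.5 GB headroom for KV cache and runtime overhead is an assumption you should tune for your context length.

```python
# File sizes from the quantization table, in GB, ordered largest first.
QUANTS = [
    ("Slonik-7B-GRPO.Q8_0.gguf", 7.6),
    ("Slonik-7B-GRPO.Q5_K_M.gguf", 5.1),
    ("Slonik-7B-GRPO.Q4_K_M.gguf", 4.4),
]

# Hypothetical headroom for KV cache, context and runtime overhead (assumption).
HEADROOM_GB = 1.5


def pick_quant(available_gb, headroom_gb=HEADROOM_GB):
    """Return the largest quant whose file plus headroom fits, or None if nothing fits."""
    for filename, size_gb in QUANTS:
        if size_gb + headroom_gb <= available_gb:
            return filename
    return None


print(pick_quant(8))   # Slonik-7B-GRPO.Q5_K_M.gguf
print(pick_quant(12))  # Slonik-7B-GRPO.Q8_0.gguf
```

With these assumed numbers an 8 GB card lands on Q5_K_M and a 12 GB card on Q8_0, in line with the notes column; if you are unsure, Q4_K_M remains the safe default.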
Usage
Ollama
```bash
ollama pull hf.co/Phani-labs/Slonik-7B-GRPO-GGUF:Q4_K_M
ollama run hf.co/Phani-labs/Slonik-7B-GRPO-GGUF:Q4_K_M
```
If Ollama doesn't pick up the chat template automatically, use the prompt format shown below.
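Ollama also serves a local REST API, which is handy for calling the model from scripts. The sketch below uses only the Python standard library and assumes Ollama is running on its default port (11434). Setting raw to true makes Ollama send your prompt verbatim, so you supply the `<|im_start|>` tags yourself and bypass template auto-detection entirely. build_request and generate are illustrative names, not part of any Ollama client.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/Phani-labs/Slonik-7B-GRPO-GGUF:Q4_K_M"


def build_request(prompt, model=MODEL, raw=True):
    """Build a non-streaming /api/generate payload.

    raw=True tells Ollama not to apply any template, so the prompt must
    already contain the Qwen2.5 <|im_start|>/<|im_end|> tags.
    """
    return {
        "model": model,
        "prompt": prompt,
        "raw": raw,
        "stream": False,
        "options": {"temperature": 0, "num_predict": 200},
    }


def generate(prompt, url=OLLAMA_URL):
    """POST the payload and return the generated text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with Ollama running: pass a fully templated prompt (for example the one from the llama.cpp command below) to `generate(prompt)` and print the result. Temperature 0 and 200 tokens mirror the `--temp 0 -n 200` settings of the llama.cpp example.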
llama.cpp
```bash
./llama-cli -m Slonik-7B-GRPO.Q4_K_M.gguf -p "<|im_start|>user
Schema:
CREATE TABLE orders (id INT, customer_id INT, total NUMERIC, order_date DATE);
CREATE TABLE customers (id INT, name TEXT, country TEXT);
Question: Total revenue by country in 2024, top 5.<|im_end|>
<|im_start|>assistant
" -n 200 --temp 0
```
LM Studio / Jan
Download Slonik-7B-GRPO.Q4_K_M.gguf, drop it into your models folder, and load it from your local runtime.
Prompt format
Uses the Qwen2.5 chat template (<|im_start|> / <|im_end|>):
```text
<|im_start|>user
Schema:
<your CREATE TABLE statements here>
Question: <your question>
### Hint:
<optional clarifications about column meanings, date formats, join paths>
<|im_end|>
<|im_start|>assistant
```
The ### Hint: block is optional but helps a lot for non-obvious schemas. Example:
```text
Schema:
CREATE TABLE orders (id INT, customer_id INT, total NUMERIC, order_date DATE);
CREATE TABLE customers (id INT, name TEXT, country TEXT);
Question: Total revenue by country in 2024, top 5.
### Hint:
Join orders.customer_id = customers.id. Revenue is the sum of orders.total.
```
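To avoid hand-assembling the tags, a small helper can build the prompt from its parts. This is a sketch: build_prompt is a hypothetical helper, not part of any library, and it places `<|im_end|>` directly after the content, as the standard Qwen2.5 template and the llama.cpp example do. The Hint block is only emitted when a hint is given.

```python
def build_prompt(schema, question, hint=None):
    """Assemble a Slonik prompt in the Qwen2.5 chat layout (Schema / Question / optional Hint)."""
    lines = ["Schema:", schema.strip(), f"Question: {question.strip()}"]
    if hint:  # the ### Hint: block is optional
        lines += ["### Hint:", hint.strip()]
    content = "\n".join(lines)
    return f"<|im_start|>user\n{content}<|im_end|>\n<|im_start|>assistant\n"


schema = """
CREATE TABLE orders (id INT, customer_id INT, total NUMERIC, order_date DATE);
CREATE TABLE customers (id INT, name TEXT, country TEXT);
"""
prompt = build_prompt(
    schema,
    "Total revenue by country in 2024, top 5.",
    hint="Join orders.customer_id = customers.id. Revenue is the sum of orders.total.",
)
print(prompt)
```

The returned string can be passed to any runtime that accepts a raw prompt, such as llama-cli's `-p` or Ollama with raw mode enabled.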
Training
Two stages, both on a single RTX 5080 Laptop GPU (16 GB VRAM).
Stage 1 — QLoRA SFT (8h 13min)
Standard supervised fine-tuning on 21,847 text-to-SQL pairs:
- BIRD train split — 6,601 examples (PostgreSQL/SQLite, expert-curated)
- Spider — 8,034 examples (SQLite, classic benchmark)
- Gretel synthetic text-to-SQL — 5,212 PostgreSQL examples (synthetic, large coverage)
- Custom PG-Modern synth — 2,000 examples generated via DeepSeek-V4, covering pgvector, JSONB, window functions, fulltext search, CTEs, and array operations
LoRA rank 32, alpha 64, 4-bit NF4 base. LR 1e-5, cosine schedule, max_grad_norm 0.5, adamw_torch_fused (the 8-bit Adam variant caused NaN with bf16 on Blackwell). Final eval_loss 0.290.
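Two of these settings shape every update step: the cosine learning-rate schedule and gradient clipping at max_grad_norm 0.5. The sketch below writes out the standard definitions in plain Python so the numbers are concrete. It mirrors the usual zero-warmup cosine decay and the global-norm clipping rule used by PyTorch's `clip_grad_norm_`; the warmup used in training isn't stated here, so warmup_steps=0 is an assumption, and both functions are illustrative, not the training code.

```python
import math

PEAK_LR = 1e-5       # from the card
MAX_GRAD_NORM = 0.5  # from the card


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=0):
    """Cosine decay from peak_lr to 0, with optional linear warmup (warmup length is an assumption)."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM, eps=1e-6):
    """Scale all gradients by one shared factor so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    coef = min(1.0, max_norm / (total_norm + eps))
    return [g * coef for g in grads], total_norm


print(cosine_lr(50, 100))               # halfway through: 5e-06
print(clip_by_global_norm([3.0, 4.0]))  # norm 5.0 gets scaled down to about 0.5
```

Because clipping rescales all gradients by the same factor, it caps the step size without changing the update direction, which is one common way to keep low-precision fine-tuning stable.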
Stage 2 — GRPO with execution rewards (16h)
GRPO (Group Relative Policy Optimization) with three reward signals, weighted as shown in parentheses: execution match against the BIRD SQLite databases (1.0), syntax validity via sqlglot (0.2), and code-fence formatting (0.1). 2,000 steps, num_generations=2.
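The reward and the group-relative step can be sketched in a few dozen lines. Assumptions, stated plainly: Python's built-in sqlite3 stands in for the BIRD SQLite databases, each signal is scored 0 or 1 before weighting, execution match uses a BIRD-style comparison of result sets, and SQLite's `EXPLAIN` replaces sqlglot as the validity check (it is stricter, since it also resolves table and column names). All function names and the toy table are hypothetical.

```python
import re
import sqlite3
import statistics

WEIGHTS = {"exec": 1.0, "syntax": 0.2, "fence": 0.1}  # weights from the card
FENCE_RE = re.compile(r"```(?:sql)?\s*\n(.*?)```", re.DOTALL | re.IGNORECASE)


def extract_sql(completion):
    """Return the SQL inside the first fenced block, plus whether a fence was found."""
    m = FENCE_RE.search(completion)
    return (m.group(1).strip(), True) if m else (completion.strip(), False)


def is_valid(conn, sql):
    """Stand-in for the sqlglot check: ask SQLite to plan the query without running it."""
    try:
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False


def execution_match(conn, pred_sql, gold_sql):
    """1 if the prediction runs and returns the same set of rows as the gold query, else 0."""
    try:
        pred = set(conn.execute(pred_sql).fetchall())
    except sqlite3.Error:
        return 0
    return int(pred == set(conn.execute(gold_sql).fetchall()))


def reward(conn, completion, gold_sql):
    sql, fenced = extract_sql(completion)
    return (WEIGHTS["exec"] * execution_match(conn, sql, gold_sql)
            + WEIGHTS["syntax"] * is_valid(conn, sql)
            + WEIGHTS["fence"] * fenced)


def group_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: reward minus the group mean, scaled by the group's std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INT, name TEXT, country TEXT);
INSERT INTO customers VALUES (1, 'Ana', 'PT'), (2, 'Bo', 'SE'), (3, 'Cy', 'PT');
""")
gold = "SELECT country, COUNT(*) FROM customers GROUP BY country"
good = "```sql\nSELECT country, COUNT(id) FROM customers GROUP BY country\n```"
bad = "SELECT country, COUNT(*) FROM customer GROUP BY country"  # wrong table, no fence

rs = [reward(conn, c, gold) for c in (good, bad)]
print(rs)                    # about [1.3, 0.0]
print(group_advantages(rs))  # about [+1, -1]
```

One consequence worth noticing: with num_generations=2, a pair with different rewards always gets advantages of roughly +1 and -1 whatever the reward gap, and a pair with equal rewards gets 0, so that prompt contributes no learning signal. Some implementations divide by the sample standard deviation instead, which changes the magnitude but not this conclusion.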
The total external cost was about $3 (DeepSeek API for the PG-Modern synthesis). Everything else ran locally.
What GRPO actually fixed
The biggest improvement was dialect awareness. SFT kept generating MONTH(date) — that's MySQL syntax and just fails on Postgres. GRPO learned EXTRACT(MONTH FROM date) from the executions that came back as errors.
It also got better at date formats. SFT was guessing patterns like LIKE '%/%/87%' (assuming mm/dd/yy), which returned empty result sets. GRPO settled on LIKE '%1987%' after enough wrong-answer signals.
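A smaller but interesting one: it learned when not to quote identifiers. SFT was over-quoting identifiers that the DDL declared unquoted, which broke matches: Postgres folds unquoted names to lowercase, so a quoted mixed-case name no longer refers to the same column.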
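The first two fixes are easy to reproduce as execution signals. The sketch below uses Python's built-in sqlite3 as a stand-in for Postgres so it runs without a server; the table and rows are hypothetical. SQLite, like Postgres, has no MONTH() function, so that query fails outright, while the mm/dd/yy guess runs but returns nothing on ISO-formatted dates. The identifier-quoting case is Postgres-specific (SQLite treats identifiers case-insensitively even when quoted), so it is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER, birthday TEXT);
INSERT INTO patients VALUES (1, '1987-03-14'), (2, '1990-11-02'), (3, '1987-12-30');
""")


def run(sql):
    """Execute a query and report ('ok', rows) or ('error', message), as an execution reward sees it."""
    try:
        return "ok", conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return "error", str(exc)


# 1. Dialect: MONTH() is MySQL syntax; this returns an 'error' result.
print(run("SELECT id FROM patients WHERE MONTH(birthday) = 3"))
# SQLite's native equivalent; Postgres would use EXTRACT(MONTH FROM birthday).
print(run("SELECT id FROM patients WHERE CAST(strftime('%m', birthday) AS INTEGER) = 3"))  # ('ok', [(1,)])

# 2. Date format: an mm/dd/yy guess silently matches nothing; a year substring works.
print(run("SELECT COUNT(*) FROM patients WHERE birthday LIKE '%/%/87%'"))  # ('ok', [(0,)])
print(run("SELECT COUNT(*) FROM patients WHERE birthday LIKE '%1987%'"))   # ('ok', [(2,)])
```

The two failures look different to the reward: the dialect error is loud (the query never runs), while the date guess fails quietly with an empty, wrong result that only an execution match against the gold answer catches.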
Notes from training
A few things that helped more than I expected:
- Execution feedback was much more useful than format-only rewards. The dialect-specific improvements above only happened because the model could see what failed against a real database.
- PostgreSQL syntax errors gave the model a strong, unambiguous signal during GRPO.
- The hardest remaining failures are still schema-grounding mistakes, especially on tables with many columns or ambiguous join paths. That's a 7B-size limitation more than anything else.
Limitations
This is not a general SQL assistant for every dialect — it's tuned around PostgreSQL and SQLite specifically. Behavior on MySQL or SQL Server isn't validated.
The 7B size still shows up on harder examples. Challenging-tier BIRD-PG accuracy is 23.5%, and schema grounding is imperfect on tables with 30+ columns, where most remaining errors are hallucinated column names.
GRPO occasionally over-quotes identifiers or adds unnecessary DISTINCT. I saw 6 such regressions across 500 BIRD-PG examples. The net gain was still positive, but this is one weakness of binary execution rewards.
Author
Phani
- GitHub: slonik-7b
- Full-precision weights: Phani-labs/Slonik-7B-GRPO
- SFT-only baseline: Phani-labs/Slonik-7B-SFT
