Atem-SageMaths
Ancient logic. Modern intelligence. Applied to mathematics.
A 1.5B mathematics model trained on competition-grade problems covering algebra, geometry, number theory, combinatorics, and more.
Overview
Atem-SageMaths is a mathematics-specialised variant of Atem-Wisdom-1.5B, fine-tuned on a curated corpus of competition and curriculum mathematics problems drawn from three complementary sources. It inherits Atem-Wisdom's reasoning capability and applies it to mathematical problem solving, working through problems step by step before arriving at a final answer.
The training corpus spans the full difficulty range from pre-algebra through competition-level mathematics, with structured solution traces from MATH-500 providing the think-then-answer format on harder problems.
When to choose Atem-SageMaths over Atem-Wisdom:
- Mathematical problem solving across a broad difficulty range
- Competition mathematics: algebra, geometry, number theory, combinatorics, probability
- Situations where step-by-step mathematical working is required
- Curriculum mathematics tasks at pre-algebra through precalculus level
When to choose Atem-Wisdom instead:
- General reasoning, analytical, and logic tasks outside mathematics
- Mixed-domain workloads where maths is one of many task types
- Coding and algorithm tasks
The Atem Series
| Model | Stage | Capability | Status |
|---|---|---|---|
| Atem v1 | Stage 1: SFT | Fast, direct reasoning | Complete |
| Atem-Wisdom | Stage 2: CoT | Explicit thinking traces | Complete |
| Atem-SageCoder | Specialisation: Code | Think-then-code on algorithms | Complete |
| Atem-SageMaths | Specialisation: Maths | Structured mathematical problem solving | Complete |
| Atem-Pharaoh (planned) | Stage 3: DPO/IPO | Preference-aligned reasoning | Planned |
Atem-SageMaths is a domain-specialised branch off Atem-Wisdom, parallel to Atem-SageCoder. Both share the same base and training framework; only the dataset domain differs.
Model Details
| Property | Value |
|---|---|
| Base model | EphAsad/Atem-Wisdom-1.5B |
| Root architecture | Qwen/Qwen2.5-1.5B-Instruct |
| Training method | LoRA SFT (mathematics specialisation) |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Parameters | ~1.54B |
| Training records | 16,940 (after filtering) |
| Think / no-think split | ~3% / ~97% |
| Epochs | 2 |
| Total steps | 266 |
| Final train loss | 0.5602 |
| Final val loss | 0.6044 |
| Hardware | NVIDIA A100-SXM4 80GB |
| Max sequence length | 8,192 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
Output Format
Atem-SageMaths produces responses in one of two formats:
With reasoning trace (MATH-500 derived examples):
<think>
[Step-by-step mathematical working: problem analysis, intermediate
calculations, verification of results]
</think>
[Final answer]
Direct answer (majority of responses):
[Structured solution with working shown inline]
The think-then-answer format activates on harder problems where the training data included explicit reasoning traces. The majority of training data (97%) used direct solution format without a separate think block, reflecting the natural composition of the training corpus.
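Because a response may or may not open with a think block, downstream code usually needs to handle both shapes. The following is a minimal sketch of one way to do that; the helper name `split_response` and the sample strings are illustrative assumptions, not part of the model's tooling.

```python
import re
from typing import Optional, Tuple

# Matches an optional leading <think>...</think> block.
# re.DOTALL lets "." span the newlines inside the reasoning trace.
_THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_response(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer). reasoning is None for direct answers."""
    match = _THINK_RE.match(text)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), text[match.end():].strip()


# Hypothetical outputs in the two documented formats.
with_trace = "<think>\n12 = 3 * 4 and 3 + 4 = 7.\n</think>\nx = 3 or x = 4"
direct = "Factor: (x - 3)(x - 4) = 0, so x = 3 or x = 4."

print(split_response(with_trace))
print(split_response(direct))
```

Returning `None` for the reasoning keeps the direct-answer case explicit, so callers do not mistake an empty trace for a missing one.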
Training Data
Atem-SageMaths was trained on 16,940 examples assembled from three sources after token length filtering. camel-ai/math was initially included but removed due to streaming instability during data loading.
| Dataset | Records | CoT | Domain |
|---|---|---|---|
| HuggingFaceH4/MATH-500 | 500 | Yes (solution → think, answer → final) | Competition maths, full curriculum |
| EleutherAI/hendrycks_math | ~10,500 (1,500 × 7 subsets) | No | Algebra, geometry, number theory, counting & probability, intermediate algebra, prealgebra, precalculus |
| qwedsacf/competition_math | 10,000 | No | Competition mathematics |
MATH-500 handling: The solution field contains full worked solutions and was used as the CoT trace (placed between <think> tags). The answer field (boxed final answer) was used as the response. This is the only source of think-trace training examples.
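The mapping described above can be sketched as follows. This is an illustration under assumptions, not the author's preprocessing script: the function name `to_think_example` is hypothetical, the exact whitespace around the tags is a guess, and the sample record is made up. Only the `problem`, `solution`, and `answer` field names come from the MATH-500 dataset.

```python
from typing import Dict, List


def to_think_example(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Map one MATH-500 style record to a two-turn chat example.

    The worked solution becomes the think trace and the short answer
    becomes the visible response, as described in the text above.
    """
    assistant = (
        f"<think>\n{record['solution'].strip()}\n</think>\n"
        f"{record['answer'].strip()}"
    )
    return [
        {"role": "user", "content": record["problem"].strip()},
        {"role": "assistant", "content": assistant},
    ]


# Made-up record with the same field names as MATH-500.
sample = {
    "problem": "What is 2 + 3?",
    "solution": "Adding the two integers gives 5.",
    "answer": "5",
}
for message in to_think_example(sample):
    print(message["role"], "->", repr(message["content"]))
```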
hendrycks_math subsets loaded: algebra, counting_and_probability, geometry, intermediate_algebra, number_theory, prealgebra, precalculus, with 1,500 records taken from the train split of each.
Token length filter: Examples exceeding 8,192 tokens after chat template application were removed rather than truncated. This is the primary source of attrition from the ~31,000 raw records loaded to 16,940 retained.
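A drop-rather-than-truncate filter of this kind might look like the sketch below. The function name and the whitespace-based counter are stand-ins chosen so the example runs without downloading a tokenizer; they are assumptions, not the code used in training.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Messages = Sequence[Dict[str, str]]


def filter_by_token_length(
    examples: Iterable[Messages],
    count_tokens: Callable[[Messages], int],
    max_tokens: int = 8192,
) -> List[Messages]:
    """Keep only examples whose templated length fits in max_tokens.

    Over-length examples are dropped whole instead of being truncated,
    so no retained example ends in the middle of a solution.
    """
    return [ex for ex in examples if count_tokens(ex) <= max_tokens]


def whitespace_count(messages: Messages) -> int:
    """Stand-in counter: whitespace-separated words across all turns."""
    return sum(len(m["content"].split()) for m in messages)


short = [{"role": "user", "content": "Solve x + 1 = 2."}]
long = [{"role": "user", "content": "word " * 50}]

kept = filter_by_token_length([short, long], whitespace_count, max_tokens=10)
print(len(kept))
```

With a real tokenizer, the counter would instead measure the length of `tokenizer.apply_chat_template(messages, tokenize=True)`, so that the template's own tokens count against the limit.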
Loss curve:
| Step | Train Loss | Val Loss |
|---|---|---|
| 100 | 0.6201 | 0.6310 |
| 200 | 0.5362 | 0.6077 |
| 266 (final) | 0.5602 | 0.6044 |
Loss values are substantially lower than Atem-SageCoder's (0.604 vs 0.859), consistent with mathematics Q&A being more formulaic and predictable than code-plus-CoT reasoning traces. The train/val gap is about 0.01 at step 100, widens to about 0.07 at step 200, and ends at about 0.04, which gives no strong overfitting signal. Train loss improves sharply between steps 100 and 200, after which validation loss stabilises near 0.60.
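The gap between validation and train loss at each logged step can be read straight off the loss table. A small sketch using only the three logged rows:

```python
# (step, train_loss, val_loss) rows copied from the loss table above.
loss_log = [
    (100, 0.6201, 0.6310),
    (200, 0.5362, 0.6077),
    (266, 0.5602, 0.6044),
]

# Positive gap means validation loss is higher than train loss.
gaps = {step: round(val - train, 4) for step, train, val in loss_log}

for step, gap in gaps.items():
    print(f"step {step}: val - train = {gap:.4f}")
```

Three checkpoints are too few to characterise the curve in detail, so these gaps are a rough check, not a full overfitting analysis.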
Training Configuration
# Key hyperparameters
lora_r = 32
lora_alpha = 64
lora_dropout = 0.05
max_seq_length = 8192
learning_rate = 1e-4
lr_scheduler = 'cosine'
warmup_ratio = 0.05
batch_size = 8  # auto-scaled by Unsloth from the configured value of 4;
                # shorter maths examples left VRAM headroom
grad_accumulation = 16 # effective batch size: 128
num_epochs = 2
dtype = bfloat16
load_in_4bit = True
nothink_ratio = 0.97 # reflects natural dataset composition
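The hyperparameters above are consistent with the 266 total steps reported in Model Details. The arithmetic below is a sketch under two assumptions that the card does not state: all 16,940 retained records are in the training split, and the partial final batch of each epoch counts as one optimizer step.

```python
import math

train_records = 16_940   # assumption: every retained record is used for training
per_device_batch = 8
grad_accumulation = 16
num_epochs = 2

effective_batch = per_device_batch * grad_accumulation
# Assumption: the last, partially filled batch still triggers a step.
steps_per_epoch = math.ceil(train_records / effective_batch)
total_steps = steps_per_epoch * num_epochs

print(effective_batch, steps_per_epoch, total_steps)  # 128 133 266
```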
Training used Unsloth (unsloth==2026.5.5, unsloth_zoo==2026.5.5) with train_on_responses_only masking, so loss was computed exclusively on assistant response tokens. Pre-flight checks confirmed the model identity, the think tag format, and mask correctness before training started.
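Response-only masking is usually done by setting the labels of every non-response token to -100, the value that PyTorch's cross-entropy loss ignores by default. The sketch below illustrates the idea on plain lists. It is not Unsloth's implementation: the function name, the token ids, and the precomputed `is_response` flags are all hypothetical, and in practice the response spans are located from the chat template markers.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_response(token_ids: List[int], is_response: List[bool]) -> List[int]:
    """Build labels that keep assistant tokens and mask everything else."""
    if len(token_ids) != len(is_response):
        raise ValueError("token_ids and is_response must have equal length")
    return [
        tok if keep else IGNORE_INDEX
        for tok, keep in zip(token_ids, is_response)
    ]


# Toy sequence: three prompt tokens followed by two response tokens.
token_ids = [11, 12, 13, 21, 22]
is_response = [False, False, False, True, True]

labels = mask_non_response(token_ids, is_response)
print(labels)  # [-100, -100, -100, 21, 22]
```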
Usage
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-SageMaths-1.5B"

# Load the tokenizer and the merged bfloat16 weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Find all integer solutions to x² - 7x + 12 = 0."
    }
]

# Apply the chat template and move the prompt tokens to the model's device.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
Unsloth (faster inference)
from unsloth import FastLanguageModel
import torch

# Load the model in 4-bit for lower memory use.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-SageMaths-1.5B",
    max_seq_length=8192,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode

messages = [
    {
        "role": "user",
        "content": "In how many ways can 8 people be seated around a circular table?"
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
Ollama
# Recommended: best speed/quality balance
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q4_K_M
# Higher quality
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q5_K_M
# Near-lossless
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q8_0
llama.cpp
llama-server -hf EphAsad/Atem-SageMaths-1.5B:Q4_K_M
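Once `llama-server` is running, it can be queried through its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library. It assumes the server is listening on its default address `http://localhost:8080`, and the helper names `build_chat_request` and `ask` are illustrative. The sampling values mirror the Transformers example above.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:8080",  # assumption: default llama-server address
    max_tokens: int = 1024,
) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("What is the sum of the first 10 positive integers?"))` would print the model's reply.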
System Prompt
Atem-SageMaths's identity and mathematical focus are baked into the chat template. To override manually:
You are Atem-SageMaths, a precise mathematical reasoning assistant built
on the Atem foundation. You approach problems methodically, working through
each step carefully, verifying intermediate results, and arriving at
well-supported solutions. Your answers are rigorous, clearly structured,
and mathematically correct.
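One common way to supply an override is to put it in a leading `system` message before applying the chat template. The sketch below assumes this model's template honours such a message, as the Qwen2.5 template it descends from does; the helper name and the shortened prompt text are illustrative.

```python
from typing import Dict, List


def build_messages(user_prompt: str, system_prompt: str) -> List[Dict[str, str]]:
    """Return a chat message list with an explicit system prompt first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Hypothetical shortened override; substitute the full prompt text as needed.
custom_system = "You are Atem-SageMaths, a precise mathematical reasoning assistant."

messages = build_messages("Compute the greatest common divisor of 84 and 126.", custom_system)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the single-turn `messages` used in the usage examples.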
Available Files
| File | Size | Description |
|---|---|---|
| model.safetensors | ~3.1 GB | Full bfloat16 merged weights |
| Atem-SageMaths-1.5B.Q4_K_M.gguf | ~986 MB | 4-bit quantised (recommended) |
| Atem-SageMaths-1.5B.Q5_K_M.gguf | ~1.1 GB | 5-bit quantised |
| Atem-SageMaths-1.5B.Q8_0.gguf | ~1.6 GB | 8-bit quantised (near-lossless) |
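The bfloat16 and Q8_0 sizes can be sanity-checked from the parameter count. This is a back-of-envelope sketch: it uses the approximate 1.54B figure from Model Details, treats Q8_0 as roughly 8.5 bits per weight (blocks of 32 int8 values plus one 16-bit scale), and ignores file metadata and any tensors stored at a different precision.

```python
PARAMS = 1.54e9  # approximate parameter count from the Model Details table


def size_gb(bits_per_weight: float) -> float:
    """Estimated file size in decimal gigabytes for a given bit width."""
    return PARAMS * bits_per_weight / 8 / 1e9


bf16_gb = size_gb(16)    # 2 bytes per weight
q8_0_gb = size_gb(8.5)   # 34 bytes per block of 32 weights

print(f"bfloat16: ~{bf16_gb:.2f} GB")
print(f"Q8_0:     ~{q8_0_gb:.2f} GB")
```

Both estimates land close to the listed ~3.1 GB and ~1.6 GB. The K-quant files mix several block types, so they are not estimated here.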
Known Limitations
Thin CoT coverage. Only MATH-500 (500 records, ~3% of training data) contributes explicit think traces. The model's think-then-answer behaviour is therefore limited to problems structurally similar to MATH-500 content. On novel problem types, it defaults to the direct solution format.
Arithmetic precision. Multi-step numerical calculations remain a known weakness across the Atem series. Intermediate arithmetic slips can occur on problems requiring many sequential operations. Final answers to precision-critical calculations should be independently verified.
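Where a problem allows it, claimed answers can be checked by substituting them back into the original equation with exact integer arithmetic. A minimal sketch for polynomial equations, using the quadratic from the Transformers example; the function name is illustrative:

```python
from typing import Dict, Iterable, Sequence


def verify_integer_roots(coeffs: Sequence[int], candidates: Iterable[int]) -> Dict[int, bool]:
    """Check each candidate against a polynomial with integer coefficients.

    coeffs are listed from the highest degree down. Evaluation uses
    Horner's rule on Python ints, so there is no rounding error.
    """
    results = {}
    for x in candidates:
        value = 0
        for c in coeffs:
            value = value * x + c
        results[x] = (value == 0)
    return results


# x^2 - 7x + 12 = 0 with candidate answers 3, 4 and a deliberate miss, 5.
checks = verify_integer_roots([1, -7, 12], [3, 4, 5])
print(checks)  # {3: True, 4: True, 5: False}
```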
Competition scope. Training data is heavily weighted toward competition-style problems with clean closed-form answers. Open-ended applied mathematics, calculus, linear algebra, and statistics are not represented in the training corpus.
Roadmap
| Stage | Status | Description |
|---|---|---|
| Stage 1: SFT | Complete | Atem v1: direct reasoning foundation |
| Stage 2: CoT SFT | Complete | Atem-Wisdom: thinking traces |
| Specialisation: Code | Complete | Atem-SageCoder |
| Specialisation: Maths | Complete | Atem-SageMaths (this model) |
| Stage 3: DPO/IPO | Planned | Atem-Pharaoh: preference-aligned reasoning |
Citation
@misc{atem_sagemaths_2026,
author = {Asad, Zain},
title = {Atem-SageMaths: A 1.5B Mathematics Model
via Competition Problem Distillation},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/EphAsad/Atem-SageMaths-1.5B}},
}
License
Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom → Atem-SageMaths).
Built independently by EphAsad