Atem-SageMaths

Ancient logic. Modern intelligence. Applied to mathematics.

A 1.5B mathematics model trained on competition-grade problems — covering algebra, geometry, number theory, combinatorics, and more.


Overview

Atem-SageMaths is a mathematics-specialised variant of Atem-Wisdom-1.5B, fine-tuned on a curated corpus of competition and curriculum mathematics problems drawn from three complementary sources. It inherits Atem-Wisdom's reasoning capability and applies it to mathematical problem solving — working through problems step by step before arriving at a final answer.

The training corpus spans the full difficulty range from pre-algebra through competition-level mathematics. The worked solutions from MATH-500 supply the only think-then-answer training examples.

When to choose Atem-SageMaths over Atem-Wisdom:

Mathematical problem solving across a broad difficulty range
Competition mathematics — algebra, geometry, number theory, combinatorics, probability
Situations where step-by-step mathematical working is required
Curriculum mathematics tasks at pre-algebra through precalculus level

When to choose Atem-Wisdom instead:

General reasoning, analytical, and logic tasks outside mathematics
Mixed-domain workloads where maths is one of many task types
Coding and algorithm tasks

The Atem Series

Model	Stage	Capability	Status
Atem v1	Stage 1 — SFT	Fast, direct reasoning	✅
Atem-Wisdom	Stage 2 — CoT	Explicit thinking traces	✅
Atem-SageCoder	Specialisation — Code	Think-then-code on algorithms	✅
Atem-SageMaths	Specialisation — Maths	Structured mathematical problem solving	✅
Atem-Pharaoh (planned)	Stage 3 — DPO/IPO	Preference-aligned reasoning	🔄

Atem-SageMaths is a domain-specialised branch off Atem-Wisdom, parallel to Atem-SageCoder. Both share the same base and training framework; only the dataset domain differs.

Model Details

Property	Value
Base model	EphAsad/Atem-Wisdom-1.5B
Root architecture	Qwen/Qwen2.5-1.5B-Instruct
Training method	LoRA SFT — Mathematics Specialisation
LoRA config	r=32, alpha=64, dropout=0.05
Parameters	~1.54B
Training records	16,940 (after filtering)
Think / no-think split	~3% / ~97%
Epochs	2
Total steps	266
Final train loss	0.5602
Final val loss	0.6044
Hardware	NVIDIA A100-SXM4 80GB
Max sequence length	8,192 tokens
Precision	bfloat16
License	Apache 2.0
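As a quick consistency check, the step count and the think/no-think split in the table can be reproduced from the batch settings listed under Training Configuration. This sketch makes two assumptions. First, it treats all 16,940 records as training data. Second, it counts a final partial gradient-accumulation window as one optimiser step. Either may differ from how the trainer was actually configured.

```python
import math

# Figures taken from the Model Details table and Training Configuration.
records = 16_940        # training records after filtering
per_device_batch = 8    # batch_size
grad_accum = 16         # grad_accumulation
epochs = 2
think_records = 500     # MATH-500 examples carrying <think> traces

effective_batch = per_device_batch * grad_accum

# Assumption: every record is in the training split, and a final partial
# accumulation window still counts as one optimiser step.
micro_batches = math.ceil(records / per_device_batch)
steps_per_epoch = math.ceil(micro_batches / grad_accum)
total_steps = steps_per_epoch * epochs

think_share = think_records / records

print(effective_batch)        # 128
print(total_steps)            # 266
print(f"{think_share:.2%}")   # 2.95%
```

Under these assumptions the result matches the reported 266 steps and the ~3% / ~97% split.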

Output Format

Atem-SageMaths produces responses in one of two formats:

With reasoning trace (MATH-500 derived examples):

<think>
[Step-by-step mathematical working — problem analysis, intermediate
calculations, verification of results]
</think>

[Final answer]

Direct answer (majority of responses):

[Structured solution with working shown inline]

The think-then-answer format tends to appear on harder problems that resemble the MATH-500 examples, which are the only training source with explicit reasoning traces. The remaining ~97% of training data used the direct solution format without a separate think block, reflecting the natural composition of the training corpus.
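When you consume outputs programmatically, it helps to separate the optional reasoning block from the final answer. The sketch below makes two assumptions: the `<think>` tags survive decoding as plain text, and they appear at most once, at the start of the response. `split_response` is a hypothetical helper, not part of any library.

```python
import re
from typing import Optional, Tuple

# Optional leading <think>...</think> block, then the final answer.
# re.DOTALL lets "." match newlines so multi-line reasoning is captured.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*(.*)$", re.DOTALL)


def split_response(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None for direct answers."""
    match = THINK_RE.match(text)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), match.group(2).strip()


reasoning, answer = split_response(
    "<think>\nx^2 - 7x + 12 = (x - 3)(x - 4)\n</think>\n\nx = 3 or x = 4"
)
print(reasoning)  # x^2 - 7x + 12 = (x - 3)(x - 4)
print(answer)     # x = 3 or x = 4
```

Direct-format responses come back with `reasoning` set to `None` and the whole text as the answer.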

Training Data

Atem-SageMaths was trained on 16,940 examples assembled from three sources after token length filtering. camel-ai/math was initially included but removed due to streaming instability during data loading.

Dataset	Records	CoT	Domain
HuggingFaceH4/MATH-500	500	✅ solution→think, answer→final	Competition maths, full curriculum
EleutherAI/hendrycks_math	up to ~10,500 (≤1,500 × 7 subsets)	❌	Algebra, geometry, number theory, counting & probability, intermediate algebra, prealgebra, precalculus
qwedsacf/competition_math	10,000	❌	Competition mathematics

MATH-500 handling: The solution field contains full worked solutions and was used as the CoT trace (placed between <think> tags). The answer field (boxed final answer) was used as the response. This is the only source of think-trace training examples.
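The field mapping described above can be sketched as a small formatter. It assumes the record's `problem` field becomes the user turn and that the tags are separated from the text by newlines. The card does not document the exact whitespace used in training, and `math500_to_messages` is a hypothetical name.

```python
from typing import Dict, List


def math500_to_messages(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Hypothetical formatter: solution -> <think> trace, answer -> final response."""
    assistant = (
        "<think>\n"
        + record["solution"].strip()
        + "\n</think>\n\n"
        + record["answer"].strip()
    )
    return [
        {"role": "user", "content": record["problem"].strip()},
        {"role": "assistant", "content": assistant},
    ]


# Toy record in the assumed MATH-500 shape.
example = {
    "problem": "What is $2 + 3$?",
    "solution": "Adding gives $2 + 3 = \\boxed{5}$.",
    "answer": "5",
}
messages = math500_to_messages(example)
print(messages[1]["content"])
```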

hendrycks_math subsets loaded: algebra, counting_and_probability, geometry, intermediate_algebra, number_theory, prealgebra, precalculus. Up to 1,500 records were taken from the train split of each.

Token length filter: Examples exceeding 8,192 tokens after chat template application were removed rather than truncated. This is the primary source of attrition from the ~31,000 raw records loaded to 16,940 retained.
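The drop-rather-than-truncate rule can be sketched as a plain filter. The token counter is passed in as a function so the sketch runs without downloading a tokenizer. In practice that function would wrap the model's tokenizer with its chat template applied. `filter_by_length` and the whitespace-based `toy_count` are illustrative names only.

```python
from typing import Callable, Dict, List, Sequence

Conversation = Sequence[Dict[str, str]]
MAX_TOKENS = 8192  # max_seq_length used in training


def filter_by_length(
    examples: Sequence[Conversation],
    count_tokens: Callable[[Conversation], int],
    max_tokens: int = MAX_TOKENS,
) -> List[Conversation]:
    """Drop (never truncate) conversations longer than max_tokens."""
    return [ex for ex in examples if count_tokens(ex) <= max_tokens]


def toy_count(conversation: Conversation) -> int:
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return sum(len(turn["content"].split()) for turn in conversation)


data = [
    [{"role": "user", "content": "short question"}],
    [{"role": "user", "content": "a much longer question than allowed"}],
]
kept = filter_by_length(data, toy_count, max_tokens=3)
print(len(kept))  # 1
```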

Loss curve:

Step	Train Loss	Val Loss
100	0.6201	0.6310
200	0.5362	0.6077
266 (final)	0.5602	0.6044

Final loss is substantially lower than for Atem-SageCoder (0.604 vs 0.859), consistent with mathematics Q&A being more formulaic and predictable than code+CoT reasoning traces. The train/val gap stays between roughly 0.01 and 0.07 at every logged step, with no clear overfitting signal. Train loss drops sharply from step 100 to step 200 while val loss improves more modestly, after which both stabilise.
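The gap figures can be recomputed directly from the loss-curve table. This only restates the three logged points and says nothing about steps that were not logged.

```python
# (step, train loss, val loss), copied from the loss-curve table.
curve = [
    (100, 0.6201, 0.6310),
    (200, 0.5362, 0.6077),
    (266, 0.5602, 0.6044),
]

# Generalisation gap at each logged step: val loss minus train loss.
gaps = {step: round(val - train, 4) for step, train, val in curve}
print(gaps)  # {100: 0.0109, 200: 0.0715, 266: 0.0442}
```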

Training Configuration

# Key hyperparameters
lora_r = 32
lora_alpha = 64
lora_dropout = 0.05
max_seq_length = 8192
learning_rate = 1e-4
lr_scheduler = 'cosine'
warmup_ratio = 0.05
batch_size = 8              # configured as 4, auto-scaled by Unsloth because
                            # shorter math examples left VRAM headroom
grad_accumulation = 16 # effective batch size: 128
num_epochs = 2
dtype = bfloat16
load_in_4bit = True
nothink_ratio = 0.97 # reflects natural dataset composition

Training used Unsloth (unsloth==2026.5.5, unsloth_zoo==2026.5.5) with train_on_responses_only masking, so loss was computed exclusively on assistant response tokens. Before training started, pre-flight checks verified the model identity, the think-tag format, and the correctness of the response mask.
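Response-only masking gives every prompt token an ignored label, so the loss sees only what the assistant wrote. Unsloth's `train_on_responses_only` builds this mask from chat-template marker strings. The sketch below is a simplified, hypothetical re-implementation on toy token ids, not Unsloth's actual code. The ids stand in for Qwen-style `<|im_start|>` / `<|im_end|>` markers, and `mask_to_responses` is an illustrative name.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss ignores this label by default


def mask_to_responses(
    input_ids: Sequence[int], response_start: Sequence[int], turn_end: int
) -> List[int]:
    """Keep labels only for assistant-turn tokens; mask everything else."""
    labels = [IGNORE_INDEX] * len(input_ids)
    n = len(response_start)
    i = 0
    while i < len(input_ids):
        if list(input_ids[i:i + n]) == list(response_start):
            # Copy labels from just after the assistant header up to and
            # including the end-of-turn token, so the model learns to stop.
            j = i + n
            while j < len(input_ids):
                labels[j] = input_ids[j]
                if input_ids[j] == turn_end:
                    break
                j += 1
            i = j + 1
        else:
            i += 1
    return labels


# Toy ids: 1 = turn start, 2 = turn end, 50 = "user", 60 = "assistant".
ids = [1, 50, 11, 12, 2, 1, 60, 21, 22, 2]
labels = mask_to_responses(ids, response_start=[1, 60], turn_end=2)
print(labels)  # [-100, -100, -100, -100, -100, -100, -100, 21, 22, 2]
```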

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-SageMaths-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Find all integer solutions to x² - 7x + 12 = 0.",
    }
]

# return_dict=True returns input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-SageMaths-1.5B",
    max_seq_length=8192,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode

messages = [
    {
        "role": "user",
        "content": "In how many ways can 8 people be seated around a circular table?",
    }
]

# return_dict=True returns input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-SageMaths-1.5B:Q4_K_M

System Prompt

Atem-SageMaths's identity and mathematical focus are baked into the chat template. To override manually:

You are Atem-SageMaths, a precise mathematical reasoning assistant built
on the Atem foundation. You approach problems methodically — working through
each step carefully, verifying intermediate results, and arriving at
well-supported solutions. Your answers are rigorous, clearly structured,
and mathematically correct.
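In code, overriding the default means putting an explicit system message ahead of the user turn. You then pass `messages` to `apply_chat_template` exactly as in the Usage examples. This assumes the Atem chat template behaves like Qwen2.5's and uses an explicit system message in place of its built-in one. `build_messages` and the prompt text below are hypothetical examples.

```python
from typing import Dict, List

# Hypothetical custom prompt; substitute the default text above or your own.
CUSTOM_SYSTEM_PROMPT = (
    "You are Atem-SageMaths, a precise mathematical reasoning assistant. "
    "Show each step of your working and state the final answer clearly."
)


def build_messages(
    question: str, system_prompt: str = CUSTOM_SYSTEM_PROMPT
) -> List[Dict[str, str]]:
    """Place an explicit system message ahead of the user's question."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_messages("Find all integer solutions to x² - 7x + 12 = 0.")
print([m["role"] for m in messages])  # ['system', 'user']
```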

Available Files

File	Size	Description
`model.safetensors`	~3.1 GB	Full bfloat16 merged weights
`Atem-SageMaths-1.5B.Q4_K_M.gguf`	~986 MB	4-bit quantised — recommended
`Atem-SageMaths-1.5B.Q5_K_M.gguf`	~1.1 GB	5-bit quantised
`Atem-SageMaths-1.5B.Q8_0.gguf`	~1.6 GB	8-bit quantised — near-lossless

Known Limitations

Thin CoT coverage. Only MATH-500 (500 records, ~3% of training data) contributes explicit think traces. The model's think-then-answer behaviour is therefore limited to problems structurally similar to MATH-500 content. On novel problem types, it defaults to the direct solution format.

Arithmetic precision. Multi-step numerical calculations remain a known weakness across the Atem series. Intermediate arithmetic slips can occur on problems requiring many sequential operations. Final answers to precision-critical calculations should be independently verified.
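For the example prompts used under Usage, a few lines of standard-library Python are enough to check the model's final answers independently. The seating count assumes the usual convention that rotations of one arrangement are identical and reflections are distinct. The card does not say which convention its prompt intends.

```python
import math

# Integer roots of x^2 - 7x + 12 = 0. For a monic integer polynomial, any
# integer root divides the constant term (12), so searching -12..12 suffices.
roots = [x for x in range(-12, 13) if x * x - 7 * x + 12 == 0]
print(roots)  # [3, 4]

# Seatings of 8 people around a round table, counting rotations as identical:
# fix one person's seat and arrange the other 7, giving (8 - 1)!.
seatings = math.factorial(8 - 1)
print(seatings)  # 5040
```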

Competition scope. Training data is heavily weighted toward competition-style problems with clean closed-form answers. Open-ended applied mathematics, calculus, linear algebra, and statistics are not represented in the training corpus.

Roadmap

Stage	Status	Description
Stage 1 — SFT	✅ Complete	Atem v1 — direct reasoning foundation
Stage 2 — CoT SFT	✅ Complete	Atem-Wisdom — thinking traces
Specialisation — Code	✅ Complete	Atem-SageCoder
Specialisation — Maths	✅ Complete	Atem-SageMaths — this model
Stage 3 — DPO/IPO	🔄 Planned	Atem-Pharaoh — preference-aligned reasoning

Citation

@misc{atem_sagemaths_2026,
 author = {Asad, Zain},
 title = {Atem-SageMaths: A 1.5B Mathematics Model
 via Competition Problem Distillation},
 year = {2026},
 publisher = {HuggingFace},
 howpublished = {\url{https://huggingface.co/EphAsad/Atem-SageMaths-1.5B}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom → Atem-SageMaths).

Built independently by EphAsad


