URL: https://huggingface.co/EphAsad/Atem-SageMaths-1.5B

Atem-SageMaths

Ancient logic. Modern intelligence. Applied to mathematics.

A 1.5B mathematics model trained on competition-grade problems, covering algebra, geometry, number theory, combinatorics, and more.


Overview

Atem-SageMaths is a mathematics-specialised variant of Atem-Wisdom-1.5B, fine-tuned on a curated corpus of competition and curriculum mathematics problems drawn from three complementary sources. It inherits Atem-Wisdom's reasoning capability and applies it to mathematical problem solving, working through problems step by step before arriving at a final answer.

The training corpus spans the full difficulty range from pre-algebra through competition-level mathematics, with structured solution traces from MATH-500 providing the think-then-answer format on harder problems.

When to choose Atem-SageMaths over Atem-Wisdom:

  • Mathematical problem solving across a broad difficulty range
  • Competition mathematics: algebra, geometry, number theory, combinatorics, probability
  • Situations where step-by-step mathematical working is required
  • Curriculum mathematics tasks at pre-algebra through precalculus level

When to choose Atem-Wisdom instead:

  • General reasoning, analytical, and logic tasks outside mathematics
  • Mixed-domain workloads where maths is one of many task types
  • Coding and algorithm tasks

The Atem Series

| Model | Stage | Capability | Status |
|---|---|---|---|
| Atem v1 | Stage 1: SFT | Fast, direct reasoning | ✅ |
| Atem-Wisdom | Stage 2: CoT | Explicit thinking traces | ✅ |
| Atem-SageCoder | Specialisation: Code | Think-then-code on algorithms | ✅ |
| Atem-SageMaths | Specialisation: Maths | Structured mathematical problem solving | ✅ |
| Atem-Pharaoh (planned) | Stage 3: DPO/IPO | Preference-aligned reasoning | 🔄 |

Atem-SageMaths is a domain-specialised branch off Atem-Wisdom, parallel to Atem-SageCoder. Both share the same base and training framework; only the dataset domain differs.


Model Details

| Property | Value |
|---|---|
| Base model | EphAsad/Atem-Wisdom-1.5B |
| Root architecture | Qwen/Qwen2.5-1.5B-Instruct |
| Training method | LoRA SFT: Mathematics Specialisation |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Parameters | ~1.54B |
| Training records | 16,940 (after filtering) |
| Think / no-think split | ~3% / ~97% |
| Epochs | 2 |
| Total steps | 266 |
| Final train loss | 0.5602 |
| Final val loss | 0.6044 |
| Hardware | NVIDIA A100-SXM4 80GB |
| Max sequence length | 8,192 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
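
For reference, the LoRA settings listed above correspond to the following weight update, assuming the standard LoRA formulation. The model card does not say which modules were adapted or whether a scaling variant was used, so the matrix shapes d and k are placeholders.

```latex
W' = W + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},
\qquad r = 32,\; \alpha = 64 \;\Rightarrow\; \frac{\alpha}{r} = 2
```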

Output Format

Atem-SageMaths produces responses in one of two formats:

With reasoning trace (MATH-500 derived examples):

<think>
[Step-by-step mathematical working: problem analysis, intermediate
calculations, verification of results]
</think>

[Final answer]

Direct answer (majority of responses):

[Structured solution with working shown inline]

The think-then-answer format tends to appear on harder problems that resemble the MATH-500 examples, the only training data with explicit reasoning traces. About 97% of the training data used the direct solution format without a separate think block, reflecting the natural composition of the training corpus.
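
Because a response may or may not contain a think block, downstream code has to handle both formats. The following is a minimal sketch of one way to do that; `split_response` is a hypothetical helper written for illustration and is not part of the model's tooling.

```python
import re
from typing import Optional, Tuple

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None for direct answers."""
    match = THINK_RE.search(text)
    if match is None:
        # Direct-answer format: the whole response is the solution.
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

A direct answer comes back as `(None, answer)`, so callers can branch on whether reasoning is present.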


Training Data

Atem-SageMaths was trained on 16,940 examples assembled from three sources after token length filtering. camel-ai/math was initially included but removed due to streaming instability during data loading.

| Dataset | Records | CoT | Domain |
|---|---|---|---|
| HuggingFaceH4/MATH-500 | 500 | ✅ solution→think, answer→final | Competition maths, full curriculum |
| EleutherAI/hendrycks_math | ~10,500 (1,500 × 7 subsets) | ❌ | Algebra, geometry, number theory, counting & probability, intermediate algebra, prealgebra, precalculus |
| qwedsacf/competition_math | 10,000 | ❌ | Competition mathematics |

MATH-500 handling: The solution field contains full worked solutions and was used as the CoT trace (placed between <think> tags). The answer field (boxed final answer) was used as the response. This is the only source of think-trace training examples.
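
The mapping described above can be sketched as follows. This is an illustration under assumptions, not the author's preprocessing script: the function name is hypothetical, the `problem` field name and the exact whitespace around the tags are assumed, and only the `solution` and `answer` fields are confirmed by the description.

```python
def math500_to_messages(record: dict) -> list:
    """Map one MATH-500 row to a chat-style training example."""
    # The worked solution becomes the think trace; the final answer follows it.
    assistant = (
        "<think>\n"
        + record["solution"].strip()
        + "\n</think>\n\n"
        + record["answer"].strip()
    )
    return [
        {"role": "user", "content": record["problem"].strip()},  # assumed field name
        {"role": "assistant", "content": assistant},
    ]
```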

hendrycks_math subsets loaded: algebra, counting_and_probability, geometry, intermediate_algebra, number_theory, prealgebra, precalculus (1,500 records from the train split of each).

Token length filter: Examples exceeding 8,192 tokens after chat template application were removed rather than truncated. This is the primary source of attrition from the ~31,000 raw records loaded to 16,940 retained.
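
A drop-rather-than-truncate filter of this kind can be sketched as below. The function and the `count_tokens` callable are hypothetical. In practice the callable would wrap the tokenizer's chat template, but it is left abstract here so the sketch runs without downloading a model.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
MAX_TOKENS = 8192  # max sequence length stated in the model details


def filter_by_length(
    examples: Iterable[T],
    count_tokens: Callable[[T], int],
    max_tokens: int = MAX_TOKENS,
) -> List[T]:
    """Keep examples with at most max_tokens tokens; drop longer ones whole."""
    return [ex for ex in examples if count_tokens(ex) <= max_tokens]
```

Dropping over-length examples instead of truncating them avoids training on solutions that are cut off before the final answer.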

Loss curve:

| Step | Train Loss | Val Loss |
|---|---|---|
| 100 | 0.6201 | 0.6310 |
| 200 | 0.5362 | 0.6077 |
| 266 (final) | 0.5602 | 0.6044 |

Loss values are substantially lower than Atem-SageCoder's (0.604 vs 0.859), consistent with mathematics Q&A being more formulaic and predictable than code+CoT reasoning traces. The train/val gap is ~0.01 at step 100, ~0.07 at step 200, and ~0.04 at the final step, which gives no strong overfitting signal. Train loss drops sharply between steps 100 and 200, while validation loss improves more gradually and then stabilises.
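
The train/val gap at each logged step can be recomputed directly from the loss curve. The short sketch below uses only the three logged points from the table; the function name is illustrative.

```python
# (step, train loss, val loss) copied from the loss curve table
loss_curve = [
    (100, 0.6201, 0.6310),
    (200, 0.5362, 0.6077),
    (266, 0.5602, 0.6044),
]


def train_val_gaps(curve):
    """Return {step: val_loss - train_loss} for each logged step."""
    return {step: val - train for step, train, val in curve}


gaps = train_val_gaps(loss_curve)
for step, gap in gaps.items():
    print(f"step {step}: gap = {gap:.4f}")
```

The gaps come out at roughly 0.011, 0.072, and 0.044 for steps 100, 200, and 266 respectively.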


Training Configuration

# Key hyperparameters
lora_r = 32
lora_alpha = 64
lora_dropout = 0.05
max_seq_length = 8192
learning_rate = 1e-4
lr_scheduler = 'cosine'
warmup_ratio = 0.05
batch_size = 8 # auto-scaled by Unsloth from config 4;
 # shorter math examples left VRAM headroom
grad_accumulation = 16 # effective batch size: 128
num_epochs = 2
dtype = bfloat16
load_in_4bit = True
nothink_ratio = 0.97 # reflects natural dataset composition
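
The hyperparameters above can be cross-checked against the reported step count with a little arithmetic. This sketch assumes that all 16,940 records were in the training split, that the final partial batch of each epoch counts as a step, and that all 500 MATH-500 records survived filtering. None of these is stated explicitly.

```python
import math

records = 16_940          # training records after filtering
batch_size = 8            # per-device batch size
grad_accumulation = 16
num_epochs = 2
think_records = 500       # MATH-500 examples (assumed all retained)

effective_batch = batch_size * grad_accumulation
steps_per_epoch = math.ceil(records / effective_batch)
total_steps = steps_per_epoch * num_epochs
think_share = think_records / records

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"think share: {think_share:.1%}")
```

Under these assumptions the result is 133 steps per epoch and 266 steps in total, which matches the reported step count, and a think share of about 3%.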

Training used Unsloth (unsloth==2026.5.5, unsloth_zoo==2026.5.5) with train_on_responses_only masking, so loss was computed exclusively on assistant response tokens. Checks run before training confirmed the model identity, think-tag format, and mask correctness.
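
The idea behind response-only masking can be illustrated in a few lines. This is a conceptual sketch, not Unsloth's implementation: the function name and the boolean `response_mask` input are hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label contribute nothing to the loss


def mask_non_response(input_ids, response_mask):
    """Build labels that keep assistant tokens and ignore everything else."""
    if len(input_ids) != len(response_mask):
        raise ValueError("input_ids and response_mask must have equal length")
    return [
        token if is_response else IGNORE_INDEX
        for token, is_response in zip(input_ids, response_mask)
    ]
```

For example, with three prompt tokens followed by two assistant tokens, only the last two positions keep their token ids as labels.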


Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-SageMaths-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Find all integer solutions to x² - 7x + 12 = 0.",
    }
]

# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-SageMaths-1.5B",
    max_seq_length=8192,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "In how many ways can 8 people be seated around a circular table?",
    }
]

# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))

Ollama

# Recommended: best speed/quality balance
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-SageMaths-1.5B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-SageMaths-1.5B:Q4_K_M
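
Once the server is running, it can be queried over HTTP. The sketch below assumes the server listens on llama.cpp's default address (127.0.0.1:8080) and exposes its OpenAI-compatible chat endpoint. The helper names are hypothetical, and the sampling values simply mirror the Transformers example.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://127.0.0.1:8080", max_tokens=1024):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Find all integer solutions to x² - 7x + 12 = 0.")` requires the server from the command above to be running. Adjust `base_url` if a different host or port was configured.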

System Prompt

Atem-SageMaths's identity and mathematical focus are baked into the chat template. To override manually:

You are Atem-SageMaths, a precise mathematical reasoning assistant built
on the Atem foundation. You approach problems methodically, working through
each step carefully, verifying intermediate results, and arriving at
well-supported solutions. Your answers are rigorous, clearly structured,
and mathematically correct.
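
A manual override is usually done by placing a system message first in the conversation. The helper below is a hypothetical sketch of that pattern. That this model's chat template replaces its built-in prompt when a system message is supplied is an assumption and has not been verified here.

```python
def with_system_prompt(system_prompt, user_prompt):
    """Return a chat message list that starts with an explicit system message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = with_system_prompt(
    "You are Atem-SageMaths, a precise mathematical reasoning assistant.",
    "In how many ways can 8 people be seated around a circular table?",
)
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the `messages` list used in the usage examples.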

Available Files

| File | Size | Description |
|---|---|---|
| model.safetensors | ~3.1 GB | Full bfloat16 merged weights |
| Atem-SageMaths-1.5B.Q4_K_M.gguf | ~986 MB | 4-bit quantised (recommended) |
| Atem-SageMaths-1.5B.Q5_K_M.gguf | ~1.1 GB | 5-bit quantised |
| Atem-SageMaths-1.5B.Q8_0.gguf | ~1.6 GB | 8-bit quantised (near-lossless) |
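
The bfloat16 and Q8_0 sizes can be sanity-checked from the parameter count. This is a rough estimate that ignores file metadata and any tensors stored at a different precision. It covers only the two formats with a fixed bit width: bfloat16 at 16 bits per weight, and Q8_0 at 8.5 bits per weight (32 int8 values plus one 16-bit scale per block). The K-quant files mix block types and are left out.

```python
PARAMS = 1.54e9  # approximate parameter count from the model details


def estimated_size_gb(bits_per_weight, params=PARAMS):
    """Estimate file size in GB (10^9 bytes), ignoring metadata overhead."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = estimated_size_gb(16)
q8_0_gb = estimated_size_gb(8.5)
print(f"bf16: {bf16_gb:.2f} GB, Q8_0: {q8_0_gb:.2f} GB")
```

The estimates are about 3.08 GB and 1.64 GB, in line with the listed ~3.1 GB and ~1.6 GB.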

Known Limitations

Thin CoT coverage. Only MATH-500 (500 records, ~3% of training data) contributes explicit think traces. The model's think-then-answer behaviour is therefore limited to problems structurally similar to MATH-500 content. On novel problem types, it defaults to the direct solution format.

Arithmetic precision. Multi-step numerical calculations remain a known weakness across the Atem series. Intermediate arithmetic slips can occur on problems requiring many sequential operations. Final answers to precision-critical calculations should be independently verified.

Competition scope. Training data is heavily weighted toward competition-style problems with clean closed-form answers. Open-ended applied mathematics, calculus, linear algebra, and statistics are not represented in the training corpus.
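
The advice above on independently verifying precision-critical answers can be as simple as recomputing the result in ordinary code. As a sketch, the two example prompts from the Usage section can be checked like this. The helper name and the search range are illustrative choices.

```python
import math


def integer_roots(a, b, c, search=range(-100, 101)):
    """Brute-force the integer roots of a*x^2 + b*x + c within the search range."""
    return [x for x in search if a * x * x + b * x + c == 0]


# x^2 - 7x + 12 = 0 factors as (x - 3)(x - 4)
roots = integer_roots(1, -7, 12)

# Circular seatings of n distinct people, counting rotations as identical: (n - 1)!
circular_seatings = math.factorial(8 - 1)

print(roots, circular_seatings)
```

A model answer that disagrees with `[3, 4]` or `5040` here would indicate an arithmetic slip of the kind described.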


Roadmap

| Stage | Status | Description |
|---|---|---|
| Stage 1: SFT | ✅ Complete | Atem v1: direct reasoning foundation |
| Stage 2: CoT SFT | ✅ Complete | Atem-Wisdom: thinking traces |
| Specialisation: Code | ✅ Complete | Atem-SageCoder |
| Specialisation: Maths | ✅ Complete | Atem-SageMaths (this model) |
| Stage 3: DPO/IPO | 🔄 Planned | Atem-Pharaoh: preference-aligned reasoning |

Citation

@misc{atem_sagemaths_2026,
 author = {Asad, Zain},
 title = {Atem-SageMaths: A 1.5B Mathematics Model
 via Competition Problem Distillation},
 year = {2026},
 publisher = {HuggingFace},
 howpublished = {\url{https://huggingface.co/EphAsad/Atem-SageMaths-1.5B}},
}

License

Released under the Apache 2.0 License, consistent with the base model chain (Qwen2.5-1.5B-Instruct → Atem v1 → Atem-Wisdom → Atem-SageMaths).


Built independently by EphAsad
