Voozh

Your LLM is confidently wrong 40% of the time on reasoning questions. This fixes that.

15 trap patterns detected in <1ms. No LLM calls. Just pattern matching.

┌────────────────────────────────────────────────────────────────┐
│ "A bat and ball cost $1.10. The bat costs $1 more..." │
│ ↓ │
│ TRAP DETECTED: additive_system │
│ > Don't subtract $1 from $1.10. Set up: x + (x+1) = 1.10 │
│ ↓ │
│ Answer: $0.05 (not $0.10) │
└────────────────────────────────────────────────────────────────┘
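
Worked out, the equation in the hint gives the answer shown in the box. Writing x for the ball's price in dollars (so the bat costs x + 1):

```latex
\begin{aligned}
x + (x + 1) &= 1.10 \\
2x &= 0.10 \\
x &= 0.05
\end{aligned}
```

The ball costs $0.05 and the bat $1.05. The intuitive answer of $0.10 would make the bat $1.10 and the total $1.20.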

Quick Start

npx -y verifiable-thinking-mcp

Add to Claude Desktop (claude_desktop_config.json):

{
 "mcpServers": {
 "verifiable-thinking": {
 "command": "npx",
 "args": ["-y", "verifiable-thinking-mcp"]
 }
 }
}

Features

🎯 Trap Detection	15 patterns (bat-ball, Monty Hall, base rate) caught before reasoning starts
⚔️ Auto-Challenge	Forces counterarguments when confidence >95%—no more overconfident wrong answers
🔍 Contradiction Detection	Catches "Let x=5" then "Now x=10" across steps
🌿 Hypothesis Branching	Explores alternatives and auto-detects when a branch confirms or refutes a hypothesis
🔢 Local Math	Evaluates expressions without LLM round-trips
🗜️ Smart Compression	49% token savings with telegraphic + sentence-level compression
⚡ Real Token Counting	Tiktoken integration—3,922× cache speedup, zero estimation error
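
To make the contradiction check concrete, here is a minimal sketch of one way it could work: scan each step for simple numeric bindings and flag a variable that is later bound to a different value. This is an illustration only. The names `findContradictions`, `Contradiction`, and `ASSIGNMENT` are hypothetical, and the README does not show how the real detector is implemented.

```typescript
// Hypothetical sketch, not the project's actual code: flag a single-letter
// variable that is bound to two different numeric values across steps.

interface Contradiction {
  variable: string;
  firstStep: number;   // index of the step that set the original value
  secondStep: number;  // index of the step that set a conflicting value
  firstValue: number;
  secondValue: number;
}

// Matches simple bindings such as "Let x=5", "Now x = 10", "y = -2.5".
// Only single-letter variable names are recognised in this sketch.
const ASSIGNMENT = /\b([a-zA-Z])\s*=\s*(-?\d+(?:\.\d+)?)/g;

function findContradictions(steps: string[]): Contradiction[] {
  const seen = new Map<string, { step: number; value: number }>();
  const found: Contradiction[] = [];

  steps.forEach((text, step) => {
    ASSIGNMENT.lastIndex = 0; // reset the global regex before scanning each step
    let match: RegExpExecArray | null;
    while ((match = ASSIGNMENT.exec(text)) !== null) {
      const variable = match[1];
      const value = Number(match[2]);
      const earlier = seen.get(variable);
      if (earlier === undefined) {
        seen.set(variable, { step, value }); // first binding wins
      } else if (earlier.value !== value) {
        found.push({
          variable,
          firstStep: earlier.step,
          secondStep: step,
          firstValue: earlier.value,
          secondValue: value,
        });
      }
    }
  });

  return found;
}
```

Calling `findContradictions(["Let x=5", "Then y = 2", "Now x=10"])` reports one conflict for `x` (5 in step 0, 10 in step 2). A real detector would also need to handle intentional reassignment, which this sketch does not attempt.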

Token Efficiency

Every operation counts. Verifiable Thinking uses real token counting (tiktoken) and intelligent compression to cut token costs by 49–57% without sacrificing reasoning quality.

// Traditional reasoning: ~1,350 tokens for 10-step chain
// Verifiable Thinking: ~580 tokens (49–57% savings)

// Real token counting (not estimation)
countTokens("What is 2+2?") // → 7 tokens (not 3)
// Cache speedup: 3,922× faster on repeated strings

// Compress before processing (not just storage)
scratchpad({
 operation: "step",
 thought: "Long analysis...", // 135 tokens → 72 tokens
 compress: true
})

// Budget controls
scratchpad({
 warn_at_tokens: 2000, // Soft warning
 hard_limit_tokens: 5000 // Hard stop
})
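
The two budget fields above suggest a three-state check. The sketch below shows one plausible reading, assuming both thresholds are inclusive. `checkBudget`, `BudgetStatus`, and `TokenBudget` are hypothetical names for illustration, and the server's actual behavior at the boundaries is not documented here.

```typescript
// Hypothetical sketch of soft/hard token budget handling.
// Field names mirror the scratchpad options shown above; the comparison
// semantics (>=) are an assumption.

type BudgetStatus = "ok" | "warn" | "stop";

interface TokenBudget {
  warn_at_tokens: number;    // soft warning threshold
  hard_limit_tokens: number; // hard stop threshold
}

function checkBudget(usedTokens: number, budget: TokenBudget): BudgetStatus {
  // Check the hard limit first so it takes precedence over the warning.
  if (usedTokens >= budget.hard_limit_tokens) return "stop";
  if (usedTokens >= budget.warn_at_tokens) return "warn";
  return "ok";
}
```

With `warn_at_tokens: 2000` and `hard_limit_tokens: 5000`, a chain at 1,500 tokens is `"ok"`, at 2,000 it is `"warn"`, and at 5,000 it is `"stop"`.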

At scale: 1,000 reasoning chains/day = $4,193/year saved (at GPT-4o pricing).

See docs/token-optimization.md for architecture details and benchmarks.
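
A cache speedup on repeated strings is what memoization typically provides. The README does not describe the cache itself, so the following is only a sketch of the general idea. `withCache` and `Tokenizer` are hypothetical names, and the whitespace splitter is a stand-in for a real tiktoken encoder, not a substitute for one.

```typescript
// Hypothetical sketch: memoize a token counter so repeated strings
// skip the expensive encoding step. Not the project's actual cache.

type Tokenizer = (text: string) => number;

function withCache(countTokens: Tokenizer): Tokenizer {
  const cache = new Map<string, number>();
  return (text) => {
    const hit = cache.get(text);
    if (hit !== undefined) return hit; // cache hit: no re-encoding
    const count = countTokens(text);
    cache.set(text, count);
    return count;
  };
}

// Stand-in tokenizer for the demo only (splits on whitespace).
// A real setup would call a tiktoken encoder here instead.
let encoderCalls = 0;
const slowCount: Tokenizer = (text) => {
  encoderCalls += 1;
  return text.split(/\s+/).filter(Boolean).length;
};

const cachedCount = withCache(slowCount);
cachedCount("What is 2+2?"); // first call runs the stand-in encoder
cachedCount("What is 2+2?"); // second call is served from the Map
```

After both calls `encoderCalls` is 1. An unbounded `Map` grows with every distinct string, so a production cache would normally cap its size.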

How It Works

// Start with a question—trap detection runs automatically
scratchpad({
 operation: "step",
 question: "A bat and ball cost $1.10...",
 thought: "Let ball = x, bat = x + 1.00",
 confidence: 0.9
})
// → Returns trap_analysis warning

// High confidence? Auto-challenge kicks in
scratchpad({ operation: "step", thought: "...", confidence: 0.96 })
// → Returns challenge_suggestion: "What if your assumption is wrong?"

// Complete with spot-check
scratchpad({ operation: "complete", final_answer: "$0.05" })
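
The auto-challenge rule in the second call comes down to a threshold test. A minimal sketch, assuming the comparison is strictly greater than 0.95 as the ">95%" wording suggests. `challengeFor` and `CHALLENGE_THRESHOLD` are hypothetical names, and the suggestion text is the one quoted above.

```typescript
// Hypothetical sketch of the auto-challenge trigger.
// The strict ">" comparison is an assumption based on the ">95%" wording.

const CHALLENGE_THRESHOLD = 0.95;

function challengeFor(confidence: number): string | null {
  if (confidence < 0 || confidence > 1) {
    throw new RangeError("confidence must be between 0 and 1");
  }
  return confidence > CHALLENGE_THRESHOLD
    ? "What if your assumption is wrong?"
    : null; // at or below the threshold: no challenge is suggested
}
```

Under this reading, the first step above (`confidence: 0.9`) gets no challenge and the second (`confidence: 0.96`) does.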

Trap Detection

Pattern	What It Catches
`additive_system`	Bat-ball, widget-gadget (subtract instead of solve)
`nonlinear_growth`	Lily pad doubling (linear interpolation)
`monty_hall`	Door switching (50/50 fallacy)
`base_rate`	Medical tests (ignoring prevalence)
`independence`	Coin flips (gambler's fallacy)

All 15 patterns:

Pattern	Trap
`additive_system`	Subtract instead of solve
`nonlinear_growth`	Linear interpolation
`rate_pattern`	Incorrect scaling
`harmonic_mean`	Arithmetic mean for rates
`independence`	Gambler's fallacy
`pigeonhole`	Underestimate worst case
`base_rate`	Ignore prevalence
`factorial_counting`	Simple division
`clock_overlap`	Assume 12 overlaps
`conditional_probability`	Ignore conditioning
`conjunction_fallacy`	More detail = more likely
`monty_hall`	50/50 after reveal
`anchoring`	Irrelevant number influence
`sunk_cost`	Past investment bias
`framing_effect`	Gain/loss framing
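
Detection without LLM calls can be pictured as a table of matchers run against the question text. The sketch below covers three of the patterns above. The regular expressions and hint wording are invented for illustration, and `TRAPS`, `TrapPattern`, and `detectTraps` are hypothetical names. The real matchers are not shown in this README and are likely more careful.

```typescript
// Hypothetical sketch of regex-based trap detection.
// Pattern names come from the table above; the matchers and hints are
// illustrative assumptions, not the project's actual rules.

interface TrapPattern {
  name: string;
  matcher: RegExp; // non-global, so test() carries no state between calls
  hint: string;
}

const TRAPS: TrapPattern[] = [
  {
    name: "additive_system",
    matcher: /costs?\b.*\bmore than\b/i,
    hint: "Set up x + (x + d) = total instead of subtracting d from the total.",
  },
  {
    name: "nonlinear_growth",
    matcher: /\bdoubl(?:es|ing)\b/i,
    hint: "Doubling is exponential: half coverage is one step before full coverage.",
  },
  {
    name: "independence",
    matcher: /\b(?:coin|flip|toss)\b.*\bin a row\b/i,
    hint: "Independent events have no memory of earlier outcomes.",
  },
];

function detectTraps(question: string): TrapPattern[] {
  return TRAPS.filter((trap) => trap.matcher.test(question));
}
```

For the question "A bat and ball cost $1.10. The bat costs $1.00 more than the ball. How much is the ball?", `detectTraps` returns only the `additive_system` entry. A plain question such as "What is 2+2?" returns an empty list.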

Tools

scratchpad — the main tool with 11 operations:

Operation	What It Does
`step`	Add reasoning step (trap priming on first)
`complete`	Finalize with auto spot-check
`revise`	Fix earlier step
`branch`	Explore alternative path
`challenge`	Force adversarial self-check
`navigate`	View history/branches

All 11 operations:

Operation	Purpose
`step`	Add reasoning step
`complete`	Finalize chain
`revise`	Fix earlier step
`branch`	Alternative path
`challenge`	Adversarial self-check
`navigate`	View history
`spot_check`	Manual trap check
`hint`	Progressive simplification
`mistakes`	Algebraic error detection
`augment`	Compute math expressions
`override`	Force-commit failed step

Other tools: list_sessions, get_session, clear_session, compress

vs Sequential Thinking MCP

Feature	Sequential Thinking	Verifiable Thinking
Trap detection	❌	15 patterns
Auto-challenge	❌	>95% confidence
Contradiction detection	❌	✅
Confidence tracking	❌	Per-step + chain
Local compute	❌	✅
Token budgets	❌	Soft + hard limits
Real token counting	❌	Tiktoken (3,922× cache speedup)
Compression	❌	49–57% token savings

Sequential Thinking is ~100 lines of code. Verifiable Thinking is 22,000+ lines, with 1,967 tests.

See docs/competitive-analysis.md for full breakdown.

Development

git clone https://github.com/CoderDayton/verifiable-thinking-mcp.git
cd verifiable-thinking-mcp && bun install
bun run dev # Interactive MCP Inspector
bun test # 1,967 tests

License

MIT

URL: https://glama.ai/mcp/servers/CoderDayton/verifiable-thinking-mcp
