Your LLM is confidently wrong 40% of the time on reasoning questions. This fixes that.
15 trap patterns detected in <1ms. No LLM calls. Just pattern matching.
Quick Start • Features • Trap Detection • API
┌────────────────────────────────────────────────────────────┐
│ "A bat and ball cost $1.10. The bat costs $1 more..."      │
│                             ↓                              │
│ TRAP DETECTED: additive_system                             │
│ > Don't subtract $1 from $1.10. Set up: x + (x+1) = 1.10   │
│                             ↓                              │
│ Answer: $0.05 (not $0.10)                                  │
└────────────────────────────────────────────────────────────┘
Quick Start
npx -y verifiable-thinking-mcp
Add to Claude Desktop (claude_desktop_config.json):
{
  "mcpServers": {
    "verifiable-thinking": {
      "command": "npx",
      "args": ["-y", "verifiable-thinking-mcp"]
    }
  }
}
Features
| Feature | Description |
|---|---|
| Trap Detection | 15 patterns (bat-ball, Monty Hall, base rate) caught before reasoning starts |
| Auto-Challenge | Forces counterarguments when confidence >95%: no more overconfident wrong answers |
| Contradiction Detection | Catches "Let x=5" then "Now x=10" across steps |
| Hypothesis Branching | Explore alternatives, auto-detects when branches confirm/refute |
| Local Math | Evaluates expressions without LLM round-trips |
| Smart Compression | 49% token savings with telegraphic + sentence-level compression |
| Real Token Counting | Tiktoken integration: 3,922× cache speedup, zero estimation error |
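To make the contradiction check concrete, here is a minimal sketch of how conflicting bindings such as "Let x=5" and "Now x=10" could be caught with plain pattern matching. It is an illustration only: `findContradictions`, the `Contradiction` shape, and the regex are assumptions made for this example, not the package's actual implementation.

```typescript
// Hypothetical sketch: names and regex are assumptions, not the package's code.
interface Contradiction {
  variable: string;
  firstStep: number;  // 1-based step where the variable was first bound
  firstValue: number;
  laterStep: number;  // 1-based step that binds it to a different value
  laterValue: number;
}

// Matches simple numeric bindings such as "Let x=5", "x = 10" or "rate = -2.5".
const ASSIGNMENT_SOURCE = String.raw`\b([A-Za-z_]\w*)\s*=\s*(-?\d+(?:\.\d+)?)(?!\w)`;

function findContradictions(steps: string[]): Contradiction[] {
  const seen = new Map<string, { step: number; value: number }>();
  const found: Contradiction[] = [];

  steps.forEach((text, index) => {
    // A fresh regex per step keeps lastIndex state from leaking between steps.
    const assignment = new RegExp(ASSIGNMENT_SOURCE, "g");
    let match: RegExpExecArray | null;
    while ((match = assignment.exec(text)) !== null) {
      const variable = match[1];
      const value = Number(match[2]);
      const prior = seen.get(variable);
      if (prior === undefined) {
        // First time this variable is bound: remember where and to what.
        seen.set(variable, { step: index + 1, value });
      } else if (prior.value !== value) {
        found.push({
          variable,
          firstStep: prior.step,
          firstValue: prior.value,
          laterStep: index + 1,
          laterValue: value,
        });
      }
    }
  });

  return found;
}

// Reports one conflict: x is 5 at step 1 but 10 at step 3.
console.log(findContradictions(["Let x=5 and y=2", "So y=2 still holds", "Now x=10."]));
```

The sketch only understands literal numeric assignments; equations like `x + (x+1) = 1.10` are deliberately ignored because the left-hand side is not a bare variable.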
Token Efficiency
Every operation counts. Verifiable Thinking uses real token counting (tiktoken) and intelligent compression to cut costs by 50-60% without sacrificing reasoning quality.
// Traditional reasoning: ~1,350 tokens for 10-step chain
// Verifiable Thinking: ~580 tokens (49–57% savings)

// Real token counting (not estimation)
countTokens("What is 2+2?") // → 7 tokens (not 3)
// Cache speedup: 3,922× faster on repeated strings

// Compress before processing (not just storage)
scratchpad({
  operation: "step",
  thought: "Long analysis...", // 135 tokens → 72 tokens
  compress: true
})

// Budget controls
scratchpad({
  warn_at_tokens: 2000, // Soft warning
  hard_limit_tokens: 5000 // Hard stop
})
At scale: 1,000 reasoning chains/day = $4,193/year saved (at GPT-4o pricing).
See docs/token-optimization.md for architecture details and benchmarks.
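For a rough sense of how a yearly figure like the one above can be derived, the sketch below multiplies tokens saved per chain by daily volume and a per-token price. The helper `yearlySavingsUSD` is hypothetical, and the price of $15 per million tokens is an assumed placeholder; the exact inputs behind the quoted figure are not stated here, so expect a result in the same range, not an exact match.

```typescript
// Hypothetical helper for illustration; not part of the package.
function yearlySavingsUSD(
  tokensSavedPerChain: number,
  chainsPerDay: number,
  usdPerMillionTokens: number
): number {
  const tokensPerYear = tokensSavedPerChain * chainsPerDay * 365;
  return (tokensPerYear / 1e6) * usdPerMillionTokens;
}

// ~1,350 tokens vs ~580 tokens per 10-step chain (figures quoted above).
const savedPerChain = 1350 - 580; // 770 tokens

// ASSUMPTION: 15 USD per million tokens is a placeholder price.
// Substitute the current rate for the model you actually use.
const estimate = yearlySavingsUSD(savedPerChain, 1000, 15);
console.log(`Estimated yearly savings: $${estimate.toFixed(2)}`);
```

Because the result scales linearly with every input, halving the price or the daily volume halves the estimate.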
How It Works
// Start with a question; trap detection runs automatically
scratchpad({
  operation: "step",
  question: "A bat and ball cost $1.10...",
  thought: "Let ball = x, bat = x + 1.00",
  confidence: 0.9
})
// → Returns trap_analysis warning

// High confidence? Auto-challenge kicks in
scratchpad({ operation: "step", thought: "...", confidence: 0.96 })
// → Returns challenge_suggestion: "What if your assumption is wrong?"

// Complete with spot-check
scratchpad({ operation: "complete", final_answer: "$0.05" })
Trap Detection
| Pattern | What It Catches |
|---|---|
| additive_system | Bat-ball, widget-gadget (subtract instead of solve) |
| | Lily pad doubling (linear interpolation) |
| | Door switching (50/50 fallacy) |
| | Medical tests (ignoring prevalence) |
| | Coin flips (gambler's fallacy) |

| Pattern | Trap |
|---|---|
| additive_system | Subtract instead of solve |
| | Linear interpolation |
| | Incorrect scaling |
| | Arithmetic mean for rates |
| | Gambler's fallacy |
| | Underestimate worst case |
| | Ignore prevalence |
| | Simple division |
| | Assume 12 overlaps |
| | Ignore conditioning |
| | More detail = more likely |
| | 50/50 after reveal |
| | Irrelevant number influence |
| | Past investment bias |
| | Gain/loss framing |
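As a rough illustration of detection by pattern matching alone (no LLM calls), the sketch below runs a question through a small list of regular expressions. Only the name `additive_system` comes from this README; the `TrapPattern` shape, the regexes, the hint text, and the second pattern's name are assumptions made up for the example.

```typescript
// Hypothetical sketch: patterns and hints are illustrative assumptions.
interface TrapPattern {
  name: string;
  test: RegExp;
  hint: string;
}

const PATTERNS: TrapPattern[] = [
  {
    // Name taken from the bat-and-ball example at the top of this README.
    name: "additive_system",
    test: /\bcosts?\b.*\bmore than\b/i,
    hint: "Set up equations instead of subtracting.",
  },
  {
    // Made-up name for a lily-pad style doubling question.
    name: "hypothetical_doubling",
    test: /\bdoubles?\b.*\bevery\b/i,
    hint: "Growth is exponential: work backwards from the end.",
  },
];

// Returns every pattern whose regex matches the question text.
function detectTraps(question: string): TrapPattern[] {
  return PATTERNS.filter((pattern) => pattern.test.test(question));
}

const hits = detectTraps(
  "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball."
);
for (const hit of hits) {
  console.log(`TRAP DETECTED: ${hit.name} > ${hit.hint}`);
}
```

A regex table like this is cheap to evaluate, which is consistent with the sub-millisecond claim, but it will miss paraphrases that a real pattern set would need to cover.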
Tools
scratchpad: the main tool, with 11 operations:

| Operation | What It Does |
|---|---|
| step | Add reasoning step (trap priming on first) |
| complete | Finalize with auto spot-check |
| | Fix earlier step |
| | Explore alternative path |
| | Force adversarial self-check |
| | View history/branches |

| Operation | Purpose |
|---|---|
| step | Add reasoning step |
| complete | Finalize chain |
| | Fix earlier step |
| | Alternative path |
| | Adversarial self-check |
| | View history |
| | Manual trap check |
| | Progressive simplification |
| | Algebraic error detection |
| | Compute math expressions |
| | Force-commit failed step |
Other tools: list_sessions, get_session, clear_session, compress
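Two behaviours described earlier, auto-challenge above 95% confidence and the soft/hard token budgets, come down to threshold checks. The sketch below shows one way they could work. The option names `warn_at_tokens` and `hard_limit_tokens` and the 95% threshold come from this README; the function names, the `BudgetStatus` type, and the exact boundary handling (`>` vs `>=`) are assumptions.

```typescript
// Hypothetical sketch: function names and boundary behaviour are assumptions.
type BudgetStatus = "ok" | "warn" | "stop";

interface Budget {
  warn_at_tokens: number;    // soft warning threshold
  hard_limit_tokens: number; // hard stop threshold
}

// Classifies current token usage against the soft and hard limits.
function checkBudget(tokensUsed: number, budget: Budget): BudgetStatus {
  if (tokensUsed >= budget.hard_limit_tokens) return "stop";
  if (tokensUsed >= budget.warn_at_tokens) return "warn";
  return "ok";
}

// True when a step's stated confidence is high enough to force a counterargument.
function needsChallenge(confidence: number, threshold = 0.95): boolean {
  return confidence > threshold;
}

const budget: Budget = { warn_at_tokens: 2000, hard_limit_tokens: 5000 };
console.log(checkBudget(2400, budget)); // "warn"
console.log(needsChallenge(0.96));      // true
console.log(needsChallenge(0.9));       // false
```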
vs Sequential Thinking MCP
| | Sequential Thinking | Verifiable Thinking |
|---|---|---|
| Trap detection | ❌ | 15 patterns |
| Auto-challenge | ❌ | >95% confidence |
| Contradiction detection | ❌ | ✅ |
| Confidence tracking | ❌ | Per-step + chain |
| Local compute | ❌ | ✅ |
| Token budgets | ❌ | Soft + hard limits |
| Real token counting | ❌ | Tiktoken (3,922× cache speedup) |
| Compression | ❌ | 49–57% token savings |
Sequential Thinking is ~100 lines of code. This project is 22,000+ lines with 1,967 tests.
See docs/competitive-analysis.md for full breakdown.
Development
git clone https://github.com/CoderDayton/verifiable-thinking-mcp.git
cd verifiable-thinking-mcp && bun install
bun run dev # Interactive MCP Inspector
bun test # 1,967 tests
License
MIT
