URL: https://glama.ai/mcp/servers/CoderDayton/verifiable-thinking-mcp

verifiable-thinking-mcp by CoderDayton | Glama


Your LLM is confidently wrong 40% of the time on reasoning questions. This fixes that.

15 trap patterns detected in <1ms. No LLM calls. Just pattern matching.

Quick Start • Features • Trap Detection • API


┌────────────────────────────────────────────────────────────┐
│  "A bat and ball cost $1.10. The bat costs $1 more..."     │
│                             ↓                              │
│  TRAP DETECTED: additive_system                            │
│  > Don't subtract $1 from $1.10. Set up: x + (x+1) = 1.10  │
│                             ↓                              │
│  Answer: $0.05 (not $0.10)                                 │
└────────────────────────────────────────────────────────────┘
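
The setup in the box can be checked with a few lines of arithmetic. This is only an illustrative sketch: `solveAdditiveSystem` is a hypothetical helper, not something the package exports, and it works in integer cents so the result is exact.

```typescript
// Hypothetical helper (not part of verifiable-thinking-mcp).
// Solves  x + (x + difference) = total  for the cheaper item x.
// Amounts are integer cents to avoid floating-point rounding.
function solveAdditiveSystem(totalCents: number, differenceCents: number): number {
  // x + (x + d) = t  =>  2x = t - d  =>  x = (t - d) / 2
  return (totalCents - differenceCents) / 2;
}

// Bat and ball: $1.10 total, bat costs $1.00 more than the ball.
const ballCents = solveAdditiveSystem(110, 100);
const batCents = ballCents + 100;

console.log(`ball = ${ballCents} cents, bat = ${batCents} cents`);
```

The intuitive answer (10 cents) fails the check: 10 + 110 = 120 cents, not 110.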

Quick Start

npx -y verifiable-thinking-mcp

Add to Claude Desktop (claude_desktop_config.json):

{
 "mcpServers": {
 "verifiable-thinking": {
 "command": "npx",
 "args": ["-y", "verifiable-thinking-mcp"]
 }
 }
}

Features

🎯 Trap Detection

15 patterns (bat-ball, Monty Hall, base rate) caught before reasoning starts

βš”οΈ Auto-Challenge

Forces counterarguments when confidence >95%β€”no more overconfident wrong answers

πŸ” Contradiction Detection

Catches "Let x=5" then "Now x=10" across steps

🌿 Hypothesis Branching

Explore alternatives, auto-detects when branches confirm/refute

πŸ”’ Local Math

Evaluates expressions without LLM round-trips

πŸ—œοΈ Smart Compression

49% token savings with telegraphic + sentence-level compression

⚑ Real Token Counting

Tiktoken integrationβ€”3,922Γ— cache speedup, zero estimation error
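
To make the contradiction-detection idea concrete, here is a minimal sketch of how "Let x=5" followed by "Now x=10" could be flagged. It is an assumption about the approach, not the package's actual detector: `findContradictions`, the `Contradiction` shape, and the regular expression are all hypothetical, and only simple `name = number` assignments are handled.

```typescript
// Illustrative sketch only; names and logic are assumptions, not the package's code.
interface Contradiction {
  variable: string;
  firstValue: number;
  firstStep: number;
  laterValue: number;
  laterStep: number;
}

function findContradictions(steps: string[]): Contradiction[] {
  // Prototype-free object so variable names like "constructor" are safe keys.
  const seen: Record<string, { value: number; step: number }> = Object.create(null);
  const found: Contradiction[] = [];

  for (let i = 0; i < steps.length; i++) {
    // Matches simple assignments such as "x=5" or "rate = 2.5".
    const assignment = /\b([A-Za-z_]\w*)\s*=\s*(-?\d+(?:\.\d+)?)/g;
    let match: RegExpExecArray | null;
    while ((match = assignment.exec(steps[i])) !== null) {
      const variable = match[1];
      const value = parseFloat(match[2]);
      const earlier = seen[variable];
      if (earlier === undefined) {
        seen[variable] = { value, step: i + 1 };
      } else if (earlier.value !== value) {
        found.push({
          variable,
          firstValue: earlier.value,
          firstStep: earlier.step,
          laterValue: value,
          laterStep: i + 1,
        });
      }
    }
  }
  return found;
}

const conflicts = findContradictions(["Let x=5", "Then y = 2", "Now x=10"]);
console.log(conflicts);
```

A real detector would also need to handle reassignment that is intentional (for example inside a revised step), which this sketch ignores.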

Token Efficiency

Every operation counts. Verifiable Thinking uses real token counting (tiktoken) and intelligent compression to cut token usage by roughly 49–57% without sacrificing reasoning quality.

// Traditional reasoning: ~1,350 tokens for 10-step chain
// Verifiable Thinking: ~580 tokens (49–57% savings)

// Real token counting (not estimation)
countTokens("What is 2+2?") // → 7 tokens (not 3)
// Cache speedup: 3,922× faster on repeated strings

// Compress before processing (not just storage)
scratchpad({
  operation: "step",
  thought: "Long analysis...", // 135 tokens → 72 tokens
  compress: true
})

// Budget controls
scratchpad({
  warn_at_tokens: 2000, // Soft warning
  hard_limit_tokens: 5000 // Hard stop
})
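
One way to read the two budget options above is as a soft threshold and a hard ceiling on the running token total. The sketch below shows that interpretation; `checkBudget`, `TokenBudget`, and the three status values are hypothetical, and the exact comparison rules the server uses are an assumption.

```typescript
// Hypothetical sketch of soft/hard token budgets. The field names echo
// warn_at_tokens and hard_limit_tokens, but the logic is assumed.
type BudgetStatus = "ok" | "warn" | "blocked";

interface TokenBudget {
  warnAtTokens: number;
  hardLimitTokens: number;
}

function checkBudget(
  usedTokens: number,
  nextStepTokens: number,
  budget: TokenBudget
): BudgetStatus {
  const projected = usedTokens + nextStepTokens;
  if (projected > budget.hardLimitTokens) return "blocked"; // hard stop
  if (projected >= budget.warnAtTokens) return "warn"; // soft warning
  return "ok";
}

const budget: TokenBudget = { warnAtTokens: 2000, hardLimitTokens: 5000 };
console.log(checkBudget(1500, 72, budget)); // projected 1572
console.log(checkBudget(1950, 72, budget)); // projected 2022
console.log(checkBudget(4990, 72, budget)); // projected 5062
```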

At scale: 1,000 reasoning chains/day = $4,193/year saved (at GPT-4o pricing).

See docs/token-optimization.md for architecture details and benchmarks.
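
The cache speedup quoted above comes from not re-tokenizing strings that have already been counted. A minimal memoization sketch follows. It is not the package's implementation: `memoize` and `naiveCount` are hypothetical, and `naiveCount` is a whitespace splitter standing in for tiktoken, so its counts differ from real token counts.

```typescript
// Illustrative memoized counter; the real cache design is not shown in this README.
type TokenCounter = (text: string) => number;

function memoize(count: TokenCounter): TokenCounter {
  const cache: Record<string, number> = Object.create(null);
  return (text: string): number => {
    const cached = cache[text];
    if (cached !== undefined) return cached; // cache hit: skip tokenization
    const result = count(text);
    cache[text] = result;
    return result;
  };
}

let tokenizerCalls = 0;

// Stand-in tokenizer: counts whitespace-separated words. NOT tiktoken.
const naiveCount: TokenCounter = (text) => {
  tokenizerCalls++;
  return text.trim().split(/\s+/).length;
};

const cachedCount = memoize(naiveCount);
const firstCount = cachedCount("What is 2+2?");
const secondCount = cachedCount("What is 2+2?");

console.log(firstCount, secondCount, tokenizerCalls);
```

The word-splitting stand-in returns 3 for "What is 2+2?", which is the kind of estimate the README contrasts with the real count of 7.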

How It Works

// Start with a question; trap detection runs automatically
scratchpad({
  operation: "step",
  question: "A bat and ball cost $1.10...",
  thought: "Let ball = x, bat = x + 1.00",
  confidence: 0.9
})
// → Returns trap_analysis warning

// High confidence? Auto-challenge kicks in
scratchpad({ operation: "step", thought: "...", confidence: 0.96 })
// → Returns challenge_suggestion: "What if your assumption is wrong?"

// Complete with spot-check
scratchpad({ operation: "complete", final_answer: "$0.05" })
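
The auto-challenge step in the flow above amounts to a threshold check on the reported confidence. A minimal sketch, assuming a strict "greater than 0.95" rule; `challengeSuggestion` and `ReasoningStep` are hypothetical names, and the server may weigh other signals as well.

```typescript
// Hypothetical sketch of the >95% auto-challenge rule; not the package's code.
interface ReasoningStep {
  thought: string;
  confidence: number; // 0..1
}

const CHALLENGE_THRESHOLD = 0.95;

function challengeSuggestion(step: ReasoningStep): string | null {
  // Only steps strictly above the threshold trigger a challenge.
  return step.confidence > CHALLENGE_THRESHOLD
    ? "What if your assumption is wrong?"
    : null;
}

console.log(challengeSuggestion({ thought: "Let ball = x", confidence: 0.9 }));
console.log(challengeSuggestion({ thought: "Ball is $0.05", confidence: 0.96 }));
```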

Trap Detection

| Pattern | What It Catches |
|---|---|
| additive_system | Bat-ball, widget-gadget (subtract instead of solve) |
| nonlinear_growth | Lily pad doubling (linear interpolation) |
| monty_hall | Door switching (50/50 fallacy) |
| base_rate | Medical tests (ignoring prevalence) |
| independence | Coin flips (gambler's fallacy) |
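
As a worked example of why the base_rate trap matters, the sketch below applies Bayes' rule to a medical test. The numbers (1% prevalence, 90% sensitivity, 9% false-positive rate) are made up for illustration, and `posterior` is a hypothetical helper, not part of the package.

```typescript
// Illustrative Bayes' rule calculation; all inputs are example values.
function posterior(
  prevalence: number,
  sensitivity: number,
  falsePositiveRate: number
): number {
  const truePositives = prevalence * sensitivity;
  const falsePositives = (1 - prevalence) * falsePositiveRate;
  // P(condition | positive test)
  return truePositives / (truePositives + falsePositives);
}

const posteriorProbability = posterior(0.01, 0.9, 0.09);
console.log(posteriorProbability.toFixed(3)); // about 0.092
```

Ignoring prevalence suggests a positive result means roughly a 90% chance of having the condition; with these inputs the chance is closer to 9%.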

| Pattern | Trap |
|---|---|
| additive_system | Subtract instead of solve |
| nonlinear_growth | Linear interpolation |
| rate_pattern | Incorrect scaling |
| harmonic_mean | Arithmetic mean for rates |
| independence | Gambler's fallacy |
| pigeonhole | Underestimate worst case |
| base_rate | Ignore prevalence |
| factorial_counting | Simple division |
| clock_overlap | Assume 12 overlaps |
| conditional_probability | Ignore conditioning |
| conjunction_fallacy | More detail = more likely |
| monty_hall | 50/50 after reveal |
| anchoring | Irrelevant number influence |
| sunk_cost | Past investment bias |
| framing_effect | Gain/loss framing |
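
Since detection is plain pattern matching with no LLM calls, a detector can be sketched as a list of regular expressions keyed by pattern name. Everything below is an assumption made for illustration: the three regexes, the hints, and `detectTraps` are hypothetical and far simpler than whatever rules the package actually ships.

```typescript
// Illustrative regex-based trap detector; rules are invented for this sketch.
interface TrapRule {
  pattern: string; // pattern name from the table above
  matcher: RegExp; // non-global, so test() keeps no state between calls
  hint: string;
}

const RULES: TrapRule[] = [
  {
    pattern: "additive_system",
    matcher: /costs?\s+\$?\d+(\.\d+)?\s+more\s+than/i,
    hint: "Set up x + (x + d) = total instead of subtracting.",
  },
  {
    pattern: "nonlinear_growth",
    matcher: /\bdoubles?\b.*\bevery\b/i,
    hint: "Growth is exponential; do not interpolate linearly.",
  },
  {
    pattern: "monty_hall",
    matcher: /\b(three|3)\s+doors\b/i,
    hint: "The reveal is not random; the odds are not 50/50.",
  },
];

function detectTraps(question: string): string[] {
  return RULES.filter((rule) => rule.matcher.test(question)).map((rule) => rule.pattern);
}

const batBall = detectTraps(
  "A bat and ball cost $1.10. The bat costs $1.00 more than the ball. How much is the ball?"
);
const lilyPad = detectTraps("A lily pad patch doubles in size every day.");
const plain = detectTraps("What is 2+2?");

console.log(batBall, lilyPad, plain);
```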

Tools

scratchpad is the main tool, with 11 operations:

| Operation | What It Does |
|---|---|
| step | Add reasoning step (trap priming on first) |
| complete | Finalize with auto spot-check |
| revise | Fix earlier step |
| branch | Explore alternative path |
| challenge | Force adversarial self-check |
| navigate | View history/branches |

| Operation | Purpose |
|---|---|
| step | Add reasoning step |
| complete | Finalize chain |
| revise | Fix earlier step |
| branch | Alternative path |
| challenge | Adversarial self-check |
| navigate | View history |
| spot_check | Manual trap check |
| hint | Progressive simplification |
| mistakes | Algebraic error detection |
| augment | Compute math expressions |
| override | Force-commit failed step |

Other tools: list_sessions, get_session, clear_session, compress
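
When calling scratchpad from typed client code, the 11 operation names listed above can be captured as a string-literal union so typos fail early. This is a client-side convenience sketch only; `OPERATIONS`, `Operation`, and `isOperation` are hypothetical names and are not exported by the package.

```typescript
// Hypothetical client-side typing for the scratchpad operation names.
const OPERATIONS = [
  "step",
  "complete",
  "revise",
  "branch",
  "challenge",
  "navigate",
  "spot_check",
  "hint",
  "mistakes",
  "augment",
  "override",
] as const;

type Operation = (typeof OPERATIONS)[number];

// Runtime guard for values that arrive as plain strings.
function isOperation(value: string): value is Operation {
  return (OPERATIONS as readonly string[]).indexOf(value) !== -1;
}

console.log(OPERATIONS.length, isOperation("spot_check"), isOperation("delete"));
```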

vs Sequential Thinking MCP

| | Sequential Thinking | Verifiable Thinking |
|---|---|---|
| Trap detection | ❌ | 15 patterns |
| Auto-challenge | ❌ | >95% confidence |
| Contradiction detection | ❌ | ✅ |
| Confidence tracking | ❌ | Per-step + chain |
| Local compute | ❌ | ✅ |
| Token budgets | ❌ | Soft + hard limits |
| Real token counting | ❌ | Tiktoken (3,922× cache speedup) |
| Compression | ❌ | 49–57% token savings |

Sequential Thinking is ~100 lines. This is 22,000+ with 1,967 tests.

See docs/competitive-analysis.md for full breakdown.

Development

git clone https://github.com/CoderDayton/verifiable-thinking-mcp.git
cd verifiable-thinking-mcp && bun install
bun run dev # Interactive MCP Inspector
bun test # 1,967 tests

License

MIT


Report Bug · Request Feature

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/CoderDayton/verifiable-thinking-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.