Recall is a long-term memory system for AI assistants that provides persistent storage, semantic search, and relationship tracking for memories.

Core Memory Operations

Store memories with automatic semantic indexing, content-hash deduplication, and optional auto-linking to related memories
Search memories using natural language queries with semantic similarity, filters (namespace, type, importance), and optional multi-hop graph expansion
Delete memories by ID or semantic search, with protection for high-confidence "golden rule" memories
Count and list memories with filtering, sorting, and pagination for auditing and exploration
Generate context by fetching relevant memories formatted as markdown for session injection, respecting token budgets

Memory Relationships & Graph

Create relationships between memories (relates_to, supersedes, caused_by, contradicts)
Inspect graph structure with BFS traversal, configurable depth/direction, and Mermaid diagram generation
Delete edges between memories by ID, memory connection, or specific pairs
Auto-infer relationships using embedding similarity with optional LLM refinement
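
The graph inspection described above can be pictured with a small sketch. It is not Recall's implementation: the edge-list representation, the `bfs_subgraph` and `to_mermaid` names, and the `direction` values (`outgoing`, `incoming`, `both`) are assumptions chosen for illustration. It shows a depth-limited BFS over relationship edges followed by rendering the result as a Mermaid flowchart.

```python
from collections import deque

# Hypothetical edge representation: (source_id, target_id, relation)
Edge = tuple[str, str, str]


def bfs_subgraph(edges: list[Edge], start: str, max_depth: int = 2,
                 direction: str = "both") -> list[Edge]:
    """Collect the edges reachable from `start` within `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found: list[Edge] = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for edge in edges:
            src, dst, _ = edge
            if direction in ("outgoing", "both") and src == node:
                neighbor = dst
            elif direction in ("incoming", "both") and dst == node:
                neighbor = src
            else:
                continue
            if edge not in found:
                found.append(edge)
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


def to_mermaid(edges: list[Edge]) -> str:
    """Render edges as a Mermaid flowchart; IDs are assumed to be Mermaid-safe."""
    lines = ["graph TD"]
    for src, dst, relation in edges:
        lines.append(f"    {src} -->|{relation}| {dst}")
    return "\n".join(lines)
```

Calling `to_mermaid(bfs_subgraph(edges, "a", max_depth=2, direction="outgoing"))` yields text that can be pasted into any Mermaid renderer.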

Validation & Quality

Validate memories by recording application success/failure to adjust confidence scores automatically
Detect contradictions between memories using semantic search and LLM reasoning
Check for superseding memories based on validation history to identify outdated information
Analyze memory health to detect contradictions, low-confidence, and stale memories
View validation history showing applied/succeeded/failed events and confidence score evolution
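
As a rough illustration of validation-driven confidence, the sketch below nudges a score up or down per recorded outcome and reports golden-rule status at the 0.9 threshold mentioned in this document. The starting value of 0.5, the fixed step of 0.1, the `>=` comparison, and the `MemoryConfidence` class itself are assumptions; Recall's actual update rule is not documented here.

```python
from dataclasses import dataclass, field

GOLDEN_RULE_THRESHOLD = 0.9  # promotion threshold stated in this document


@dataclass
class MemoryConfidence:
    confidence: float = 0.5  # assumed starting value
    history: list[tuple[str, float]] = field(default_factory=list)

    def record(self, success: bool, step: float = 0.1) -> float:
        """Raise confidence on success, lower it on failure, clamped to [0, 1]."""
        delta = step if success else -step
        self.confidence = round(min(1.0, max(0.0, self.confidence + delta)), 4)
        self.history.append(("succeeded" if success else "failed", self.confidence))
        return self.confidence

    @property
    def is_golden_rule(self) -> bool:
        """True once confidence reaches the promotion threshold."""
        return self.confidence >= GOLDEN_RULE_THRESHOLD
```

The `history` list mirrors the idea of a validation history: each entry pairs an outcome with the confidence that resulted from it.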

Performance & Monitoring

Check daemon status to monitor the async embedding service for fast storage (<10ms)
Track file activity to record file access events (read, write, edit) and view recent activity statistics

Key Features

Namespace isolation (global vs project-scoped)
Importance scoring (0.0-1.0) for memory prioritization
Confidence-based promotion to "golden rule" status (auto-promoted at 0.9)
Fast path via daemon (<10ms) or sync fallback (MLX ~100ms, Ollama 10-60s)

Recall

Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.

Features

Persistent Memory Storage: Store preferences, decisions, patterns, and session context
Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors
MLX Hybrid Embeddings: Native Apple Silicon support via MLX for ~5-10x faster embeddings (automatic fallback to Ollama)
Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)
Namespace Isolation: Global memories vs project-scoped memories
Context Generation: Auto-format memories for session context injection
Deduplication: Content-hash based duplicate detection
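
Content-hash deduplication can be sketched as follows. This is a minimal illustration, not Recall's code: the use of SHA-256, whitespace normalization, truncation to 16 hex characters (chosen only because the sample `content_hash` in this README is that long), and the `DedupIndex` class are all assumptions.

```python
import hashlib


def content_hash(content: str, length: int = 16) -> str:
    """Hash whitespace-normalized content (SHA-256 and length are assumptions)."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]


class DedupIndex:
    """Hypothetical in-memory stand-in for a unique content-hash lookup."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}

    def add(self, memory_id: str, content: str) -> tuple[str, bool]:
        """Return (id, created); a duplicate returns the existing id and False."""
        digest = content_hash(content)
        existing = self._by_hash.get(digest)
        if existing is not None:
            return existing, False
        self._by_hash[digest] = memory_id
        return memory_id, True
```

Storing the same text twice through `DedupIndex.add` returns the first memory's ID instead of creating a second entry.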

Installation

# Clone the repository
git clone https://github.com/yourorg/recall.git
cd recall

# Install with uv
uv sync

# On Apple Silicon: MLX embeddings work automatically (fastest option)
# On other platforms or as fallback: ensure Ollama is running
ollama serve # Run in a separate terminal if Ollama is not already running
ollama pull mxbai-embed-large # Required if not using MLX
ollama pull llama3.2 # Optional: session summarization for auto-capture hook

Usage

Run as MCP Server

uv run python -m recall

CLI Options

uv run python -m recall --help

Options:
  --sqlite-path PATH      SQLite database path (default: ~/.recall/recall.db)
  --chroma-path PATH      ChromaDB storage path (default: ~/.recall/chroma_db)
  --collection NAME       ChromaDB collection name (default: memories)
  --ollama-host HOST      Ollama server URL (default: http://localhost:11434)
  --ollama-model MODEL    Embedding model (default: mxbai-embed-large)
  --ollama-timeout SECS   Request timeout (default: 30)
  --log-level LEVEL       DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)

meta-mcp Configuration

Add Recall to your meta-mcp servers.json:

{
  "recall": {
    "command": "uv",
    "args": [
      "run",
      "--directory",
      "/path/to/recall",
      "python",
      "-m",
      "recall"
    ],
    "env": {
      "RECALL_LOG_LEVEL": "INFO",
      "RECALL_OLLAMA_HOST": "http://localhost:11434",
      "RECALL_OLLAMA_MODEL": "mxbai-embed-large"
    },
    "description": "Long-term memory system with semantic search",
    "tags": ["memory", "context", "semantic-search"]
  }
}

Or for Claude Code / other MCP clients (claude.json):

{
  "mcpServers": {
    "recall": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/path/to/recall",
        "python",
        "-m",
        "recall"
      ],
      "env": {
        "RECALL_LOG_LEVEL": "INFO"
      }
    }
  }
}

Environment Variables

Variable	Default	Description
`RECALL_SQLITE_PATH`	`~/.recall/recall.db`	SQLite database file path
`RECALL_CHROMA_PATH`	`~/.recall/chroma_db`	ChromaDB persistent storage directory
`RECALL_COLLECTION_NAME`	`memories`	ChromaDB collection name
`RECALL_EMBEDDING_BACKEND`	`ollama`	Embedding backend: `mlx` (Apple Silicon) or `ollama`
`RECALL_MLX_MODEL`	`mlx-community/mxbai-embed-large-v1`	MLX embedding model identifier
`RECALL_OLLAMA_HOST`	`http://localhost:11434`	Ollama server URL
`RECALL_OLLAMA_MODEL`	`mxbai-embed-large`	Ollama embedding model name
`RECALL_OLLAMA_TIMEOUT`	`30`	Ollama request timeout in seconds
`RECALL_LOG_LEVEL`	`INFO`	Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
`RECALL_DEFAULT_NAMESPACE`	`global`	Default namespace for memories
`RECALL_DEFAULT_IMPORTANCE`	`0.5`	Default importance score (0.0-1.0)
`RECALL_DEFAULT_TOKEN_BUDGET`	`4000`	Default token budget for context
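
To make the table concrete, here is a hedged sketch of reading these variables with their documented defaults. The `RecallSettings` dataclass and `load_settings` function are hypothetical names, not part of Recall's API; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class RecallSettings:
    sqlite_path: Path
    chroma_path: Path
    collection_name: str
    embedding_backend: str
    mlx_model: str
    ollama_host: str
    ollama_model: str
    ollama_timeout: float
    log_level: str
    default_namespace: str
    default_importance: float
    default_token_budget: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> RecallSettings:
    """Read RECALL_* variables, falling back to the documented defaults."""
    source = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return source.get(f"RECALL_{name}", default)

    importance = float(get("DEFAULT_IMPORTANCE", "0.5"))
    if not 0.0 <= importance <= 1.0:
        raise ValueError("RECALL_DEFAULT_IMPORTANCE must be between 0.0 and 1.0")

    return RecallSettings(
        sqlite_path=Path(get("SQLITE_PATH", "~/.recall/recall.db")).expanduser(),
        chroma_path=Path(get("CHROMA_PATH", "~/.recall/chroma_db")).expanduser(),
        collection_name=get("COLLECTION_NAME", "memories"),
        embedding_backend=get("EMBEDDING_BACKEND", "ollama"),
        mlx_model=get("MLX_MODEL", "mlx-community/mxbai-embed-large-v1"),
        ollama_host=get("OLLAMA_HOST", "http://localhost:11434"),
        ollama_model=get("OLLAMA_MODEL", "mxbai-embed-large"),
        ollama_timeout=float(get("OLLAMA_TIMEOUT", "30")),
        log_level=get("LOG_LEVEL", "INFO").upper(),
        default_namespace=get("DEFAULT_NAMESPACE", "global"),
        default_importance=importance,
        default_token_budget=int(get("DEFAULT_TOKEN_BUDGET", "4000")),
    )
```

Passing an explicit mapping, for example `load_settings({"RECALL_OLLAMA_TIMEOUT": "5"})`, makes the sketch easy to exercise without touching the real environment.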

MCP Tool Examples

memory_store_tool

Store a new memory with semantic indexing. Uses fast daemon path when available (<10ms), falls back to sync embedding otherwise.

{
 "content": "User prefers dark mode in all applications",
 "memory_type": "preference",
 "namespace": "global",
 "importance": 0.8,
 "metadata": {"source": "explicit_request"}
}

Response (fast path via daemon):

{
 "success": true,
 "queued": true,
 "queue_id": 42,
 "namespace": "global"
}

Response (sync path fallback):

{
 "success": true,
 "queued": false,
 "id": "550e8400-e29b-41d4-a716-446655440000",
 "content_hash": "a1b2c3d4e5f67890"
}
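
Because the two response shapes differ, a caller has to branch on `queued`. The helper below is a small sketch of that branching using only the fields shown in the sample responses; the function name `describe_store_result` is hypothetical.

```python
from typing import Any


def describe_store_result(response: dict[str, Any]) -> str:
    """Summarize a memory_store_tool response for logging."""
    if not response.get("success"):
        return "store failed"
    if response.get("queued"):
        # Fast path: the sample response carries a queue_id but no memory id.
        return f"queued as #{response['queue_id']} in {response['namespace']}"
    # Sync path: the memory id and content hash are available immediately.
    return f"stored as {response['id']} (hash {response['content_hash']})"
```

A caller that needs the memory ID right away (for example, to create a relationship) would have to handle the queued case separately, since that response shape does not include one.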

daemon_status_tool

Check if the recall daemon is running:

{}

Response:

{
  "running": true,
  "status": {
    "pid": 12345,
    "store_queue": {"pending_count": 5},
    "embed_worker_running": true
  }
}

memory_recall_tool

Search memories by semantic similarity:

{
 "query": "user interface preferences",
 "n_results": 5,
 "namespace": "global",
 "memory_type": "preference",
 "min_importance": 0.5,
 "include_related": true
}

Response:

{
  "success": true,
  "memories": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "User prefers dark mode in all applications",
      "type": "preference",
      "namespace": "global",
      "importance": 0.8,
      "created_at": "2024-01-15T10:30:00",
      "accessed_at": "2024-01-15T14:22:00",
      "access_count": 3
    }
  ],
  "total": 1,
  "score": 0.92
}
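
The filter-then-rank behaviour of this tool can be pictured with a pure-Python sketch. In Recall the vector search is handled by ChromaDB; the `recall_memories` function, the `embedding` field, and the idea of passing a precomputed query vector are assumptions made only to keep the example self-contained.

```python
import math
from typing import Any, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_memories(query_vec: list[float], memories: list[dict[str, Any]],
                    n_results: int = 5, namespace: Optional[str] = None,
                    memory_type: Optional[str] = None,
                    min_importance: float = 0.0) -> list[dict[str, Any]]:
    """Apply metadata filters, then rank the survivors by similarity."""
    candidates = [
        m for m in memories
        if (namespace is None or m["namespace"] == namespace)
        and (memory_type is None or m["type"] == memory_type)
        and m["importance"] >= min_importance
    ]
    scored = [{**m, "score": cosine(query_vec, m["embedding"])} for m in candidates]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:n_results]
```

Filtering before ranking keeps low-importance or out-of-namespace memories from consuming result slots, which is the practical effect of combining `min_importance` with `n_results`.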

memory_relate_tool

Create a relationship between memories:

{
 "source_id": "mem_new_123",
 "target_id": "mem_old_456",
 "relation": "supersedes",
 "weight": 1.0
}

Response:

{
 "success": true,
 "edge_id": 42
}

memory_context_tool

Generate formatted context for session injection:

{
 "query": "coding style preferences",
 "project": "myproject",
 "token_budget": 4000
}

Response:

{
 "success": true,
 "context": "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n- Use 2-space indentation [project:myproject]\n\n## Recent Decisions\n\n- Decided to use FastAPI for the backend [project:myproject]\n",
 "token_estimate": 125
}
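
One way to respect a token budget when assembling context is to add memories greedily, most important first, and keep only additions that still fit. The sketch below follows the markdown layout of the sample response, but the four-characters-per-token estimate, the importance ordering, the section naming, and the `build_context` function are assumptions rather than Recall's actual algorithm.

```python
from typing import Any


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def render(sections: dict[str, list[str]]) -> str:
    """Render grouped bullet lines as markdown under one heading per type."""
    parts = ["# Memory Context\n"]
    for mem_type, lines in sections.items():
        parts.append(f"## {mem_type.title()}s\n")
        parts.append("\n".join(lines) + "\n")
    return "\n".join(parts)


def build_context(memories: list[dict[str, Any]], token_budget: int = 4000) -> dict[str, Any]:
    """Greedily add memories by importance while the rendered text fits the budget."""
    sections: dict[str, list[str]] = {}
    ranked = sorted(memories, key=lambda m: m["importance"], reverse=True)
    for mem in ranked:
        line = f"- {mem['content']} [{mem['namespace']}]"
        trial = {key: list(value) for key, value in sections.items()}
        trial.setdefault(mem["type"], []).append(line)
        if estimate_tokens(render(trial)) <= token_budget:
            sections = trial  # keep the addition only if it still fits
    context = render(sections)
    return {"success": True, "context": context, "token_estimate": estimate_tokens(context)}
```

Re-rendering on every candidate is quadratic, which is acceptable for a sketch; a real implementation would more likely track a running token count.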

memory_forget_tool

Delete memories by ID or semantic search:

{
 "memory_id": "550e8400-e29b-41d4-a716-446655440000",
 "confirm": true
}

Or delete by search:

{
 "query": "outdated preferences",
 "namespace": "project:oldproject",
 "n_results": 10,
 "confirm": true
}

Response:

{
 "success": true,
 "deleted_ids": ["550e8400-e29b-41d4-a716-446655440000"],
 "deleted_count": 1
}

Architecture

┌──────────────────────────────────────────────────────────────┐
│                     MCP Server (FastMCP)                     │
│ memory_store │ memory_recall │ memory_relate │ memory_forget │
└───────────────────────────┬──────────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              │                           │
    ┌─────────▼─────────┐       ┌─────────▼─────────┐
    │     FAST PATH     │       │     SYNC PATH     │
    │       <10ms       │       │    MLX: <100ms    │
    └─────────┬─────────┘       │  Ollama: 10-60s   │
              │                 └─────────┬─────────┘
    ┌─────────▼─────────┐                 │
    │   recall-daemon   │       ┌─────────▼─────────┐
    │   (Unix socket)   │       │    HybridStore    │
    │                   │       └─────────┬─────────┘
    │  ┌─────────────┐  │                 │
    │  │ StoreQueue  │  │     ┌───────────┼───────────┐
    │  │ EmbedWorker │  │     │           │           │
    │  └─────────────┘  │     │           │           │
    └─────────┬─────────┘   ┌─▼─────┐ ┌───▼───┐ ┌─────▼─────┐
              │             │SQLite │ │Chroma │ │ Embedding │
              └─────────────►Store  │ │ Store │ │  Factory  │
                            └───────┘ └───────┘ └─────┬─────┘
                                                      │
                                          ┌───────────┴───────────┐
                                          │                       │
                                    ┌─────▼─────┐           ┌─────▼─────┐
                                    │    MLX    │           │  Ollama   │
                                    │  (Apple)  │           │ (Fallback)│
                                    └───────────┘           └───────────┘

The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding via MLX (~100ms on Apple Silicon) or Ollama (10-60s on other platforms).
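
The queue-then-embed idea behind the fast path can be sketched with a background worker thread. This is only an illustration of the pattern: the `StoreQueue` class below shares a name with the box in the diagram but is a hypothetical in-process stand-in, and the `embed` callable, the `stored` and `failed` dictionaries, and the `drain` helper are assumptions.

```python
import queue
import threading
from typing import Callable


class StoreQueue:
    """Accept store requests immediately and embed them on a worker thread."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self._embed = embed
        self._pending: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self._next_id = 0
        self.stored: dict[int, list[float]] = {}
        self.failed: dict[int, Exception] = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, content: str) -> int:
        """Return a queue id right away; the slow embedding happens later."""
        with self._lock:
            self._next_id += 1
            queue_id = self._next_id
        self._pending.put((queue_id, content))
        return queue_id

    def _run(self) -> None:
        while True:
            queue_id, content = self._pending.get()
            try:
                self.stored[queue_id] = self._embed(content)
            except Exception as exc:  # keep the worker alive on bad input
                self.failed[queue_id] = exc
            finally:
                self._pending.task_done()

    def drain(self) -> None:
        """Block until every queued item has been processed."""
        self._pending.join()
```

The caller pays only for the enqueue, which is why the fast path can answer quickly even when the embedding backend is slow.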

Daemon Setup (macOS)

The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks on synchronous embedding: about 100ms with MLX on Apple Silicon, or 10-60 seconds with Ollama.

Quick Install

# From the recall directory
./hooks/install-daemon.sh

This will:

Copy hook scripts to ~/.claude/hooks/
Install the launchd plist to ~/Library/LaunchAgents/
Start the daemon automatically

Manual Install

# 1. Copy hook scripts
cp hooks/recall*.py ~/.claude/hooks/
chmod +x ~/.claude/hooks/recall*.py

# 2. Create logs directory
mkdir -p ~/.claude/hooks/logs

# 3. Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
 hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist

# 4. Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist

Daemon Commands

# Check status
echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock | jq

# Stop daemon
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist

# Start daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist

# View logs
tail -f ~/.claude/hooks/logs/recall-daemon.log
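
The status check above can also be issued from Python over the same Unix socket. This sketch assumes the daemon answers a request with a single JSON document and then closes the connection, as the `nc | jq` pipeline suggests; the `daemon_request` and `daemon_running` helpers are hypothetical and not part of Recall.

```python
import json
import socket


def daemon_request(payload: dict, sock_path: str = "/tmp/recall-daemon.sock",
                   timeout: float = 2.0) -> dict:
    """Send one JSON command and return the decoded JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")
        sock.shutdown(socket.SHUT_WR)  # signal end of request, as nc does at EOF
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def daemon_running(sock_path: str = "/tmp/recall-daemon.sock") -> bool:
    """Return True if the socket answers a status request with valid JSON."""
    try:
        daemon_request({"cmd": "status"}, sock_path)
        return True
    except (OSError, ValueError):
        # Missing socket, refused connection, timeout, or malformed reply.
        return False
```

`daemon_running()` is a convenient guard for scripts that want to choose between the fast path and the synchronous fallback.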

Hooks Configuration

Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.

Development

# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/

# Run tests with coverage
uv run pytest tests/ --cov=recall --cov-report=html

# Type checking
uv run mypy src/recall

# Run specific integration tests
uv run pytest tests/integration/test_mcp_server.py -v

Requirements

Python 3.13+
For Apple Silicon (recommended): MLX embeddings work automatically with the mlx-embeddings package
For other platforms: Ollama with:
- mxbai-embed-large model (required for semantic search)
- llama3.2 model (optional, for session auto-capture hook)
~500MB disk space for ChromaDB indices

License

MIT


URL: https://glama.ai/mcp/servers/blueman82/recall
