KB Intelligence

A local documentation intelligence system for Zendesk Help Centers, built with Claude and Ollama.

It harvests your Help Center content and search data from Zendesk, builds a semantic knowledge graph over it, and gives you a full editorial intelligence layer: content gaps, stale articles, duplicate detection, LLM-powered label taxonomy, and a Claude Desktop integration via MCP.

Why This Exists

Most documentation teams — especially small ones — are flying blind. Your Help Center might have hundreds of articles. You know roughly what's in it. But you don't know:

What your users are actually searching for that you haven't written yet
Which articles are stale and generating support tickets because they're out of date
Which articles are near-duplicates of each other and should be consolidated
Whether your label taxonomy (if you have one) is consistent or riddled with model hallucinations
Which high-traffic articles have negative votes — meaning they exist, users find them, but they don't help

For a solo technical writer or a small team, answering any of these questions manually requires hours of spreadsheet work — if you do it at all. This system answers all of them automatically, continuously, and surfaces them in a dashboard built for editorial decision-making.

Especially useful for small teams

Large documentation teams have dedicated tooling, content strategists, and analytics staff. Small teams don't. A single technical writer supporting hundreds of articles at a growing SaaS company is the exact user this was built for.

The pipeline and dashboard run entirely on your machine. There's no SaaS subscription, and the only ongoing cost is the Zendesk API calls you were already making. The local LLM stack (Ollama + llama3.1:8b) does the heavy lifting on label suggestions and gap analysis without sending your KB content to a cloud API. The one exception is the optional Claude integration: whatever the MCP tools return to Claude Desktop is sent to Claude.

The Claude integration layers on top: once the local database is built, Claude can reason over your entire Help Center with full semantic search, gap detection, and graph traversal — giving you a conversational interface to your own documentation.


What It Does

Pipeline (Python scripts, run once or on a schedule)

Import — Pulls article HTML, metadata, and engagement data from Zendesk via the REST API and Explore CSV exports
Embed — Generates semantic embeddings for every article and search query using Ollama (nomic-embed-text)
Graph — Extracts inbound/outbound links, transclusions, and section relationships to build a knowledge graph
Label — Uses a local LLM to suggest labels for every article based on your approved taxonomy. Few-shot examples (FEW_SHOT_EXAMPLES in suggest_labels.py) and correction rules from data/corrections.json are injected into the prompt to keep suggestions consistent
Gaps — Cross-references zero-click search queries against article embeddings to surface content your users searched for but didn't find
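
The gap step can be pictured as a nearest-neighbor check: a query counts as a gap when even its best-matching article scores below some similarity cutoff. The following is a minimal sketch under assumptions, not the code in gap_detection.py. The function names, the 0.6 threshold, and the toy 3-dimensional vectors are all hypothetical (real nomic-embed-text vectors have 768 dimensions).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_gaps(query_vecs, article_vecs, threshold=0.6):
    """Return (query, best_score) pairs whose best article match is below threshold.

    The threshold is a hypothetical default; tune it against your own data.
    """
    gaps = []
    for query, qv in query_vecs.items():
        best = max((cosine(qv, av) for av in article_vecs.values()), default=0.0)
        if best < threshold:
            gaps.append((query, round(best, 3)))
    # Weakest coverage first.
    return sorted(gaps, key=lambda g: g[1])


# Toy embeddings, invented for illustration only.
articles = {"reset-password": [1.0, 0.0, 0.0], "billing-faq": [0.0, 1.0, 0.0]}
queries = {"forgot password": [0.9, 0.1, 0.0], "export to csv": [0.0, 0.1, 0.9]}

gaps = find_gaps(queries, articles)
print(gaps)
```

In this toy data, "forgot password" sits close to the password article and is not reported, while "export to csv" has no nearby article and is.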

MCP Server (Claude Desktop integration)

mcp_server.py exposes the local database as an MCP server that Claude Desktop can connect to. Tools available to Claude:

Tool	What it does
`search_articles`	Semantic search over the full article corpus
`get_article`	Retrieve a single article by ID with full metadata
`find_similar_articles`	Find articles semantically similar to a given one
`label_coverage`	Show which articles lack approved labels
`search_query_gaps`	Surface search queries with no good article matches
`stale_articles`	List articles by staleness signal (age, votes, ticket rate)
`article_graph`	Traverse the knowledge graph for a given article
`apply_labels`	Write approved labels back to Zendesk
`resolve_article_id`	Resolve a title to a numeric article ID
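
To illustrate what a tool like resolve_article_id has to do, here is one possible approach using fuzzy title matching from the standard library. This is a sketch under assumptions: the README does not say how mcp_server.py resolves titles, and the function signature, the 0.6 cutoff, and the sample index are hypothetical.

```python
from difflib import get_close_matches


def resolve_article_id(title, titles_to_ids, cutoff=0.6):
    """Map a possibly inexact title to an article ID, or None if nothing is close.

    titles_to_ids is a hypothetical in-memory index; a real tool would
    presumably read titles from the local database.
    """
    lookup = {t.lower(): article_id for t, article_id in titles_to_ids.items()}
    key = title.strip().lower()
    if key in lookup:  # exact (case-insensitive) match wins
        return lookup[key]
    close = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


# Sample index, invented for illustration only.
index = {"Getting Started Guide": 101, "Configure SSO with Okta": 202}

print(resolve_article_id("getting started guide", index))
print(resolve_article_id("Geting Started Gide", index))
print(resolve_article_id("xyzzy", index))
```

Returning None rather than a weak guess matters here, because the resolved ID may feed a write operation such as apply_labels.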

With this running, you can ask Claude things like:

"Which articles about SSO have the most support tickets after viewing?"
"Find me all articles that link to the getting started guide but don't have an onboarding label."
"What are the most common searches with no article result?"
"Which articles about billing haven't been updated in over a year and have negative votes?"

Dashboard (React + FastAPI)

A panel-based editorial dashboard for reviewing the pipeline output without leaving the browser:

Panel	Purpose
Overview	Article counts, embedding coverage, label coverage at a glance
Label Taxonomy	Browse and review your full label schema
Content Gaps	Search queries with poor or no article coverage
Article Health	Articles ranked by staleness, votes, and ticket rate
Explorer	Interactive knowledge graph: click any article, see its connections
Staleness	Articles sorted by last-updated date with engagement signals
Label Workflow	Review LLM label suggestions article by article, approve or reject
Search	Semantic search with similarity scores and metadata
Consolidation	Clusters of near-duplicate articles to consider merging
Maintenance	Database refresh controls and pipeline status
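
One way to produce the clusters shown in the Consolidation panel is to link every pair of articles whose embeddings are nearly identical and then take the connected groups. The sketch below does this with a small union-find. It is an assumption about the approach, not the project's actual code, and the 0.95 threshold, function names, and toy vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def duplicate_clusters(vectors, threshold=0.95):
    """Group article keys whose pairwise similarity chains above threshold."""
    parent = {key: key for key in vectors}

    def find(key):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for key in vectors:
        groups.setdefault(find(key), []).append(key)
    # Only groups with more than one article are merge candidates.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Toy embeddings, invented for illustration only.
vecs = {
    "reset-password": [1.0, 0.05],
    "password-reset-steps": [0.98, 0.07],
    "invoice-download": [0.0, 1.0],
}
clusters = duplicate_clusters(vecs)
print(clusters)
```

Comparing all pairs is quadratic in the number of articles, which is fine for hundreds of articles but would need an index for much larger corpora.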

Tech Stack

Component	Technology
Database	SQLite + sqlite-vec for vector search
Embeddings	Ollama (`nomic-embed-text`, 768-dim)
LLM	Ollama (`llama3.1:8b`) for local label suggestion
Claude integration	MCP Python SDK
API	FastAPI + uvicorn
Dashboard	React 19 + Vite + TanStack Query + Tailwind 4 + D3
Tests	pytest (pipeline + API) + Vitest + Playwright (dashboard)

Everything runs locally. No cloud AI APIs required for the pipeline.

Setup

Prerequisites

Python 3.11+
Node.js 20+
Ollama running locally
A Zendesk account with Help Center enabled and API access
Claude Desktop (for the MCP integration)

1. Clone and install

git clone https://github.com/your-github-username/zendesk-kb-intelligence.git
cd zendesk-kb-intelligence

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

cd dashboard && npm install && cd ..

2. Configure

Copy the example config and fill in your Zendesk credentials:

cp config.example.json config.json

Edit config.json:

{
 "zendesk": {
 "subdomain": "your-subdomain",
 "email": "admin@your-company.com",
 "api_token": "your_api_token_here",
 "locale": "en-us",
 "help_center_domain": "your-subdomain.zendesk.com"
 },
 "database": { "path": "kb.db" },
 "ollama": {
 "host": "http://localhost:11434",
 "embed_model": "nomic-embed-text",
 "eval_model": "llama3.1:8b"
 }
}
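
As a rough illustration of how these settings might be consumed, the sketch below checks that the required keys are present and derives the Help Center articles endpoint and the Basic auth header that Zendesk API-token authentication uses (username `email/token`, password the API token). It is not the project's config.py; the function names and the REQUIRED table are hypothetical.

```python
import base64
import json

# Hypothetical list of required settings, taken from the example config.
REQUIRED = {
    "zendesk": ["subdomain", "email", "api_token", "locale"],
    "database": ["path"],
    "ollama": ["host", "embed_model", "eval_model"],
}


def validate_config(cfg):
    """Raise ValueError naming every missing or empty setting."""
    missing = [
        f"{section}.{key}"
        for section, keys in REQUIRED.items()
        for key in keys
        if not cfg.get(section, {}).get(key)
    ]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return cfg


def articles_url(cfg):
    """Help Center 'list articles' endpoint for the configured locale."""
    z = cfg["zendesk"]
    return f"https://{z['subdomain']}.zendesk.com/api/v2/help_center/{z['locale']}/articles"


def auth_header(cfg):
    """Basic auth header for Zendesk API-token authentication."""
    z = cfg["zendesk"]
    raw = f"{z['email']}/token:{z['api_token']}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


cfg = validate_config(json.loads("""
{
  "zendesk": {"subdomain": "your-subdomain", "email": "admin@your-company.com",
              "api_token": "your_api_token_here", "locale": "en-us"},
  "database": {"path": "kb.db"},
  "ollama": {"host": "http://localhost:11434",
             "embed_model": "nomic-embed-text", "eval_model": "llama3.1:8b"}
}
"""))
print(articles_url(cfg))
```

Failing fast on a missing key is cheaper than discovering it halfway through a long harvest run.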

Configure the dashboard:

cp dashboard/.env.example dashboard/.env
# Edit dashboard/.env with your Zendesk subdomain

3. Pull Ollama models

ollama pull nomic-embed-text
ollama pull llama3.1:8b
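
Once the models are pulled, embeddings can be requested from Ollama's REST API. The sketch below builds a request for the `/api/embed` endpoint using only the standard library. Whether embed_articles.py talks to Ollama this way or through a client library is not stated in this README, so treat the helper names as hypothetical.

```python
import json
import urllib.request


def build_embed_request(texts, host="http://localhost:11434", model="nomic-embed-text"):
    """Build a POST request for Ollama's /api/embed endpoint (no network I/O)."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, **kwargs):
    """Send the request and return one vector per input text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_embed_request(texts, **kwargs), timeout=60) as resp:
        return json.load(resp)["embeddings"]


# Building the request does not contact the server.
req = build_embed_request(["How do I reset my password?"])
print(req.full_url, req.get_method())
```

Calling `embed(["How do I reset my password?"])` against a local server should return a list containing one 768-dimensional vector for nomic-embed-text.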

4. Run the pipeline

# Initialize the database
python init_db.py
python init_vec.py

# Pull content from Zendesk
python harvest.py

# Load engagement data (export CSVs from Zendesk Explore first — see below)
python load_csvs.py

# Build embeddings
python embed_articles.py
python embed_queries.py

# Derive label schema from your app structure
python derive_label_schema.py
# Review and approve labels in data/label_schema.json, then:

# Run label suggestions (loops until all articles are processed)
bash run_until_done.sh

# Detect content gaps
python gap_detection.py

# Build the knowledge graph
python extract_links.py
python resolve_transclusions.py
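
For intuition about the link-extraction step, the sketch below pulls article-to-article edges out of article HTML with the standard library's HTML parser. It is not the code in extract_links.py. It assumes Help Center article URLs of the form `/hc/<locale>/articles/<id>-<slug>`, and the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches /hc/<locale>/articles/<numeric id>, with or without a trailing slug.
ARTICLE_LINK = re.compile(r"/hc/[^/]+/articles/(\d+)")


class LinkExtractor(HTMLParser):
    """Collect the article IDs that <a href> tags point to."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = ARTICLE_LINK.search(href)
        if match:
            self.targets.append(int(match.group(1)))


def outbound_links(article_id, html):
    """Return sorted, de-duplicated (source, target) edges, skipping self-links."""
    parser = LinkExtractor()
    parser.feed(html)
    return sorted({(article_id, target) for target in parser.targets if target != article_id})


# Sample HTML, invented for illustration only.
html = (
    '<p>See the <a href="/hc/en-us/articles/2002-Getting-started">guide</a>, '
    '<a href="https://example.com/pricing">pricing</a>, and '
    '<a href="https://your-subdomain.zendesk.com/hc/en-us/articles/3003">SSO</a>.</p>'
)
edges = outbound_links(1001, html)
print(edges)
```

Inbound links fall out of the same edge list by grouping on the target instead of the source.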

5. Connect Claude Desktop

Add the MCP server to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows: %APPDATA%\Claude\claude_desktop_config.json):

{
 "mcpServers": {
 "kb-intelligence": {
 "command": "/path/to/.venv/bin/python",
 "args": ["/path/to/zendesk-kb-intelligence/mcp_server.py"],
 "env": {
 "DB_PATH": "/path/to/kb.db"
 }
 }
 }
}
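
If the Claude Desktop config already lists other MCP servers, the new entry has to be merged in rather than pasted over the file. The helper below is a small sketch of that merge. The function name is hypothetical, and the paths are the same placeholders used in the config above.

```python
import json


def add_mcp_server(config, name, python_path, script_path, db_path):
    """Return a copy of config with one more entry under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is untouched
    servers[name] = {
        "command": python_path,
        "args": [script_path],
        "env": {"DB_PATH": db_path},
    }
    merged["mcpServers"] = servers
    return merged


# An existing config with an unrelated server, invented for illustration only.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}

updated = add_mcp_server(
    existing,
    "kb-intelligence",
    "/path/to/.venv/bin/python",
    "/path/to/zendesk-kb-intelligence/mcp_server.py",
    "/path/to/kb.db",
)
print(json.dumps(updated, indent=2))
```

Restart Claude Desktop after editing the file so the new server is picked up.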

6. Start the dashboard

# Terminal 1 — API
source .venv/bin/activate
uvicorn api.main:app --reload --port 8765

# Terminal 2 — Dashboard
cd dashboard && npm run dev

Open http://localhost:5173.

Zendesk Explore CSV Exports

load_csvs.py expects these reports exported from Zendesk Explore:

Report	Expected filename pattern
Article engagement (views, votes)	`article_engagement*.csv`
Deflection drilldown	`deflection_drilldown*.csv`
Quick answers	`quick_answers*.csv`
Search queries overview	`search_queries_overview*.csv`
Search clicks	`search_clicks*.csv`
Search no results	`search_no_results*.csv`

Export these to data/exports/ (gitignored).
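
Before running load_csvs.py, it can help to confirm that every expected report is present. The sketch below checks a folder against the filename patterns in the table above. How load_csvs.py itself picks between several matching files is not documented here, so taking the last name in sort order is an assumption, as are the function and variable names.

```python
import tempfile
from pathlib import Path

# Filename patterns from the table above, keyed by a short report name.
PATTERNS = {
    "article_engagement": "article_engagement*.csv",
    "deflection_drilldown": "deflection_drilldown*.csv",
    "quick_answers": "quick_answers*.csv",
    "search_queries_overview": "search_queries_overview*.csv",
    "search_clicks": "search_clicks*.csv",
    "search_no_results": "search_no_results*.csv",
}


def find_exports(export_dir):
    """Return ({report: chosen file}, [missing reports]) for a folder of exports."""
    found, missing = {}, []
    for report, pattern in PATTERNS.items():
        matches = sorted(Path(export_dir).glob(pattern))
        if matches:
            found[report] = matches[-1]  # assumption: last name in sort order wins
        else:
            missing.append(report)
    return found, missing


# Demonstrate with a throwaway folder instead of the real data/exports/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("article_engagement_2025-01.csv",
                 "article_engagement_2025-02.csv",
                 "search_no_results.csv"):
        (Path(tmp) / name).write_text("placeholder\n")
    found, missing = find_exports(tmp)

print(sorted(found), missing)
```

Against the real folder, `find_exports("data/exports")` returns the reports that are still missing as its second value.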

Label Schema

The label system has two layers:

Layer 1 — Architectural: Derived from your product's navigation structure. These reflect how your product is organized (tabs, sections, feature areas). Defined in LAYER_1_LABELS in derive_label_schema.py — edit this to match your product.

Layer 2 — Intent: Derived by clustering your actual search query embeddings. These reflect how your users talk about your product — often different from how your documentation is organized.

The two layers together form a vocabulary that satisfies both content organization and search discoverability.

To customize for your product:

Edit LAYER_1_LABELS in derive_label_schema.py with your top-level navigation
Run python derive_label_schema.py
Review data/label_schema.json — set status, label, and description for Layer 2 candidates
Set reviewed_by and reviewed_at in data/label_schema.json
Commit data/label_schema.json and run the embedding + suggestion pipeline
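
A quick check before committing can catch a schema that was only partly reviewed. The sketch below flags unset review fields and incomplete Layer 2 candidates. The field names status, label, description, reviewed_by, and reviewed_at come from the steps above, but the overall file layout (a top-level `layer_2` list) is an assumption, so adjust it to match your generated file.

```python
def schema_problems(schema):
    """Return a list of human-readable problems; an empty list means ready to commit."""
    problems = []
    for field in ("reviewed_by", "reviewed_at"):
        if not schema.get(field):
            problems.append(f"{field} is not set")
    # "layer_2" as a list of candidate dicts is a hypothetical layout.
    for i, candidate in enumerate(schema.get("layer_2", [])):
        for field in ("status", "label", "description"):
            if not candidate.get(field):
                problems.append(f"layer_2[{i}] is missing {field}")
    return problems


# A half-reviewed schema, invented for illustration only.
draft = {
    "reviewed_by": "",
    "reviewed_at": None,
    "layer_2": [
        {"status": "approved", "label": "reset-password",
         "description": "Users trying to recover account access"},
        {"status": "", "label": "", "description": ""},
    ],
}

problems = schema_problems(draft)
for problem in problems:
    print(problem)
```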

Running Tests

# Python pipeline tests
source .venv/bin/activate
pytest tests/ -v

# API tests (require a populated kb.db)
pytest api/tests/ -v

# Dashboard unit tests (the subshell keeps you in the repo root)
(cd dashboard && npm run test:run)

# Dashboard e2e tests (require a running dev server)
(cd dashboard && npm run test:e2e)

Project Structure

.
├── schema.sql # SQLite schema
├── init_db.py # DB initialization
├── harvest.py # Zendesk API → DB
├── load_csvs.py # Zendesk Explore CSVs → DB
├── embed_articles.py # Article embeddings (Ollama)
├── embed_queries.py # Search query embeddings
├── derive_label_schema.py # Label taxonomy derivation
├── suggest_labels.py # LLM label suggestions
├── gap_detection.py # Content gap detection
├── extract_links.py # Knowledge graph: links
├── resolve_transclusions.py # Knowledge graph: transclusions
├── apply_labels.py # Write labels back to Zendesk
├── mcp_server.py # Claude Desktop MCP integration
├── refresh.py # Incremental sync (in progress)
├── config.py # Config loader
├── config.example.json # Config template
├── .env.example # Environment variable template
├── data/
│ ├── label_schema.json # Approved label taxonomy
│ └── corrections.json # LLM correction rules
├── api/ # FastAPI backend
│ ├── main.py
│ ├── routers/
│ └── tests/
├── dashboard/ # React dashboard
│ ├── src/
│ ├── .env.example
│ └── .env.test
└── tests/ # Pipeline test suite

Adapting for Your Help Center

This repo is a working template, not an abstract framework. It was built for a real Help Center and then generalized. To make it yours:

Label schema — The biggest customization. Edit LAYER_1_LABELS in derive_label_schema.py to reflect your product's navigation. Everything else flows from the schema you define.
Few-shot examples — suggest_labels.py contains FEW_SHOT_EXAMPLES: real articles with correct labels that teach the LLM how to apply your taxonomy. Replace or extend these with examples from your own KB for best results.
Corrections — As you run the pipeline, you'll find systematic errors (the LLM applies a label too broadly, invents compound labels, etc.). Log them in data/corrections.json. The pipeline injects these as rules on every run.
Model — The pipeline uses llama3.1:8b by default. If you have a larger machine, set eval_model in config.json to llama3.1:70b or any other Ollama-supported model.
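
The few-shot and corrections items above both amount to assembling a prompt and then distrusting the answer. The sketch below shows one way that could look: rules and examples are injected into the prompt, and any label the model invents is dropped afterwards. The shapes assumed for corrections.json entries (`{"rule": ...}`) and few-shot examples (`{"title": ..., "labels": [...]}`) are hypothetical, as are the function names. This is not the prompt used by suggest_labels.py.

```python
def build_prompt(title, text, approved_labels, corrections, few_shot):
    """Assemble a labeling prompt from the taxonomy, correction rules, and examples."""
    lines = [
        "Assign labels to the article using ONLY the approved labels.",
        "Approved labels: " + ", ".join(sorted(approved_labels)),
        "",
    ]
    if corrections:
        lines.append("Rules learned from earlier mistakes:")
        lines += [f"- {c['rule']}" for c in corrections]
        lines.append("")
    for example in few_shot:
        lines.append(f"Example: {example['title']} -> {', '.join(example['labels'])}")
    lines += ["", f"Article: {title}", text, "Labels:"]
    return "\n".join(lines)


def keep_approved(suggested, approved_labels):
    """Drop anything the model invented, such as compound labels."""
    return [label for label in suggested if label in approved_labels]


# Sample inputs, invented for illustration only.
approved = {"billing", "sso", "onboarding"}
corrections = [{"rule": "Never combine two labels into one compound label."}]
few_shot = [{"title": "Configure SSO with Okta", "labels": ["sso"]}]

prompt = build_prompt("Update your card", "Go to Settings...", approved, corrections, few_shot)
print(prompt)
print(keep_approved(["billing", "billing-sso-setup"], approved))
```

Filtering after generation is a cheap safety net: even with good rules in the prompt, a small local model can still emit labels outside the taxonomy.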

License

MIT

URL: https://glama.ai/mcp/servers/JoshWrites/zendesk-kb-intelligence
