👁 Image
README ¶

web-researcher-mcp

Your AI research assistant that cites real sources and stays honest.

Search the entire web or narrow it down to just the sites you trust;
medical journals, court databases, news outlets, academic papers.
Analyze the full source, not just snippets. Links that work, citations you can trust,
no made up closed garden pre-synthesized results.

👁 CI
👁 Go Report Card
👁 Go Reference
👁 Image
👁 Release
👁 Docker
👁 PyPI
👁 web-researcher-mcp MCP server
👁 GitHub Stars

Get started in 30 seconds

Python users — uvx (no compile, any OS):

# One-time: install uv (skip if you already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS/Linux (Windows: winget install astral-sh.uv)

claude mcp add --scope user web-researcher -- uvx web-researcher-mcp

uv fetches the right prebuilt binary for your platform and runs it — no Go, no compile, no manual PATH. Point any MCP client at uvx web-researcher-mcp. Also works with uv tool install web-researcher-mcp or pip install web-researcher-mcp.

Python SDK

from web_researcher_mcp import WebResearcherClient

async with WebResearcherClient() as client:
 response = await client.web_search("CRISPR off-target effects 2024", num_results=5)
 for r in response.results:
 verified = await client.verify_citation(r.url)
 print(r.title, "—", "✓" if verified.exists else "?")

Full documentation: docs/PYTHON_CLIENT.md

👁 Open In Colab

Sync wrapper (for scripts and notebooks that don't use async):

with WebResearcherClient.sync() as client:
 response = client.web_search("climate change 2024")
 print(response.results[0].title)

macOS (Homebrew):

brew install zoharbabin/tap/web-researcher-mcp
claude mcp add --scope user web-researcher -- web-researcher-mcp

macOS / Linux (no package manager):

curl -fsSL https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy Bypass -c "irm https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.ps1 | iex"

No dev tools needed — every method ships the same signed binary (the PyPI wheels vendor it; the others download it and verify its checksum) and puts it on your PATH. The curl/PowerShell installers also register it with Claude Code automatically when the claude CLI is present; Homebrew installs the binary, so run the claude mcp add line above to connect it.

One-click install:

👁 Add to Cursor
👁 Install in VS Code
👁 Add to LM Studio

The Cursor / VS Code / LM Studio buttons install the zero-config uvx setup (your editor prompts to confirm before adding it; needs uv — see above). It runs DuckDuckGo web search with no API key — great to try instantly; image_search/news_search and richer providers need a key (2 min, see Configuration). Claude Desktop: download the .mcpb bundle for your platform and double-click it (Settings → Extensions), or use the uvx line above.

Using a different MCP client or want to pass API keys? See Connect to Your AI Assistant for the per-app config, and Configuration to pick a search provider.

Your AI can now search the web, read full articles, find academic papers, look up patents, and run multi-step research — only from sources you pick.

Why does this exist?

Perplexity gets its citations wrong over a third of the time. It links to papers that don't exist, invents DOIs, and presents SEO spam with the same confidence as peer-reviewed research. ChatGPT's web search isn't much better — it can't tell a blog post from a court filing.

If your work gets cited, published, submitted to a court, or shown to a client — you can't afford "probably real" sources.

This tool fixes the root cause: instead of searching the entire web and hoping, you tell your AI exactly which sources to search. We call these "search lenses" — curated lists of trusted sites for each field.

What you get	What that means for you
Search lenses — choose your sources by field	Your AI only sees the sites you trust (PubMed, SEC.gov, arXiv — not random blogs)
Research tools for every source type	Papers, patents, SEC filings, US court records, economic data, news, web pages, images, full-text reading, grounded answers with citations, structured extraction, and multi-step deep research
Always has a backup	Multiple search engines working together — if one has issues, the others pick up automatically
Reads full articles	Doesn't just give you snippets — extracts and reads entire pages, PDFs, Word docs, even YouTube transcripts and Hacker News threads
Real citations, formatted	Every source comes with a proper APA/MLA citation and a link that actually works
Your queries stay private	Runs on your machine — nobody sees what you're researching. Not us, not anyone.
Paper trail	Every search is logged so you can reproduce your research process months later

Works with Claude, Claude Desktop, Cursor, and any AI assistant that supports tool use.

Who uses this

Academic researchers — "I need a literature review with real DOIs, not made-up citations"
Business analysts — "My deliverable needs sources a client can actually click and verify"
Lawyers — "If I cite a case that doesn't exist, I get fined $50,000"
Journalists — "I need to cross-check government records and court filings, not Perplexity summaries"
Medical researchers — "Clinical decisions based on a health blog could hurt someone"
Graduate students — "I spent 3 hours tracking down a citation my AI invented"
Enterprise teams — "Our competitive research can't go through a third party's servers"

👁 Image

How It Compares

	web-researcher-mcp	Perplexity	Scite.ai	Elicit
You pick which sources are searched	Yes (built-in + custom lenses)	No	No	No
Makes up citations	Never — every link is real	~37% incorrect	Rare (journals only)	Rare
Works across all fields	Yes — legal, medical, news, patents, everything	Yes	Journals only	Papers only
Keeps your research private	Yes — runs on your machine	No (they see everything)	No	No
Works inside your existing AI (Claude, Cursor, etc.)	Yes	No (separate app)	Partially	No (separate app)
Can read full articles, not just snippets	Yes — pages, PDFs, Word docs, YouTube	No	No	Limited
Cost	Free forever (open source)	$20/mo	$20/mo	$10-49/mo

When to use what

Perplexity — Quick casual lookups where you don't need to cite your sources
Scite.ai / Elicit — Browsing a specific database of academic papers
web-researcher-mcp — Anything where your reputation is attached to the research: client work, court filings, publications, grant proposals, medical decisions, journalism
Claude built-in search — Quick one-off lookups mid-conversation

What your AI can do with this

Tool	What it does
`web_search`	Search the web — optionally restricted to only the sources you trust via lenses
`scrape_page`	Read any URL in full — web pages, PDFs, Word docs, slideshows, YouTube transcripts, Hacker News threads (read natively via the HN API); supports `mode: raw` for verbatim, unsanitized source (e.g. inspecting JSON or HTML)
`search_and_scrape`	Search and then read the best results — with quality scoring to surface the most reliable sources
`image_search`	Find images by size, type, color, or format
`news_search`	Search recent news with date controls and source filtering
`academic_search`	Find real papers with real DOIs — authors, citation counts, open-access links
`citation_graph`	Walk a paper's citation neighborhood — works it cites and works that cite it, with intent/influence signals
`patent_search`	Search patent offices (US, Europe, international) with classification codes
`filing_search`	Search SEC EDGAR for US public-company filings (10-K, 10-Q, 8-K, …) — or pull structured XBRL company facts
`legal_search`	Search US court opinions and dockets via CourtListener — real cases with real citations
`econ_search`	Look up economic data — World Bank global development indicators, OECD economic indicators, Eurostat European statistics (all keyless), and FRED US macro series (GDP, CPI, unemployment, rates; requires FRED_API_KEY)
`clinical_search`	Search ClinicalTrials.gov — clinical-trial registrations with status, phase, sponsor, and whether results are posted (discovery, not medical advice)
`verify_citation`	Check a citation before you rely on it — does it exist, match a real record, and is it retracted or a dead link? Evidence, not a verdict
`audit_bibliography`	Audit a whole reference list in one pass — paste a CSL-JSON/RIS/BibTeX file (or a session) and get per-entry + corpus-level flags for retracted, dead-link, and unverifiable citations
`verify_recommendation`	Audit an AI-generated recommendation list (listicle, product ranking) for self-promotion, author conflicts of interest, domain reputation, and dead links — catches GEO-gamed picks. Evidence, not a verdict
`archive_source`	Capture a fresh Internet Archive (Wayback Machine) snapshot of a URL via Save Page Now so a cited source stays verifiable if the page later changes or disappears — returns snapshot URL + timestamp (write tool)
`answer`	Ask a factual question and get one synthesized answer with citations — the direct answer, not a reading list
`structured_search`	Search and extract structured JSON per result (supply a schema), or pull entities by category (company, people, …)
`sequential_search`	Multi-step deep research — your AI remembers what it already found and builds on it
`get_research_session`	Recover a research session after context loss — picks up right where you left off
`research_export`	Export a research session as a shareable report (markdown or JSON), with full per-step provenance
`format_bibliography`	Turn collected sources into a formatted bibliography — APA, MLA, BibTeX, RIS, or CSL-JSON (Zotero/EndNote/Mendeley-ready)

Most tools above are always available. A few activate only when the right provider or config is present: citation_graph requires a citation-capable academic provider (OpenAlex or Semantic Scholar); filing_search requires EDGAR_CONTACT_EMAIL; answer and structured_search require a provider that supports those capabilities (e.g. Exa). Operators can also enable opt-in, consent-gated tools (per-user analytics, long-term memory, shared workspaces) that appear only when their feature is turned on — see docs/TOOLS.md for the authoritative, CI-verified tool list and full schemas.

Ready-made research templates

The server also ships guided prompt templates your AI assistant can pull in with one click — they walk it through a proven, multi-step process so you don't have to spell out every instruction:

Template	What it guides your AI to do
`comprehensive-research`	Run a structured, multi-step deep dive on a topic
`fact-check`	Verify a claim against multiple independent sources
`competitive-analysis`	Size up a company and its market (news, patents, web)
`literature-review`	Systematically review academic literature on a topic

In most AI apps these show up wherever you pick a prompt or "/" command. The server exposes live status resources (stats://tools, stats://sessions, stats://rate-limits, stats://providers), a lens catalog (lenses://catalog), diagnostics (diagnostics://errors/recent, diagnostics://health), and a large-payload artifact store (research://artifact/{id}) so you — or your AI — can check usage, limits, and which providers are active. See docs/DEPLOYMENT.md for the full list.

Quick Start

Option 1: Homebrew (macOS / Linux — recommended)

brew install zoharbabin/tap/web-researcher-mcp
claude mcp add --scope user web-researcher -- web-researcher-mcp

Homebrew handles trust, updates, and PATH for you — no signing warnings.

Option 2: One-command install (any OS — no dev tools needed)

macOS / Linux:

curl -fsSL https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy Bypass -c "irm https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.ps1 | iex"

Downloads the binary, verifies its SHA-256 checksum against the signed release, puts it on your PATH, and registers it with Claude Code if installed. Customize the install location:

INSTALL_DIR=/opt/tools curl -fsSL https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.sh | sh

Connect to Your AI Assistant

The install script registers with Claude Code automatically. For other apps, add to your AI's config file:

{
 "mcpServers": {
 "web-researcher": {
 "command": "web-researcher-mcp",
 "env": {
 "GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_GOOGLE_API_KEY",
 "GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ENGINE_ID"
 }
 }
 }
}

Any provider works — pick one and set its key. For example, Brave (no Google keys needed):

{
 "mcpServers": {
 "web-researcher": {
 "command": "web-researcher-mcp",
 "env": {
 "SEARCH_PROVIDER": "brave",
 "BRAVE_API_KEY": "YOUR_BRAVE_API_KEY"
 }
 }
 }
}

Swap in any provider from the Configuration table by setting SEARCH_PROVIDER and that provider's key. Done — your AI assistant now has access to all research tools.

Configuration

No API key required. DuckDuckGo is the built-in zero-config fallback — install and go. To raise result quality and unlock image/news search, add any one of the providers below. They're all optional and interchangeable — pick whichever you already use or prefer; the server treats them equally.

Search providers

Set SEARCH_PROVIDER=<name> and supply that provider's key. Every provider works with search lenses, and any of them can be combined for automatic failover (see Search Providers).

Provider	`SEARCH_PROVIDER`	Key variable(s)	Get a key
DuckDuckGo	`duckduckgo`	none	Built in — zero config
Google PSE	`google`	`GOOGLE_CUSTOM_SEARCH_API_KEY` + `GOOGLE_CUSTOM_SEARCH_ID`	cloud console + engine
Brave	`brave`	`BRAVE_API_KEY`	brave.com/search/api
Serper	`serper`	`SERPER_API_KEY`	serper.dev
SearchAPI.io	`searchapi`	`SEARCHAPI_API_KEY`	searchapi.io
SearXNG	`searxng`	`SEARXNG_URL`	self-hosted
Tavily	`tavily`	`TAVILY_API_KEY`	app.tavily.com
Exa	`exa`	`EXA_API_KEY`	dashboard.exa.ai
Hacker News	`hackernews`	none	Built in — zero config (HN Algolia index)

Each provider has its own free tier, signup flow, and capability mix (images, news, freshness). See docs/API_SETUP.md for step-by-step setup of every provider and a capability comparison. Set up more than one and the server fails over automatically — see Search Providers.

When SEARCH_PROVIDER is unset, the server uses Google if its keys are present and otherwise falls back to the zero-config DuckDuckGo provider — so it always works out of the box, with or without keys.

Academic search providers (OpenAlex, CrossRef) accept a contact email to unlock faster access via the polite pool — no registration, just an email. See docs/API_SETUP.md for setup and docs/DEPLOYMENT.md for the full variable reference.

With these set, academic_search returns real papers with DOIs, authors, citation counts, and open-access PDF links. Without them, it still works but uses web search as a fallback.

Patent Search (Optional)

Patent providers (EPO, USPTO, The Lens) require API keys for structured patent data. See docs/API_SETUP.md for step-by-step setup and docs/DEPLOYMENT.md for the full variable reference.

With these, patent_search returns structured patent data with classification codes, dates, and inventors. Without them, it falls back to web search.

Under the Hood

Search Providers

You choose which search engine powers your research. All of them work with lenses.

Provider	Whole-Web	Images	News	Notes
DuckDuckGo	Yes	—	—	Zero-config default (no API key needed); rate-limited for heavy use
Google PSE	Yes	Yes	Yes	Programmable Search Engine; free tier: 100 queries/day
Brave Search	Yes	Yes	Yes	Independent index; free tier available
Serper.dev	Yes	Yes	Yes	Google-identical results
SearXNG	Yes	Yes	Yes	Self-hosted, privacy-first, air-gapped deployments
SearchAPI.io	Yes	Yes	Yes	Unified API with multiple engine backends
Tavily	Yes	—	Yes	AI-agent search; clean, LLM-ready content
Exa	Yes	—	Yes	Neural/semantic search; also backs `answer` & `structured_search` and the optional paid scrape tier
Hacker News	HN only	—	Yes	Zero-config (HN Algolia index); searches HN threads, not the full web

Multiple Providers (recommended)

Set up multiple search engines so if one has issues, your research doesn't stop:

export SEARCH_ROUTING=brave,google,serper

If Brave is down, it automatically tries Google. If Google is rate-limited, it falls through to Serper. Your research just works.

See docs/DEPLOYMENT.md for advanced routing options (per-topic routing, patent-specific providers, etc.).

Single Provider

If you only have one search API key, that works too — just set it up and go.

Search Lenses

Search lenses let you control which websites your AI is allowed to search. Instead of searching the entire web (and getting blogs, spam, and AI-generated junk), a lens restricts results to only the sources you trust for that topic.

Built-in Lenses

Lens	Focus
`docs`	Official documentation and API references only
`academic`	Preprint servers, repositories, open-access journals
`academic-extended`	Preprint servers, OA aggregators, and repositories beyond core journal indexes
`clinical`	Clinical trials, drug safety, evidence-based medicine
`security`	CVEs, advisories, vulnerability research
`journalism`	Public records, corporate filings, FOIA
`programming`	Code docs, tutorials, Q&A
`devops`	Infrastructure and operations — Kubernetes, Docker, Terraform, cloud, CI/CD
`news`	Current events, journalism
`tech`	Technology industry
`legal`	Law, cases, statutes
`medical`	Health, medicine
`finance`	Markets, filings
`science`	Research, papers
`government`	Policy, regulations

You can also create your own lenses for any field — just list the domains you trust.

How it works

When you (or your AI) use a lens, results come only from the sites in that lens. For example, using the medical lens means your AI searches PubMed, WHO, NIH, and other clinical sources — never health blogs or supplement ads.

Your AI uses lenses automatically when you ask it to. For example: "Search for recent findings on SGLT2 inhibitors using the clinical lens."

Privacy & Security

Your research queries go directly from your machine to the search provider you chose. They never pass through our servers (we don't have servers). The tool runs entirely on your computer.

Setup for Each AI App

Claude Code

Add to your MCP config (~/.claude.json). Set SEARCH_PROVIDER and the matching key for whichever provider you use (see the Configuration table) — this example uses Google:

{
 "mcpServers": {
 "web-researcher": {
 "command": "/path/to/web-researcher-mcp",
 "env": {
 "SEARCH_PROVIDER": "google",
 "GOOGLE_CUSTOM_SEARCH_API_KEY": "AIza...",
 "GOOGLE_CUSTOM_SEARCH_ID": "017..."
 }
 }
 }
}

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
 "mcpServers": {
 "web-researcher": {
 "command": "/path/to/web-researcher-mcp",
 "env": {
 "GOOGLE_CUSTOM_SEARCH_API_KEY": "AIza...",
 "GOOGLE_CUSTOM_SEARCH_ID": "017..."
 }
 }
 }
}

Cursor

Add to .cursor/mcp.json in your project root:

{
 "mcpServers": {
 "web-researcher": {
 "command": "/path/to/web-researcher-mcp",
 "env": {
 "GOOGLE_CUSTOM_SEARCH_API_KEY": "AIza...",
 "GOOGLE_CUSTOM_SEARCH_ID": "017..."
 }
 }
 }
}

HTTP Mode (Teams / Shared Server)

For teams that want one shared instance everyone connects to:

PORT=3000 \
OAUTH_ISSUER_URL=https://auth.example.com \
OAUTH_AUDIENCE=https://api.example.com \
./web-researcher-mcp

Then connect any AI app to http://localhost:3000/mcp/.

Note: Tool behavior is identical across all connection modes (STDIO and HTTP). The only differences are auth (HTTP requires OAuth) and rate limiting (HTTP enforces per-tenant limits; STDIO has only upstream API quotas). See docs/DEPLOYMENT.md for details.

Performance

Searches come back in under a second. Previously-seen results are cached so repeats are instant. Full article extraction works on 95%+ of the web — including sites that try to block bots. Heavy JavaScript sites get a real browser behind the scenes (automatic, no setup needed).

Development

go build -o web-researcher-mcp ./cmd/web-researcher-mcp # Build
go test -race ./... # Test (with race detector)
make verify # Full CI gate (see Makefile for steps)

The lint, gosec, and govulncheck tools are pinned as go.mod tool directives, so make verify runs them at the exact versions CI uses (no global installs needed). Branch protection requires the Lint, Test, Security, and E2E checks to pass.

See CONTRIBUTING.md for the full development workflow, code style guide, and PR process.

Troubleshooting

Contributing

Contributions are welcome. Please see CONTRIBUTING.md for code style guidelines, development workflow, and how to submit pull requests.

Documentation

Document	Description
ARCHITECTURE.md	Design decisions, technology stack, dependencies
CONTRIBUTING.md	Development setup, code style, PR workflow
docs/TOOLS.md	Tool specifications and parameter schemas
docs/EXAMPLES.md	Usage examples with JSON tool calls
docs/API_SETUP.md	Search provider API key setup for all providers
docs/SECURITY.md	Threat model, SSRF, auth, compliance (SOC2/GDPR/FedRAMP)
docs/PRIVACY.md	What data goes where, third-party processors, retention
docs/DEPLOYMENT.md	Build, Docker, Kubernetes, client configs, scaling
docs/PYTHON_CLIENT.md	Python SDK — WebResearcherClient reference, sync wrapper, installation
docs/LESSONS_LEARNED.md	Node.js to Go migration story and lessons
docs/SESSION_PERSISTENCE.md	How sessions survive context loss — design, data flow, citations
docs/MIGRATION.md	Migrating from the deprecated google-researcher-mcp

License

MIT

Star History

👁 Star History Chart

Built with Go and the Model Context Protocol

If you're tired of AI making things up, give this a try — and a ⭐ if it helps.

👁 Image
Directories ¶

Path	Synopsis
cmd
gen-python-client command gen-python-client dumps the full tools/list schema (inputSchema + outputSchema for every registered tool) to stdout as a JSON array.	gen-python-client dumps the full tools/list schema (inputSchema + outputSchema for every registered tool) to stdout as a JSON array.
web-researcher-mcp command
internal
audit
auth
cache
circuit
config
consent Package consent records, verifies, and honors user consent for regulated data processing.	Package consent records, verifies, and honors user consent for regulated data processing.
content
datasubject Package datasubject implements GDPR/Law 25 data-subject rights (access, portability, erasure) over a pluggable registry.	Package datasubject implements GDPR/Law 25 data-subject rights (access, portability, erasure) over a pluggable registry.
documents
memory Package memory implements opt-in, consent-gated, cross-session long-term research memory (#88).	Package memory implements opt-in, consent-gated, cross-session long-term research memory (#88).
metrics
persist Package persist provides a small key/value Store with TTL semantics, backed either by process memory (zero-config default) or by encrypted disk.	Package persist provides a small key/value Store with TTL semantics, backed either by process memory (zero-config default) or by encrypted disk.
ratelimit
redisbackend Package redisbackend is the SINGLE, ISOLATED home for all Redis-backed implementations of the server's storage interfaces.	Package redisbackend is the SINGLE, ISOLATED home for all Redis-backed implementations of the server's storage interfaces.
resources
scraper
search
server
session
tools
useranalytics Package useranalytics implements opt-in, consent-gated, per-user usage analytics (#92).	Package useranalytics implements opt-in, consent-gated, per-user usage analytics (#92).
workspace Package workspace implements opt-in shared research workspaces (#96) — reframed from "the server owns workspaces" to "the server is a tenant-scoped shared NAMESPACE the HOST authorizes access to".	Package workspace implements opt-in shared research workspaces (#96) — reframed from "the server owns workspaces" to "the server is a tenant-scoped shared NAMESPACE the HOST authorizes access to".

URL: https://pkg.go.dev/github.com/zoharbabin/web-researcher-mcp