Web scraping tools and techniques

Why this server?
This server is a strong fit: its primary function is to 'scrape and extract data from any website' globally, and its description specifically mentions bypassing anti-bot systems and rendering JavaScript content, both of which directly address the need for web scraping.
Thordata MCP Server
xja1023789-collab
License: -, Quality: -, Maintenance: -
Enables AI models to scrape and extract data from any website globally using Thordata's 195+ country proxy network. Bypasses anti-bot systems and renders JavaScript content, outputting structured data in Markdown, HTML, or Links format.
Last updated 2025-09-23
Why this server?
This tool explicitly enables 'scraping and extraction' of data from websites, covering single-page scraping and multi-page crawling with rendering capabilities, making it a strong match for web scraping needs.
AnyCrawl MCP Server
any4ai
License: A, Quality: -, Maintenance: C
Enables web scraping and crawling for LLM clients: single-page scraping, multi-page website crawling, and web search, with a choice of scraping engines (Playwright, Cheerio, Puppeteer) and flexible output formats including Markdown, HTML, text, and screenshots.
Last updated 2026-03-19
11
6
MIT
Why this server?
This server focuses on 'browser automation and web content extraction' using Playwright, a core technology for performing reliable web scraping tasks.
Low Cost Browsing MCP Server
lcbro
License: F, Quality: -, Maintenance: D
Enables browser automation, web content extraction, and LLM-powered data transformation using Playwright. Supports session management, authentication flows, and works with local LLMs (Ollama, JAN AI) or external providers to clean and structure extracted web data.
Last updated 2025-09-14
55
6
Why this server?
This server uses 'Tavily's Search and Crawl APIs to gather and structure data,' which aligns directly with the goal of crawling the web and extracting information.
Deep Research MCP
ali-kh7
License: -, Quality: B, Maintenance: -
A Model Context Protocol compliant server that facilitates comprehensive web research by utilizing Tavily's Search and Crawl APIs to gather and structure data for high-quality markdown document creation.
Last updated 2025-12-16
1
57
12
Why this server?
A production-ready server that provides AI-powered 'web scraping capabilities,' transforming webpages to markdown and extracting structured data, which is highly relevant to the search query.
ScrapeGraph MCP Server (official)
ScrapeGraphAI
License: A, Quality: A, Maintenance: B
A production-ready Model Context Protocol server that enables language models to leverage AI-powered web scraping capabilities, offering tools for transforming webpages to markdown, extracting structured data, and executing AI-powered web searches.
Last updated 2026-05-04
8
82
MIT
Why this server?
This server specializes in extracting and transforming 'webpage content into clean, LLM-optimized Markdown,' a crucial step in preparing scraped data for analysis.
Mozilla Readability Parser MCP
emzimmer
License: A, Quality: A, Maintenance: D
Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
Last updated 2025-01-28
1
36
17
MIT
Why this server?
Enables 'reverse engineering of web applications' and automated web interactions through browser automation, both advanced techniques for harvesting data that simpler scrapers cannot reach.
WebScout MCP
pyscout
License: A, Quality: A, Maintenance: D
Enables reverse engineering of web applications and chat interfaces through browser automation, network traffic capture, and streaming API discovery. Provides comprehensive tools for analyzing network patterns, capturing streaming responses, and automating complex web interactions.
Last updated 2025-10-02
14
2
1
ISC
Why this server?
This server enables LLMs to perform 'browser automation and web page interactions' using Playwright, a tool frequently used for web scraping and data extraction from dynamic sites.
Playwright MCP
mattreya
License: A, Quality: -, Maintenance: D
Enables LLMs to perform browser automation and web page interactions using Playwright's accessibility tree instead of screenshots. Provides fast, deterministic web automation through structured data without requiring vision models.
Last updated 2025-09-22
5,659,017
Apache 2.0
Why this server?
A versatile tool for general-purpose 'fetching content from URLs' (HTML, JSON, text), providing the basic functionality needed for web data retrieval.
URL Fetch MCP
aelaguiz
License: A, Quality: A, Maintenance: D
A Model Context Protocol (MCP) server that enables Claude or other LLMs to fetch content from URLs, supporting HTML, JSON, text, and images with configurable request parameters.
Last updated 2025-03-19
3
3
MIT

URL: https://glama.ai/mcp/servers/search/web-scraping-tools-and-techniques
