Sitemap Extractor

Pricing

from $0.10 / 1,000 results

This Apify Actor extracts all URLs from a website's sitemaps and checks their status codes via lightweight HTTP requests. It provides a clean list of valid links, acting as an ideal pre-processor to ensure your larger crawling projects target only active URLs.

Rating

3.1 (5)

Developer

Apify (maintained by Apify)

Actor stats

Total users: 171
Last modified: 3 months ago

Features

Recursive Sitemap Discovery: Automatically detects and traverses nested sitemaps (sitemap indexes).
Efficiency: Validates URLs with HTTP HEAD requests, which return headers only and so are faster and use less bandwidth than full GET requests.
Proxy Support: Integrated with Apify Proxy to prevent rate limiting or blocking during the discovery phase.
Detailed Output: Provides the final URL and the corresponding HTTP status code.
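
As a rough illustration of the HEAD-based validation described under Efficiency, the sketch below sends a HEAD request with Python's standard library and returns the status code. It is not the Actor's own code; the function name head_status and the timeout default are assumptions made for this example.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to `url`.

    Hypothetical helper for illustration; not part of the Actor.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx and 5xx; the code is still the result we want.
        return error.code
```

Calling head_status("https://example.com/") returns an integer such as 200 or 404. Network failures (DNS errors, timeouts) raise urllib.error.URLError and are left to the caller to handle.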

How it Works

Input: You provide one or more "Start URLs", each pointing to a domain root, a sitemap, or a sitemap index.
Extraction: The Actor parses the XML, extracting both page URLs and links to further sitemaps.
Validation: For every page URL found, the Actor sends an HTTP HEAD request and records the returned status code.
Deduplication: The crawler uses unique keys to ensure that even if a URL appears in multiple sitemaps, it is only checked once.
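
The extraction and deduplication steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard sitemaps.org XML namespace; the names parse_sitemap and collect_urls and the fetch callback are hypothetical, and plain sets stand in for the crawler's unique keys.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable

# Namespace used by sitemap and sitemap index files (sitemaps.org protocol).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Split one sitemap document into (page URLs, nested sitemap URLs)."""
    root = ET.fromstring(xml_text)
    pages = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    nested = [el.text.strip() for el in root.findall("sm:sitemap/sm:loc", NS) if el.text]
    return pages, nested


def collect_urls(start: str, fetch: Callable[[str], str]) -> list[str]:
    """Walk nested sitemaps breadth-first and return each page URL once.

    `fetch` is an assumed callback that returns the XML text for a sitemap URL.
    """
    seen_sitemaps: set[str] = set()
    seen_pages: set[str] = set()
    ordered: list[str] = []
    queue = deque([start])
    while queue:
        sitemap_url = queue.popleft()
        if sitemap_url in seen_sitemaps:
            continue  # guards against sitemap indexes that reference each other
        seen_sitemaps.add(sitemap_url)
        pages, nested = parse_sitemap(fetch(sitemap_url))
        queue.extend(nested)
        for page in pages:
            if page not in seen_pages:  # a URL listed in several sitemaps is kept once
                seen_pages.add(page)
                ordered.append(page)
    return ordered
```

Passing the fetch step in as a callback keeps the traversal logic independent of how the XML is downloaded, so it can be exercised with in-memory documents.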

Usage

This Actor is ideal for:

Pre-crawling filter: Generating a "clean" list of URLs for Actors like Website Content Crawler or Web Scraper.
SEO Audits: Quickly identifying 404 Not Found or 500 Server Error pages listed in your sitemap.
Site Mapping: Getting a high-level overview of a site's architecture.
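
For the pre-crawling filter and SEO audit cases, the results can be split by status code once they are exported. The sketch below assumes each result is a dictionary with "url" and "status" keys; those field names are an assumption for illustration, so adjust them to match the Actor's actual output.

```python
def split_results(items: list[dict]) -> tuple[list[str], list[dict]]:
    """Separate results into clean URLs (2xx) and broken entries (4xx and 5xx).

    The "url" and "status" keys are assumed field names, not confirmed output fields.
    """
    clean = [item["url"] for item in items if 200 <= item["status"] < 300]
    broken = [item for item in items if item["status"] >= 400]
    return clean, broken


results = [
    {"url": "https://example.com/", "status": 200},
    {"url": "https://example.com/old-page", "status": 404},
    {"url": "https://example.com/api", "status": 500},
]
clean_urls, broken_entries = split_results(results)
```

Here clean_urls can be passed on as start URLs to a larger crawl, while broken_entries lists the pages to fix or remove from the sitemap.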

Configuration

Field	Description
Start URLs	A domain name, or a list of sitemap XML URLs, to start from.
Proxy configuration	Apify Proxy settings.
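
An input for the two fields above might be assembled as follows. The JSON key names startUrls, proxyConfiguration, and useApifyProxy follow conventions common to many Apify Actors but are assumptions here; check the Actor's input schema for the exact names before relying on them.

```python
import json


def build_input(start_urls: list[str], use_apify_proxy: bool = True) -> dict:
    """Build an input object for the Actor.

    Key names are assumed from common Apify conventions, not taken from this Actor's schema.
    """
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


actor_input = build_input(["https://example.com/sitemap.xml"])
print(json.dumps(actor_input, indent=2))
```

The printed JSON can be pasted into the Actor's JSON input editor or sent as the run input through the API.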

URL: https://apify.com/apify/sitemap-extractor