Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs)

Under maintenance

Given a domain, finds sitemap.xml / sitemap_index.xml (also via robots.txt), recursively expands sitemap indexes, returns one row per discovered URL with lastmod / changefreq / priority. SEO audits, crawl-target prep, content cataloging. $0.0001/URL + $0.01 site fee.

Pricing

Pay per usage

Rating

0.0

(0)

Developer

Hojun Lee

Maintained by Community

Actor stats

Last modified: 4 days ago

Sitemap URL Discovery

Given a domain, this actor finds sitemap.xml and sitemap_index.xml (also via robots.txt), recursively expands nested sitemaps, and returns one row per discovered URL with lastmod, changefreq, and priority. Useful for SEO audits, crawl-target prep, and content cataloging. Pricing: $0.01 site fee + $0.0001 per URL.

Why this exists

Before you scrape, audit, or index a site, you need to know what's there. The site's own sitemap is the authoritative list — but discovering it requires:

Checking common paths (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
Parsing robots.txt for Sitemap: directives
Recursively walking sitemap-index → child sitemaps
Parsing each one for <url> records

This actor does all of it with sane fallbacks and returns a summary row plus one row per discovered URL.
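The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actor's actual source: the names `sitemaps_from_robots`, `parse_sitemap`, and `walk` are hypothetical, and the `fetch` callable (URL in, XML text out) is an assumption so the logic can run without network access.

```python
import xml.etree.ElementTree as ET

# Common locations to probe when robots.txt lists no sitemap (from the list above).
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml"]


def sitemaps_from_robots(robots_txt):
    """Collect targets of `Sitemap:` directives; the field name is case-insensitive."""
    found = []
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def _local(tag):
    """Strip the XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text):
    """Return (child_sitemap_urls, url_records) for one sitemap document."""
    root = ET.fromstring(xml_text)
    is_index = _local(root.tag) == "sitemapindex"
    children, records = [], []
    for entry in root:
        fields = {_local(c.tag): (c.text or "").strip() for c in entry}
        loc = fields.get("loc")
        if not loc:
            continue
        if is_index:
            children.append(loc)
        else:
            records.append({
                "url": loc,
                "lastmod": fields.get("lastmod"),
                "changefreq": fields.get("changefreq"),
                "priority": fields.get("priority"),
            })
    return children, records


def walk(start_urls, fetch, max_files=50):
    """Breadth-first walk from sitemap indexes down to <url> records."""
    queue, seen, rows = list(start_urls), set(), []
    while queue and len(seen) < max_files:
        url = queue.pop(0)
        if url in seen:
            continue  # guard against indexes that reference each other
        seen.add(url)
        children, records = parse_sitemap(fetch(url))
        queue.extend(children)
        rows.extend(records)
    return rows
```

In real use, `fetch` would wrap an HTTP client and would also need to handle gzip-compressed sitemaps and failed requests, which this sketch leaves out.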

What you get

Summary row

{
  "_type": "summary",
  "site_url": "https://www.apify.com",
  "sitemaps_scanned": 5,
  "sitemap_urls": [
    "https://www.apify.com/sitemap.xml",
    "https://www.apify.com/sitemap-index.xml",
    "https://www.apify.com/sitemap/actors1.xml",
    ...
  ],
  "urls_discovered": 12384
}

Per-URL row

{
  "_type": "url",
  "url": "https://www.apify.com/store/actors/...",
  "lastmod": "2026-06-08",
  "changefreq": "weekly",
  "priority": "0.7"
}
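Because summary and per-URL rows share one dataset, a consumer will likely want to separate them by `_type`. A minimal sketch, assuming the rows have already been loaded as Python dicts shaped like the two samples above; `split_rows` and `priority_as_float` are hypothetical helper names. Note that `priority` arrives as a string in the sample, so it is converted before any numeric comparison.

```python
def split_rows(rows):
    """Separate the single summary row from the per-URL rows."""
    summary = next((r for r in rows if r.get("_type") == "summary"), None)
    urls = [r for r in rows if r.get("_type") == "url"]
    return summary, urls


def priority_as_float(row, default=0.5):
    """Convert the string priority to a float; 0.5 is the sitemap protocol's default."""
    try:
        return float(row.get("priority"))
    except (TypeError, ValueError):
        return default
```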

Quick start

Discover all URLs on a domain

{
  "siteUrl": "https://www.apify.com"
}

Only product / actor pages

{
  "siteUrl": "https://www.apify.com",
  "pathContains": "/store/actors/",
  "maxUrls": 5000
}

Cap scan size for huge sites

{
  "siteUrl": "https://en.wikipedia.org",
  "maxUrls": 100000,
  "maxSitemapFiles": 50
}
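The page does not spell out the exact semantics of `pathContains` and `maxUrls`. One plausible reading, shown here purely as an assumption, is a substring match against the URL path with the cap applied after filtering. The function `apply_filters` is hypothetical and may not match the actor's real behavior.

```python
from urllib.parse import urlsplit


def apply_filters(urls, path_contains=None, max_urls=None):
    """Keep URLs whose path contains the substring, stopping at max_urls.

    Assumed semantics: the match is against the path only (not host or query),
    and the cap counts URLs that passed the filter.
    """
    kept = []
    for url in urls:
        if max_urls is not None and len(kept) >= max_urls:
            break
        if path_contains and path_contains not in urlsplit(url).path:
            continue
        kept.append(url)
    return kept
```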

Pricing

Pay-Per-Event:

$0.01 — flat fee per site (covers initial discovery)
$0.0001 — per URL row returned

Run	URLs	Cost
Small SaaS site	200	$0.03
Mid-sized blog	5,000	$0.51
Mega site	100,000	$10.01
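The table follows directly from the two fees: cost = $0.01 + $0.0001 × URL rows. A small sketch for estimating a run before launching it; `estimate_cost` is a hypothetical helper, and `Decimal` is used so that the cent fractions stay exact.

```python
from decimal import Decimal

SITE_FEE = Decimal("0.01")       # flat fee per site
PER_URL_FEE = Decimal("0.0001")  # per URL row returned


def estimate_cost(url_rows):
    """Estimated USD cost of one run that returns `url_rows` URL rows."""
    return SITE_FEE + PER_URL_FEE * url_rows
```

For example, `estimate_cost(5000)` equals `Decimal("0.51")`, matching the mid-sized blog row.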

For one-off audits, compare this with annual licenses for Screaming Frog SEO Spider ($259/yr) or Sitebulb ($175/yr).

Use cases

SEO audit — Pull every URL with its lastmod; find stale content
Crawl planning — Feed URLs into Web → Markdown or your own scraper
Content monitoring — Detect new URLs by comparing snapshots over time
Competitor research — See what a competitor's catalog looks like
Sitemap sanity check — Verify sitemap-index works; catch broken nested sitemaps
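The SEO-audit and content-monitoring use cases reduce to two small operations on the per-URL rows. A minimal sketch, assuming rows shaped like the per-URL sample above; `stale_urls`, `new_urls`, and the 365-day threshold are illustrative choices, not part of the actor.

```python
from datetime import date, timedelta


def stale_urls(rows, today, max_age_days=365):
    """Return URLs whose lastmod date is older than the cutoff.

    Rows with a missing or unparseable lastmod are skipped rather than guessed at.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for row in rows:
        lastmod = row.get("lastmod")
        if not lastmod:
            continue
        try:
            # Keep only the YYYY-MM-DD prefix so full datetimes also parse.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue
        if modified < cutoff:
            stale.append(row["url"])
    return stale


def new_urls(previous_rows, current_rows):
    """Return URLs present in the current snapshot but not in the previous one."""
    before = {row["url"] for row in previous_rows}
    return sorted({row["url"] for row in current_rows} - before)
```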

Limitations

No HTML scraping fallback — If a site has no sitemap (rare for serious sites), this actor returns 0 URLs. For HTML link crawling, use a crawl-specific actor.
Doesn't honor noindex — A URL listed in a sitemap might still be marked noindex in its HTML; this actor returns whatever the sitemap contains.

Related actors (same author)

Web Page → Markdown Converter — Convert discovered URLs to text
HTML Metadata Extractor — Pull meta tags from each URL
PDF Text Extractor
JSON Schema Generator

Feedback

A short review helps SEO engineers find it: Leave a review on Apify Store

Sitemap URL Extractor — robots.txt + sitemap.xml Crawl

v0iddo/sitemap-url-extractor

Discover every URL a site exposes via its public sitemap chain. Reads robots.txt, follows Sitemap declarations, recursively descends sitemap-index files, extracts URLs with lastmod, changefreq, priority.

vøiddo

Sitemap URL Extractor

wiry_kingdom/sitemap-url-extractor

Extract every URL from any website's sitemap.xml with lastmod, changefreq, priority. Recursively expands sitemap index files, reads robots.txt, handles gzipped sitemaps. SEO audits, content migration, site inventory, competitor research.

Mohieldin Mohamed

Sitemap Extractor

automationagents/web-sitemap

Extract all URLs from a website's sitemap (XML, robots.txt, or crawl discovery).

Alex Jordan

Sitemap & URL Extractor — Get Every URL of a Website

dataquarry/sitemap-url-extractor

Get every URL of a website: parses sitemap.xml and sitemap-indexes (discovered via robots.txt or the default location), with a same-site crawl fallback when there's no sitemap. Returns each URL + lastmod. No API key.

Daniel Brenner

XML Sitemap Scraper & URL Extractor API - SEO Crawler

pink_comic/sitemap-url-extractor

Extract URLs from XML sitemaps and robots.txt for SEO crawls, audits, content migrations, and RAG indexing. Auto-discovers sitemap files, parses nested sitemap indexes, and exports URL, lastmod, priority, changefreq, and image metadata in bulk.

Ava Torres

Sitemap URL Extractor - List All URLs in a Sitemap

dltik/sitemap-url-extractor

Extract every URL from any XML sitemap, with lastmod, changefreq and priority. Resolves sitemap indexes recursively. Pass a sitemap.xml or just a site root to auto-discover its sitemaps. Pure HTTP, no browser — fast and cheap.

Walid

Sitemap Extractor: Every URL, Recursive, Reliable

thoob/sitemap-extractor

Reads sitemap.xml, sitemap index files, .gz compressed sitemaps, and robots.txt Sitemap directives, and returns one clean row per URL with lastmod, changefreq, and priority. Billed only per delivered URL.

Pono Data

Sitemap URL Extractor

seemuapps/sitemap-extractor

Extract every URL from a website's sitemap.xml. Recursively walks nested sitemap indexes and returns loc, lastmod, changefreq, and priority for each page.

Andrew

Sitemap & URL Discovery - Find All URLs on Any Site

santamaria-automations/sitemap-url-discovery

Discover every URL on any website by parsing sitemap.xml, robots.txt, and sitemap indexes. Extract URLs with last modified dates, change frequency, and priority. Perfect for SEO audits, content analysis, crawling preparation, and site mapping.

Ale

Sitemap to URL Crawler — Extract Sitemap.xml URLs for RAG

logiover/sitemap-to-url-crawler

Extract all URLs from any sitemap.xml recursively. Export sitemap URLs to CSV/JSON for RAG pipelines, SEO audits, and LLM training datasets.

Logiover

URL: https://apify.com/gochujang/sitemap-url-discovery