Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs)
Sitemap URL Discovery
Given a domain, this actor finds sitemap.xml or sitemap_index.xml (also via robots.txt), recursively expands nested sitemaps, and returns one row per discovered URL with lastmod / changefreq / priority. Useful for SEO audits, crawl-target prep, and content cataloging. Pricing: $0.01 site fee + $0.0001 per URL.
Why this exists
Before you scrape, audit, or index a site, you need to know what's there. The site's own sitemap is the authoritative list, but discovering it requires:
- Checking common paths (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
- Parsing robots.txt for `Sitemap:` directives
- Recursively walking sitemap-index → child sitemaps
- Parsing each one for `<url>` records
This actor does all of it with sane fallbacks. Returns a summary + one row per discovered URL.
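The discovery steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual source: the function names (`sitemaps_from_robots`, `parse_sitemap`, `discover`) and the injected `fetch` callable are hypothetical, and HTTP handling is left out so the logic can be read on its own.

```python
import re
import xml.etree.ElementTree as ET

# Paths named in the list above; a caller could try these when robots.txt lists nothing.
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml"]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs named by `Sitemap:` directives (matched case-insensitively)."""
    pattern = re.compile(r"^[ \t]*sitemap:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)
    return pattern.findall(robots_txt)


def local_name(tag: str) -> str:
    """Strip the XML namespace prefix: '{ns}url' becomes 'url'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> tuple[list[str], list[dict]]:
    """Return (child_sitemap_urls, url_rows) for one sitemap document."""
    root = ET.fromstring(xml_text)
    children, rows = [], []
    for node in root:
        fields = {local_name(c.tag): (c.text or "").strip() for c in node}
        if "loc" not in fields:
            continue
        if local_name(node.tag) == "sitemap":      # entry of a sitemap index
            children.append(fields["loc"])
        elif local_name(node.tag) == "url":        # entry of a regular sitemap
            rows.append({
                "url": fields["loc"],
                "lastmod": fields.get("lastmod"),
                "changefreq": fields.get("changefreq"),
                "priority": fields.get("priority"),
            })
    return children, rows


def discover(start_urls, fetch, max_files=50):
    """Walk sitemap indexes breadth-first; fetch(url) returns XML text or None."""
    queue, seen, rows = list(start_urls), set(), []
    while queue and len(seen) < max_files:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        text = fetch(url)
        if text is None:
            continue
        try:
            children, found = parse_sitemap(text)
        except ET.ParseError:
            continue                               # skip a broken nested sitemap
        queue.extend(children)
        rows.extend(found)
    return rows
```

Passing `fetch` in as a parameter keeps the traversal independent of any HTTP library; in practice it would wrap a real HTTP GET with timeouts and error handling.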
What you get
Summary row
{"_type":"summary","site_url":"https://www.apify.com","sitemaps_scanned":5,"sitemap_urls":["https://www.apify.com/sitemap.xml","https://www.apify.com/sitemap-index.xml","https://www.apify.com/sitemap/actors1.xml",...],"urls_discovered":12384}
Per-URL row
{"_type":"url","url":"https://www.apify.com/store/actors/...","lastmod":"2026-06-08","changefreq":"weekly","priority":"0.7"}
Quick start
Discover all URLs on a domain
{"siteUrl":"https://www.apify.com"}
Only product / actor pages
{"siteUrl":"https://www.apify.com","pathContains":"/store/actors/","maxUrls":5000}
Cap scan size for huge sites
{"siteUrl":"https://en.wikipedia.org","maxUrls":100000,"maxSitemapFiles":50}
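To make the `pathContains` and `maxUrls` inputs concrete, here is a small sketch of one plausible interpretation. It assumes `pathContains` is a plain substring match against the URL path and that `maxUrls` caps the rows kept after filtering; the actor's exact matching rules are not documented here, and `apply_filters` is a hypothetical helper, not part of the actor.

```python
from urllib.parse import urlsplit


def apply_filters(urls, path_contains=None, max_urls=None):
    """Keep URLs whose path contains a substring, stopping at max_urls.

    Assumed semantics only: substring match on the path, cap applied
    to the rows that survive the filter.
    """
    kept = []
    for url in urls:
        if max_urls is not None and len(kept) >= max_urls:
            break
        if path_contains and path_contains not in urlsplit(url).path:
            continue
        kept.append(url)
    return kept


urls = [
    "https://www.apify.com/store/actors/a",
    "https://www.apify.com/pricing",
    "https://www.apify.com/store/actors/b",
]
print(apply_filters(urls, path_contains="/store/actors/", max_urls=5000))
```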
Pricing
Pay-Per-Event:
- $0.01: flat fee per site (covers initial discovery)
- $0.0001: per URL row returned
| Run | URLs | Cost |
|---|---|---|
| Small SaaS site | 200 | $0.03 |
| Mid-sized blog | 5,000 | $0.51 |
| Mega site | 100,000 | $10.01 |
For one-off audits, compare this with annual licenses such as Screaming Frog SEO Spider ($259/yr) or Sitebulb ($175/yr).
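The figures in the pricing table follow from the two fees stated above: cost = $0.01 + $0.0001 × URLs. A short sketch for estimating a run in advance (the function name `estimate_cost` is illustrative; `Decimal` is used to avoid floating-point rounding on small amounts):

```python
from decimal import Decimal

SITE_FEE = Decimal("0.01")       # flat fee per site
PER_URL_FEE = Decimal("0.0001")  # fee per URL row returned


def estimate_cost(url_rows: int) -> Decimal:
    """Estimated run cost in USD for a given number of URL rows."""
    return SITE_FEE + PER_URL_FEE * url_rows


for urls in (200, 5_000, 100_000):
    print(urls, estimate_cost(urls))
```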
Use cases
- SEO audit: Pull every URL with its `lastmod`; find stale content
- Crawl planning: Feed URLs into Web → Markdown or your own scraper
- Content monitoring: Detect new URLs by comparing snapshots over time
- Competitor research: See what a competitor's catalog looks like
- Sitemap sanity check: Verify sitemap-index works; catch broken nested sitemaps
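Two of these use cases, stale-content detection and snapshot comparison, can be sketched against the output rows. The example assumes each row is a dict shaped like the summary and per-URL rows shown earlier, and that `lastmod` begins with a `YYYY-MM-DD` date; the helper names are hypothetical.

```python
from datetime import date


def url_rows(rows):
    """Drop the summary row and keep only per-URL rows."""
    return [r for r in rows if r.get("_type") == "url"]


def stale_urls(rows, cutoff: date):
    """URLs whose lastmod is earlier than the cutoff date."""
    stale = []
    for r in url_rows(rows):
        lastmod = r.get("lastmod")
        if not lastmod:
            continue  # no lastmod in the sitemap, so staleness is unknown
        try:
            # lastmod may be a date or a full datetime; compare on the date part
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # e.g. a year-only value; skipped rather than guessed
        if modified < cutoff:
            stale.append(r["url"])
    return stale


def new_urls(previous, current):
    """URLs present in the current snapshot but not in the previous one."""
    before = {r["url"] for r in url_rows(previous)}
    return sorted({r["url"] for r in url_rows(current)} - before)
```

Rows without a parseable `lastmod` are skipped rather than treated as stale, since many sitemaps omit the field.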
Limitations
- No HTML scraping fallback: If a site has no sitemap (rare for serious sites), this returns 0 URLs. For HTML-link-crawling, use a crawl-specific actor.
- Doesn't honor noindex: A URL in the sitemap might still be `noindex` in HTML; this actor returns what's in the sitemap.
Related actors (same author)
- Web Page → Markdown Converter: Convert discovered URLs to text
- HTML Metadata Extractor: Pull meta tags from each URL
- PDF Text Extractor
- JSON Schema Generator
Feedback
A short review helps SEO engineers find it: Leave a review on Apify Store
