URL: https://apify.com/gochujang/sitemap-url-discovery

Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs) · Apify

Under maintenance

Given a domain, finds sitemap.xml / sitemap_index.xml (also via robots.txt), recursively expands sitemap indexes, returns one row per discovered URL with lastmod / changefreq / priority. SEO audits, crawl-target prep, content cataloging. $0.0001/URL + $0.01 site fee.

Pricing: Pay per usage

Rating: 0.0 (0)

Developer: Hojun Lee (maintained by Community)

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 4 days ago


Sitemap URL Discovery

Given a domain, this actor finds sitemap.xml and sitemap_index.xml (directly or via robots.txt), recursively expands nested sitemaps, and returns one row per discovered URL with lastmod, changefreq, and priority. Useful for SEO audits, crawl-target prep, and content cataloging. Pricing: $0.01 site fee + $0.0001 per URL.


Why this exists

Before you scrape, audit, or index a site, you need to know what's there. The site's own sitemap is the authoritative list, but discovering it requires:

  1. Checking common paths (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
  2. Parsing robots.txt for Sitemap: directives
  3. Recursively walking sitemap-index → child sitemaps
  4. Parsing each one for <url> records

This actor does all of it with sane fallbacks. It returns a summary row plus one row per discovered URL.
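
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actor's actual source: the names `discover`, `fetch`, and `max_files` are hypothetical, `fetch` is a caller-supplied function that returns a response body as text (or None on failure), and compressed (.gz) sitemaps are not handled.

```python
import xml.etree.ElementTree as ET

# Candidate paths taken from the list above.
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml"]


def sitemaps_from_robots(robots_txt):
    """Return the target of every 'Sitemap:' directive in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def local_name(tag):
    """Drop the XML namespace prefix: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def discover(site_url, fetch, max_files=50):
    """Walk sitemaps breadth-first and return one dict per <url> record.

    fetch(url) is a hypothetical caller-supplied function that returns the
    response body as text, or None when the request fails.
    """
    base = site_url.rstrip("/")
    queue = sitemaps_from_robots(fetch(base + "/robots.txt") or "")
    queue += [base + path for path in COMMON_PATHS]
    seen, scanned, rows = set(), [], []
    while queue and len(scanned) < max_files:
        sitemap_url = queue.pop(0)
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        body = fetch(sitemap_url)
        if not body:
            continue  # missing file: try the next candidate
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue  # not XML (for example an HTML error page)
        scanned.append(sitemap_url)
        for entry in root:
            fields = {local_name(c.tag): (c.text or "").strip() for c in entry}
            loc = fields.get("loc")
            if not loc:
                continue
            kind = local_name(entry.tag)
            if kind == "sitemap":  # sitemap index: queue the child sitemap
                queue.append(loc)
            elif kind == "url":  # urlset: emit one row
                rows.append({
                    "url": loc,
                    "lastmod": fields.get("lastmod"),
                    "changefreq": fields.get("changefreq"),
                    "priority": fields.get("priority"),
                })
    return rows
```

Injecting `fetch` keeps the walk independent of any HTTP library, so the same function can be exercised against a dictionary of canned responses or wired to a real client.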


What you get

Summary row

{
  "_type": "summary",
  "site_url": "https://www.apify.com",
  "sitemaps_scanned": 5,
  "sitemap_urls": [
    "https://www.apify.com/sitemap.xml",
    "https://www.apify.com/sitemap-index.xml",
    "https://www.apify.com/sitemap/actors1.xml",
    ...
  ],
  "urls_discovered": 12384
}

Per-URL row

{
  "_type": "url",
  "url": "https://www.apify.com/store/actors/...",
  "lastmod": "2026-06-08",
  "changefreq": "weekly",
  "priority": "0.7"
}
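
Because both row types share one dataset, a consumer usually needs to separate them by `_type` first. The sketch below assumes the rows have already been loaded as a list of dicts shaped like the samples above; `split_rows` and `parse_url_row` are hypothetical helper names. Note that `priority` arrives as a string in the sample, and `lastmod` may be absent or use a partial date, so both are converted defensively.

```python
from datetime import date


def split_rows(items):
    """Separate the single summary row from the per-URL rows."""
    summary = None
    url_rows = []
    for item in items:
        if item.get("_type") == "summary":
            summary = item
        elif item.get("_type") == "url":
            url_rows.append(item)
    return summary, url_rows


def parse_url_row(row):
    """Convert string fields to typed values; unparseable values become None."""
    lastmod = row.get("lastmod")
    parsed_date = None
    if lastmod:
        try:
            # Keep only the YYYY-MM-DD part of a full timestamp.
            parsed_date = date.fromisoformat(lastmod[:10])
        except ValueError:
            parsed_date = None  # e.g. a year-only or year-month value
    priority = row.get("priority")
    return {
        "url": row["url"],
        "lastmod": parsed_date,
        "changefreq": row.get("changefreq"),
        "priority": float(priority) if priority else None,
    }
```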

Quick start

Discover all URLs on a domain

{
  "siteUrl": "https://www.apify.com"
}

Only product / actor pages

{
  "siteUrl": "https://www.apify.com",
  "pathContains": "/store/actors/",
  "maxUrls": 5000
}

Cap scan size for huge sites

{
  "siteUrl": "https://en.wikipedia.org",
  "maxUrls": 100000,
  "maxSitemapFiles": 50
}
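
To reason about what `pathContains` and `maxUrls` do to a result set, the following sketch may help. The actor's exact matching rules are not documented on this page, so this assumes (and it is only an assumption) that `pathContains` is a case-sensitive substring test against the URL path and that `maxUrls` caps the number of rows after filtering. `apply_input_filters` is a hypothetical name.

```python
from itertools import islice
from urllib.parse import urlsplit


def apply_input_filters(urls, path_contains=None, max_urls=None):
    """Keep URLs whose path contains a substring, then cap the count.

    Assumed semantics, not confirmed by the actor's documentation.
    """
    matching = (
        url for url in urls
        if path_contains is None or path_contains in urlsplit(url).path
    )
    # islice with a stop of None yields everything, so max_urls is optional.
    return list(islice(matching, max_urls))
```

Filtering lazily before the cap means a small `max_urls` stops the scan early instead of materializing every URL first.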

Pricing

Pay-Per-Event:

  • $0.01: flat fee per site (covers initial discovery)
  • $0.0001: per URL row returned

Run               URLs      Cost
Small SaaS site   200       $0.03
Mid-sized blog    5,000     $0.51
Mega site         100,000   $10.01

For one-off audits, compare this with an annual license for Screaming Frog SEO Spider ($259/yr) or Sitebulb ($175/yr).
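
The per-run cost follows directly from the two fees: $0.01 per site plus $0.0001 per returned URL row. A small sketch of that arithmetic (the function name `estimate_cost` is illustrative) uses `Decimal` so that fractions of a cent are not distorted by floating-point rounding:

```python
from decimal import Decimal

SITE_FEE = Decimal("0.01")       # flat fee per site
PER_URL_FEE = Decimal("0.0001")  # fee per URL row returned


def estimate_cost(url_count):
    """Return the estimated run cost in USD for one site."""
    return SITE_FEE + PER_URL_FEE * url_count


for label, count in [("Small SaaS site", 200),
                     ("Mid-sized blog", 5000),
                     ("Mega site", 100000)]:
    print(f"{label}: ${estimate_cost(count):.2f}")
```

For 200, 5,000, and 100,000 URLs this gives $0.03, $0.51, and $10.01.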


Use cases

  1. SEO audit: Pull every URL with its lastmod; find stale content
  2. Crawl planning: Feed URLs into Web → Markdown or your own scraper
  3. Content monitoring: Detect new URLs by comparing snapshots over time
  4. Competitor research: See what a competitor's catalog looks like
  5. Sitemap sanity check: Verify sitemap-index works; catch broken nested sitemaps
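
Two of these use cases, finding stale content and detecting new URLs between snapshots, reduce to short operations over the per-URL rows. The sketch below assumes rows shaped like the per-URL sample (a `url` key and an optional ISO-formatted `lastmod`); the helper names and the 365-day default threshold are illustrative choices, not part of the actor.

```python
from datetime import date, timedelta


def stale_urls(rows, today, max_age_days=365):
    """Return URLs whose lastmod is older than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for row in rows:
        lastmod = row.get("lastmod")
        if not lastmod:
            continue  # no lastmod: age unknown, so not reported as stale
        try:
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # partial dates such as "2026-06" are skipped
        if modified < cutoff:
            stale.append(row["url"])
    return stale


def new_urls(previous_rows, current_rows):
    """Return URLs present in the current snapshot but not the previous one."""
    known = {row["url"] for row in previous_rows}
    return [row["url"] for row in current_rows if row["url"] not in known]
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps audits reproducible when an old snapshot is re-analyzed later.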

Limitations

  • No HTML scraping fallback: If a site has no sitemap (rare for serious sites), this actor returns 0 URLs. For HTML link crawling, use a crawl-specific actor.
  • Doesn't honor noindex: A URL listed in the sitemap might still be marked noindex in its HTML; this actor returns whatever the sitemap lists.
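
If noindex matters for your audit, it can be checked downstream on the pages you fetch yourself. The sketch below is one possible approach using only the standard library: it looks for a `<meta name="robots">` tag whose content includes `noindex` (or `none`, which implies it). It is not part of this actor, `is_noindex` is a hypothetical helper, and it does not cover the `X-Robots-Tag` HTTP header or crawler-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a robots meta tag asks crawlers not to index the page."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values can be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "noindex" in directives or "none" in directives:
            self.noindex = True


def is_noindex(html):
    """Return True when the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```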


Feedback

A short review helps SEO engineers find it: Leave a review on Apify Store
