Sitemap URL Extractor — robots.txt + sitemap.xml Crawl

Pricing

Pay per usage

Discover every URL a site exposes via its public sitemap chain. Reads robots.txt, follows Sitemap declarations, recursively descends sitemap-index files, extracts URLs with lastmod, changefreq, priority.

Rating

0.0

(0)

Developer

vøiddo

Maintained by Community

Actor stats

Last modified: 10 days ago

Example output row

{
  "domain": "vercel.com",
  "url": "https://vercel.com/blog/nextjs-14",
  "lastmod": "2024-03-15",
  "changefreq": "weekly",
  "priority": 0.8,
  "source": "https://vercel.com/sitemap-blog.xml"
}

How to use

Input

Field	Type	Default	Description
`domains`	`string[]`	`["stripe.com","shopify.com","vercel.com"]`	Domains to crawl — no scheme, no trailing slash
`maxUrlsPerDomain`	`integer`	`2000`	Hard cap on URLs returned per domain
`followSitemapIndex`	`boolean`	`true`	Recursively follow `<sitemapindex>` child links (up to depth 5)

Minimal run

{
  "domains": ["example.com"],
  "maxUrlsPerDomain": 500,
  "followSitemapIndex": true
}
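The `domains` field expects bare hosts (no scheme, no trailing slash), so inputs copied from a browser need cleaning first. The following is a minimal sketch of one way to do that before building the run input. The helper names `normalize_domain` and `build_input` are hypothetical and not part of the actor; only the three input field names and their defaults come from the table above.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce a URL or host string to a bare lowercase host (hypothetical helper)."""
    text = raw.strip()
    # urlsplit only fills in the host when a scheme or a leading '//' is present.
    if "//" not in text:
        text = "//" + text
    host = urlsplit(text).hostname  # lowercased, without port or path
    if not host:
        raise ValueError(f"cannot extract a host from {raw!r}")
    return host


def build_input(domains, max_urls_per_domain=2000, follow_sitemap_index=True):
    """Assemble the actor input using the field names from the input table."""
    return {
        "domains": [normalize_domain(d) for d in domains],
        "maxUrlsPerDomain": max_urls_per_domain,
        "followSitemapIndex": follow_sitemap_index,
    }
```

For example, `build_input(["https://Example.com/"], 500)` produces the same input as the minimal run shown above.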

Output fields

Field	Type	Notes
`domain`	string	Input domain
`url`	string	Discovered URL from `<loc>`
`lastmod`	string	ISO date, `null` if absent
`changefreq`	string	e.g. `weekly`, `null` if absent
`priority`	float	0.0–1.0, `null` if absent
`source`	string	Sitemap file the URL was found in
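Because `lastmod`, `changefreq`, and `priority` may each be `null`, downstream code has to treat them as optional. A minimal sketch of a typed row follows, assuming rows arrive as plain dictionaries shaped like the example output row. The `SitemapRow` class and its methods are illustrative names, not something the actor ships.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SitemapRow:
    """One output row; the three sitemap attributes are optional."""
    domain: str
    url: str
    lastmod: Optional[str]
    changefreq: Optional[str]
    priority: Optional[float]
    source: str

    @classmethod
    def from_dict(cls, item: dict) -> "SitemapRow":
        return cls(
            domain=item["domain"],
            url=item["url"],
            lastmod=item.get("lastmod"),
            changefreq=item.get("changefreq"),
            priority=item.get("priority"),
            source=item["source"],
        )

    def lastmod_date(self) -> Optional[date]:
        """Parse the leading YYYY-MM-DD of lastmod; None if absent or unparseable."""
        if not self.lastmod:
            return None
        try:
            return date.fromisoformat(self.lastmod[:10])
        except ValueError:
            return None
```

Slicing to the first ten characters is an assumption that lets the same method accept both date-only values and full timestamps that begin with a date.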

Pricing

Event	Cost	When charged
`url_extracted`	$0.0001 per URL	Once per run, for the total number of URLs pushed

A 2,000-URL run costs $0.20 (2,000 × $0.0001). Unused budget is not charged: if a domain has only 300 URLs, you pay for 300 ($0.03).
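The pricing arithmetic can be sketched as a small estimator. It assumes that only URLs actually pushed are billed and that `maxUrlsPerDomain` caps each domain separately, which is how the tables above read. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_URL = Decimal("0.0001")  # url_extracted price from the pricing table


def estimate_cost(urls_per_domain, max_urls_per_domain=2000):
    """Estimate the run cost in USD from the URL count of each domain.

    Assumes each domain is capped at max_urls_per_domain and that only
    URLs actually pushed are billed.
    """
    billable = sum(min(count, max_urls_per_domain) for count in urls_per_domain)
    return billable * PRICE_PER_URL
```

`Decimal` is used instead of `float` so that small per-URL prices add up without rounding noise.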

Buyer

SEO teams auditing crawl coverage — verify every page is in the sitemap.
Content operations checking lastmod staleness across thousands of URLs.
Competitive intelligence — map a competitor's full URL structure.
QA pipelines validating sitemap health after deploys.
Link-building researchers finding indexable pages at scale.

Source

Crawl order per domain:

1. GET https://{domain}/robots.txt and parse all Sitemap: lines.
2. If none are found, fall back to GET https://{domain}/sitemap.xml.
3. For each sitemap URL: fetch and parse the XML.
4. If the root is <sitemapindex>, enqueue each <sitemap><loc> (up to depth 5).
5. If the root is <urlset>, emit one row per <url> until maxUrlsPerDomain is reached.

All requests use a polite User-Agent and are paced at 250–600 ms between calls. 404 and empty responses are skipped gracefully.
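The crawl order above can be sketched in a few dozen lines of standard-library Python. This is an illustration under stated assumptions, not the actor's actual code: `fetch` is a hypothetical callable that returns the response body as text or `None` for a 404 or empty response, the starting sitemaps count as depth 0, and pacing, the User-Agent header, and gzip handling are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Optional

MAX_DEPTH = 5  # depth limit stated in the crawl description


def local(tag: str) -> str:
    """Strip an XML namespace prefix: '{ns}url' -> 'url'."""
    return tag.rsplit("}", 1)[-1]


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Collect the values of all 'Sitemap:' lines (case-insensitive key)."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def child_text(node: ET.Element, name: str) -> Optional[str]:
    """Return the stripped text of the first child called `name`, else None."""
    for child in node:
        if local(child.tag) == name and child.text and child.text.strip():
            return child.text.strip()
    return None


def to_float(text: Optional[str]) -> Optional[float]:
    try:
        return float(text) if text is not None else None
    except ValueError:
        return None


def crawl(domain, fetch, max_urls=2000, follow_index=True) -> Iterator[Dict]:
    """Yield one row per <url>; `fetch(url)` returns body text or None."""
    robots = fetch(f"https://{domain}/robots.txt")
    start = sitemaps_from_robots(robots) if robots else []
    if not start:  # fallback when robots.txt declares no sitemap
        start = [f"https://{domain}/sitemap.xml"]
    queue = [(url, 0) for url in start]
    seen = set()
    emitted = 0
    while queue and emitted < max_urls:
        sitemap_url, depth = queue.pop(0)
        if sitemap_url in seen:
            continue
        seen.add(sitemap_url)
        body = fetch(sitemap_url)
        if not body:  # 404 or empty response: skip
            continue
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue
        kind = local(root.tag)
        if kind == "sitemapindex":
            if follow_index and depth < MAX_DEPTH:
                for node in root:
                    loc = child_text(node, "loc")
                    if loc:
                        queue.append((loc, depth + 1))
        elif kind == "urlset":
            for node in root:
                if emitted >= max_urls:
                    break
                loc = child_text(node, "loc")
                if not loc:
                    continue
                emitted += 1
                yield {
                    "domain": domain,
                    "url": loc,
                    "lastmod": child_text(node, "lastmod"),
                    "changefreq": child_text(node, "changefreq"),
                    "priority": to_float(child_text(node, "priority")),
                    "source": sitemap_url,
                }
```

Passing `fetch` in as a parameter keeps the traversal logic testable offline; a real fetcher would add the HTTP client, the delay between calls, and decompression.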

URL: https://apify.com/v0iddo/sitemap-url-extractor
