Sitemap & URL Extractor: Get Every URL of a Website
Pricing
Pay per usage
Get every URL of a website: parses sitemap.xml and sitemap-indexes (discovered via robots.txt or the default location), with a same-site crawl fallback when there's no sitemap. Returns each URL + lastmod. No API key.
Free. Give it a website (or a sitemap URL) and get back every URL on the site, parsed from sitemap.xml and sitemap-indexes (auto-discovered via robots.txt and the default location), with a same-site crawl fallback when a site has no sitemap. No API key.
Perfect for feeding an LLM/RAG pipeline (find every page to ingest), site audits, migrations, link checking, and SEO.
What you get (per URL)
- url: the page URL (absolute, deduped)
- lastmod: last-modified date from the sitemap, when present (honest-null otherwise)
- source: "sitemap" or "crawl" (how the URL was found)
- discoveredAt
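As a rough illustration of working with these fields downstream, the sketch below models one output row as a typed dictionary and filters out rows without a lastmod. The `UrlRecord` and `with_lastmod` names, the sample rows, and the timestamp format shown for discoveredAt are assumptions made for this example, not documented actor output.

```python
from typing import Optional, TypedDict


class UrlRecord(TypedDict):
    url: str                 # absolute, deduplicated page URL
    lastmod: Optional[str]   # sitemap <lastmod> value, or None when absent
    source: str              # "sitemap" or "crawl"
    discoveredAt: str        # listed in the output; the format is assumed here


def with_lastmod(records: list[UrlRecord]) -> list[UrlRecord]:
    """Keep only the records that carry a lastmod value."""
    return [r for r in records if r["lastmod"] is not None]


# Hypothetical sample rows for illustration only, not real actor output.
sample: list[UrlRecord] = [
    {
        "url": "https://example.com/",
        "lastmod": "2024-01-15",
        "source": "sitemap",
        "discoveredAt": "2024-02-01T00:00:00Z",
    },
    {
        "url": "https://example.com/about",
        "lastmod": None,  # crawl-discovered pages have no sitemap date
        "source": "crawl",
        "discoveredAt": "2024-02-01T00:00:00Z",
    },
]

dated = with_lastmod(sample)
```

Because missing values are null rather than guessed, an explicit `is not None` check like this is safer than relying on truthiness.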
How to use it
{"startUrls":["https://example.com"],"maxResults":5000}
Pass a site URL (the sitemap is found automatically) or a direct sitemap URL. It handles sitemap-indexes (sites that split their sitemap into many files) by following each child sitemap, and if there's no sitemap at all it falls back to a polite, same-site crawl. It respects robots.txt, identifies itself, and fetches one request at a time.
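The discovery and parsing steps described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the actor's actual code: `sitemaps_from_robots` and `parse_sitemap` are hypothetical helper names, and the inputs are text that has already been fetched.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs listed in 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def parse_sitemap(xml_text: str):
    """Return ("index", [child sitemap URLs]) or ("urlset", [(url, lastmod)])."""
    root = ET.fromstring(xml_text)
    if root.tag.endswith("sitemapindex"):
        children = [
            loc.text.strip()
            for loc in root.findall("sm:sitemap/sm:loc", NS)
            if loc.text
        ]
        return "index", children
    entries = []
    for node in root.findall("sm:url", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=NS)
        if loc:
            # lastmod stays None when the sitemap does not provide it.
            entries.append((loc, lastmod.strip() if lastmod else None))
    return "urlset", entries


ROBOTS = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/sitemap.xml\n"
URLSET = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>"
    "<url><loc>https://example.com/about</loc></url>"
    "</urlset>"
)
INDEX = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>"
    "</sitemapindex>"
)
```

A caller would fetch each child URL returned for an "index" result and parse it the same way, which is how a split sitemap ends up as a single URL list.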
Pair it: discover → extract → audit
This is the discover step of a clean "feed-your-AI" toolkit by dataquarry:
- Discover: this actor, every URL of a site.
- Extract: dataquarry/website-to-markdown, turn those URLs into clean, LLM-ready Markdown.
- Audit: dataquarry/website-seo-metadata-checker, SEO & metadata for each page.
Also see the dataquarry OSM place-data scrapers and free guides at openplacedata.com.
Clean & honest
Reads only public sitemap.xml/robots.txt and (in fallback) public pages; respects robots.txt; sends a descriptive User-Agent; no logins, no PII. Missing values are null, never guessed.
FAQ
Do I need an API key? No. Give it a URL and run it. It's free.
What if the site has no sitemap? It crawls the site's own links (same-domain, bounded) so you still get a URL list.
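One way to picture the same-domain part of that fallback is the sketch below, which pulls links out of a single already-fetched page and keeps only those on the same host. It is an assumption-laden illustration (the `LinkCollector` and `same_site_links` names are hypothetical, and a real crawl would also need a bounded queue, robots.txt checks, and request pacing), not the actor's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> list[str]:
    """Return absolute, deduplicated http(s) links on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlsplit(page_url).netloc.lower()
    seen: set[str] = set()
    links: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parts.netloc.lower() != host or absolute in seen:
            continue
        seen.add(absolute)
        links.append(absolute)
    return links


PAGE = (
    '<a href="/a">A</a>'
    '<a href="https://other.com/x">external</a>'
    '<a href="/a#top">duplicate</a>'
    '<a href="mailto:someone@example.com">mail</a>'
    '<a href="docs/intro">relative</a>'
)
```

Comparing the full host, as done here, treats subdomains as separate sites; whether the actor does the same is not stated in the description.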
Does it handle huge sitemap-indexes? Yes. It follows child sitemaps up to the maxSitemaps and maxResults caps you set.
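To make the effect of the two caps concrete, here is a minimal sketch of a bounded, breadth-first walk over a sitemap-index. It assumes a hypothetical `load` callable that returns an already-parsed `("index", [...])` or `("urlset", [...])` pair for a sitemap URL; the `max_sitemaps` and `max_results` parameters mirror the maxSitemaps and maxResults inputs, but the exact cap semantics in the actor may differ.

```python
from collections import deque
from typing import Callable

# Hypothetical loader type: sitemap URL -> (kind, items).
Loader = Callable[[str], tuple[str, list[str]]]


def collect_urls(start: str, load: Loader, max_sitemaps: int, max_results: int) -> list[str]:
    """Walk sitemaps breadth-first, stopping when either cap is reached."""
    queue = deque([start])
    visited: set[str] = set()   # sitemap files already read
    seen: set[str] = set()      # page URLs already emitted
    results: list[str] = []
    while queue and len(visited) < max_sitemaps and len(results) < max_results:
        sitemap_url = queue.popleft()
        if sitemap_url in visited:
            continue
        visited.add(sitemap_url)
        kind, items = load(sitemap_url)
        if kind == "index":
            queue.extend(items)  # child sitemaps are read later
            continue
        for url in items:
            if url not in seen:
                seen.add(url)
                results.append(url)
                if len(results) >= max_results:
                    break
    return results


# In-memory stand-in for fetched and parsed sitemaps (illustrative data only).
FAKE = {
    "root": ("index", ["a", "b", "c"]),
    "a": ("urlset", ["1", "2"]),
    "b": ("urlset", ["2", "3"]),
    "c": ("urlset", ["4"]),
}
```

In this sketch the index file itself counts toward `max_sitemaps`, so a cap of 2 reads the index and one child; that counting choice is an assumption of the example.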
