
URL: https://apify.com/dataquarry/sitemap-url-extractor

Sitemap & URL Extractor: Get Every URL of a Website

Pricing

Pay per usage

Get every URL of a website: parses sitemap.xml and sitemap-indexes (discovered via robots.txt or the default location), with a same-site crawl fallback when there's no sitemap. Returns each URL + lastmod. No API key.

Rating: 0.0 (0)

Developer: Daniel Brenner

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 15 days ago

Free. Give it a website (or a sitemap URL) and get back every URL on the site, parsed from sitemap.xml and sitemap indexes (auto-discovered via robots.txt and the default location), with a same-site crawl fallback when a site has no sitemap. No API key.

Perfect for feeding an LLM/RAG pipeline (find every page to ingest), site audits, migrations, link checking, and SEO.

What you get (per URL)

  • url: the page URL (absolute, deduplicated)
  • lastmod: last-modified date from the sitemap when present, null otherwise
  • source: "sitemap" or "crawl" (how the URL was found)
  • discoveredAt
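
For downstream code, one output item with the four fields above could be modelled roughly as follows. This is only a sketch: the `UrlRecord` class and the `parse_record` helper are hypothetical names, and because the listing does not state the format of `discoveredAt`, it is kept as a raw string here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UrlRecord:
    url: str                      # absolute, deduplicated page URL
    lastmod: Optional[str]        # sitemap last-modified value, or None when absent
    source: str                   # "sitemap" or "crawl"
    discovered_at: Optional[str]  # raw discoveredAt value (format not specified)


def parse_record(item: dict) -> UrlRecord:
    """Convert one output item (a plain dict) into a UrlRecord."""
    source = item["source"]
    if source not in ("sitemap", "crawl"):
        raise ValueError(f"unexpected source: {source!r}")
    return UrlRecord(
        url=item["url"],
        lastmod=item.get("lastmod"),  # a missing or null lastmod stays None
        source=source,
        discovered_at=item.get("discoveredAt"),
    )
```

Keeping `lastmod` optional mirrors the listing's promise that missing values are null rather than guessed.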

How to use it

{"startUrls":["https://example.com"],"maxResults":5000}

Pass a site URL (the sitemap is found automatically) or a direct sitemap URL. It handles sitemap-indexes (sites that split their sitemap into many files) by following each child sitemap, and if there's no sitemap at all it falls back to a polite, same-site crawl. It respects robots.txt, identifies itself, and fetches one request at a time.
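
The discovery and parsing steps described above can be sketched with the Python standard library. This is not the actor's actual code, which the listing does not show: `sitemaps_from_robots` and `parse_sitemap` are hypothetical helpers that work on already-fetched text, so fetching, robots.txt compliance, and rate limiting are left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def parse_sitemap(xml_text: str) -> Tuple[str, List[Tuple[str, Optional[str]]]]:
    """Parse one sitemap document.

    Returns ("index", entries) for a <sitemapindex> and ("urlset", entries)
    for a <urlset>. Each entry is (loc, lastmod), with lastmod None if absent.
    """
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind, child_tag = "index", "sitemap"
    elif root.tag == SITEMAP_NS + "urlset":
        kind, child_tag = "urlset", "url"
    else:
        raise ValueError(f"not a sitemap document: {root.tag}")

    entries = []
    for node in root.findall(SITEMAP_NS + child_tag):
        loc = node.findtext(SITEMAP_NS + "loc")
        if not loc or not loc.strip():
            continue  # skip entries without a usable <loc>
        lastmod = node.findtext(SITEMAP_NS + "lastmod")
        lastmod = lastmod.strip() if lastmod and lastmod.strip() else None
        entries.append((loc.strip(), lastmod))
    return kind, entries
```

When `parse_sitemap` reports `"index"`, each returned `loc` is itself a child sitemap to fetch and parse in turn; when it reports `"urlset"`, the entries are the page URLs.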

Pair it: discover → extract → audit

This is the discover step of a clean "feed-your-AI" toolkit by dataquarry:

  1. Discover (this actor): every URL of a site.
  2. Extract (dataquarry/website-to-markdown): turn those URLs into clean, LLM-ready Markdown.
  3. Audit (dataquarry/website-seo-metadata-checker): SEO and metadata for each page.

Also see the dataquarry OSM place-data scrapers and free guides at openplacedata.com.

Clean & honest

Reads only public sitemap.xml/robots.txt and (in fallback) public pages; respects robots.txt; sends a descriptive User-Agent; no logins, no PII. Missing values are null, never guessed.

FAQ

Do I need an API key? No. Give it a URL and run it. It's free.

What if the site has no sitemap? It crawls the site's own links (same-domain, bounded) so you still get a URL list.

Does it handle huge sitemap-indexes? Yes. It follows child sitemaps up to the maxSitemaps and maxResults caps you set.
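
To illustrate the bounded, same-domain fallback mentioned in the FAQ, here is a minimal sketch of the link-filtering step for a single page. The function name `same_site_links` is hypothetical, and treating "same-domain" as an exact hostname match is an assumption; the actor may handle subdomains differently. The `max_results` parameter stands in for the `maxResults` cap.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str, max_results: int) -> List[str]:
    """Return absolute, deduplicated links on the same host as page_url."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).hostname  # assumption: exact host match
    seen, out = set(), []
    for href in collector.hrefs:
        if len(out) >= max_results:
            break  # stop once the cap is reached
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https") or parts.hostname != host:
            continue  # drop mailto:, off-site links, and similar
        if absolute not in seen:
            seen.add(absolute)
            out.append(absolute)
    return out
```

A full crawl would call this for each fetched page and push the results onto a queue, fetching one request at a time as the listing describes.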

You might also like

Sitemap URL Extractor - List All URLs in a Sitemap

dltik/sitemap-url-extractor

Extract every URL from any XML sitemap, with lastmod, changefreq and priority. Resolves sitemap indexes recursively. Pass a sitemap.xml or just a site root to auto-discover its sitemaps. Pure HTTP, no browser: fast and cheap.

Sitemap URL Extractor

seemuapps/sitemap-extractor

Extract every URL from a website's sitemap.xml. Recursively walks nested sitemap indexes and returns loc, lastmod, changefreq, and priority for each page.

Sitemap Finder & URL Extractor · Crawl Any XML Sitemap

corent1robert/sitemap-detector

Find and crawl XML sitemaps from any website. Follows sitemap indexes, handles gzip, and exports every page URL with source file and lastmod into a clean dataset. No config needed.

Corentin Robert

Sitemap URL Extractor

mikolabs/sitemap-url-extractor

Extract every URL and its metadata from any sitemap.xml in seconds. Paste one or more sitemap URLs, run the Actor, and get a clean, structured dataset with url, lastmod, changefreq, priority, and more, ready to export as CSV, JSON, or Excel.

Sitemap URL Extractor

crawlerbros/sitemap-url-extractor

Extract every URL from any site's sitemap.xml. Handles sitemap index files (nested sitemaps), gzipped sitemaps, and robots.txt discovery. Returns URL, lastmod, changefreq, priority, and optional image/video/alternate-language fields. No proxy, no cookies, no login.