Sitemap URL Extractor

Pricing

from $0.30 / 1,000 URLs extracted

Extract every URL from a website's sitemap.xml. Recursively walks nested sitemap indexes and returns loc, lastmod, changefreq, and priority for each page.

Rating

0.0 (0 ratings)

Developer

Andrew

Maintained by Community

Last modified

11 days ago

What you get

URL (loc) for every page listed in the site's sitemap
Last modified date (lastmod) — when each page was last updated
Change frequency (changefreq) — always, hourly, daily, weekly, monthly, yearly, never
Priority (priority) — relative importance of each URL (0.0 - 1.0)
Source sitemap — which sitemap file the URL came from (useful when a site splits its sitemap by section)
Auto-discovery — point at a homepage and the actor finds the sitemap via robots.txt or /sitemap.xml
Gzipped sitemap support — handles .xml.gz files transparently
Recursive sitemap index walking — follows nested <sitemapindex> files up to 5 levels deep
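The auto-discovery step can be illustrated with a short Python sketch. This is not the actor's source code; the function name `candidate_sitemaps` is hypothetical, and the sketch assumes the robots.txt body has already been downloaded as text.

```python
from urllib.parse import urljoin


def candidate_sitemaps(homepage_url: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs declared in robots.txt, else the /sitemap.xml fallback."""
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the "https://" in the value survives.
        name, sep, value = line.partition(":")
        # Directive names are matched case-insensitively.
        if sep and name.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    if found:
        return found
    # No Sitemap: directive found, so fall back to the conventional location.
    return [urljoin(homepage_url, "/sitemap.xml")]
```

A site can declare several `Sitemap:` lines, so the helper returns a list rather than a single URL.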

Use cases

SEO audits — pull a full URL inventory before running site-wide checks (broken links, missing meta tags, schema validation)
Content migration — build a complete URL list when moving a site between platforms
Crawl budget planning — see how many URLs a site exposes and how recently each was updated
Competitor research — map out every page a competitor publishes
Sitemap validation — verify that your published sitemap actually contains the pages you expect
Bulk URL scraping pipelines — feed the output into another actor for screenshots, content extraction, or AI summarization
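For the crawl budget use case, a small post-processing step over exported records might look like the sketch below. The helper name `recently_updated` and the 30-day window are assumptions chosen for illustration; the record keys (`lastmod`, `sourceSitemap`) follow the output format documented on this page.

```python
from collections import Counter
from datetime import date, timedelta


def recently_updated(records, today, days=30):
    """Count URLs per source sitemap whose lastmod falls within the last `days` days."""
    cutoff = today - timedelta(days=days)
    counts = Counter()
    for rec in records:
        lastmod = rec.get("lastmod")
        if not lastmod:
            continue  # entry had no lastmod
        try:
            # lastmod may be a plain date or a full datetime; the first
            # ten characters are YYYY-MM-DD in both cases.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # shorter forms such as "2024" or "2024-08" are skipped
        if modified >= cutoff:
            counts[rec.get("sourceSitemap")] += 1
    return counts
```

Passing `today` explicitly keeps the function deterministic, which makes it easier to test against a saved export.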

How to use

Enter a Website or Sitemap URL — either a homepage like https://www.example.com (the actor auto-discovers the sitemap) or a direct sitemap URL like https://www.example.com/sitemap.xml
Set Max Items — 0 returns every URL in the entire sitemap tree
Choose whether to Follow Sitemap Index — on by default, so a single run pulls every URL from every child sitemap
Run the actor — results land in the Dataset tab
Export to JSON, CSV, Excel, or Google Sheets directly from the Apify console

Extract every URL on a site

{
  "websiteUrl": "https://www.apify.com",
  "maxItems": 0,
  "followSitemapIndex": true
}

Extract only the top-level sitemap

{
  "websiteUrl": "https://www.apify.com/sitemap.xml",
  "maxItems": 0,
  "followSitemapIndex": false
}
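The same inputs can also be sent from a script instead of the console. The sketch below uses the `apify-client` Python package and assumes the actor ID `seemuapps/sitemap-extractor` (taken from this page's URL) and a valid API token; the helper names `build_run_input` and `run_extractor` are hypothetical.

```python
def build_run_input(website_url, max_items=0, follow_sitemap_index=True):
    """Build the input object using the field names from the examples above."""
    return {
        "websiteUrl": website_url,
        "maxItems": max_items,  # 0 means no limit
        "followSitemapIndex": follow_sitemap_index,
    }


def run_extractor(website_url, token, **options):
    """Start a run, wait for it to finish, and return the dataset records."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # Actor ID is an assumption based on the page URL.
    run = client.actor("seemuapps/sitemap-extractor").call(
        run_input=build_run_input(website_url, **options)
    )
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_extractor("https://www.apify.com/sitemap.xml", token, follow_sitemap_index=False)` would correspond to the top-level-only example.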

Output format

One dataset record per URL:

{
  "loc": "https://www.apify.com/store",
  "lastmod": "2024-08-12",
  "changefreq": "daily",
  "priority": "0.8",
  "sourceSitemap": "https://www.apify.com/sitemap.xml"
}

Fields not present in the sitemap entry come back as null.
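To make the mapping from a sitemap entry to this record concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not the actor's implementation: the function name `parse_urlset` is hypothetical, gzip detection by magic bytes is one possible approach, and missing fields are mapped to `None` (which serializes to JSON `null`).

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def parse_urlset(raw: bytes, source_sitemap: str) -> list[dict]:
    """Turn the bytes of a <urlset> sitemap into one record per <url> entry."""
    # Gzip streams start with the bytes 1f 8b; decompress if present.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    records = []
    for url in root.findall("sm:url", NS):
        record = {}
        for field in FIELDS:
            text = url.findtext(f"sm:{field}", default=None, namespaces=NS)
            # Absent or empty elements become None.
            record[field] = text.strip() if text and text.strip() else None
        record["sourceSitemap"] = source_sitemap
        if record["loc"]:  # an entry without a loc is not usable
            records.append(record)
    return records
```

Values are kept as strings, matching the sample record above where `priority` is `"0.8"`.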

Parameters

| Field | Default | Description |
| --- | --- | --- |
| Website or Sitemap URL | `https://www.apify.com` | Homepage URL (the sitemap is auto-discovered) or a direct `.xml` / `.xml.gz` sitemap URL |
| Max Items | `0` | Maximum number of URLs to return per run. `0` = unlimited |
| Follow Sitemap Index | `true` | Recurse into child sitemaps when the top-level file is a sitemap index |

Notes

Sitemap discovery first looks for Sitemap: directives in /robots.txt, then falls back to /sitemap.xml
Nested sitemap indexes are walked breadth-first; the actor de-duplicates sitemap URLs so circular references are safe
Recursion is capped at 5 levels deep and 1,000 total sitemaps as a safety net against runaway loops
Each fetched sitemap has a 30-second timeout; slow or unreachable child sitemaps are logged and skipped, and the run continues
Gzip-compressed sitemaps (*.xml.gz) are decompressed automatically
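The traversal rules in these notes (breadth-first order, de-duplication, depth and total caps, skipping failed fetches) can be sketched as follows. This is an illustrative Python sketch, not the actor's code. The `fetch_children` callback is a hypothetical stand-in for "download a sitemap and return the child sitemap URLs it lists", and counting the starting sitemap as depth 0 is an assumption about how "5 levels deep" is measured.

```python
from collections import deque

MAX_DEPTH = 5        # assumed: the starting sitemap is depth 0
MAX_SITEMAPS = 1000  # cap on the total number of fetch attempts


def walk_sitemaps(start_url, fetch_children):
    """Breadth-first walk; returns the sitemap URLs that were read successfully.

    fetch_children(url) returns a list of child sitemap URLs (empty for a
    plain urlset) and may raise on timeout or network failure.
    """
    seen = {start_url}  # de-duplication makes circular references harmless
    queue = deque([(start_url, 0)])
    visited = []
    attempts = 0
    while queue and attempts < MAX_SITEMAPS:
        url, depth = queue.popleft()
        attempts += 1
        try:
            children = fetch_children(url)
        except Exception:
            continue  # failed sitemap is skipped; the walk continues
        visited.append(url)
        if depth >= MAX_DEPTH:
            continue  # do not descend below the depth cap
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return visited
```

Injecting the fetch function keeps the traversal logic testable without network access; a real callback would apply the 30-second timeout and log each skipped sitemap.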

Related website & SEO actors

Part of a complete website & SEO toolkit — explore the rest of the suite:

Website Contact Scraper — Emails, phones, and socials from any website
Website Email Scraper — Crawl a site deep and extract all emails
Website Tech Stack Detector — Detect CMS, frameworks, analytics, and DNS/MX
SEO Meta Tag Auditor — Audit title, OG, Twitter cards, and schema
Domain WHOIS & SSL Inspector — WHOIS, domain age, and live SSL details

URL: https://apify.com/seemuapps/sitemap-extractor
