URL: https://apify.com/scrapeworks/sitemap-to-urls

Sitemap URL Extractor - SEO & Site Audit Tool · Apify


Pricing

from $1.00 / 1,000 results

Sitemap to URL List Extractor

Extract every URL from any website's sitemap as clean JSON. Handles sitemap indexes (recursive) and gzipped sitemaps automatically. Includes lastmod, priority, and changefreq.

Rating

0.0

(0)

Developer

Nicolas van Arkens

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

0

Monthly active users

20 days ago

Last modified

Sitemap to URL List Extractor 🗺️

Extract every URL from any website's sitemap as clean structured JSON, with last-modified dates, priority, and change frequency. Handles sitemap indexes (recursing into nested sitemaps) and gzipped sitemaps (.xml.gz) automatically, two cases that simpler sitemap scrapers often get wrong.

Perfect for SEO audits, content inventories, building crawl lists, monitoring site changes, and feeding URL lists into other tools.

Why use it

  • 🔁 Recursive sitemap indexes: point it at sitemap_index.xml and it follows every nested sitemap automatically
  • 📦 Gzipped sitemaps handled: many large sites ship .xml.gz files, which are decompressed transparently
  • 🗓️ Full metadata: lastmod, priority, changefreq, plus image URLs from the Google image sitemap extension
  • 🛡️ Guardrails: configurable caps on total URLs and nested sitemaps so a giant site can't run away
  • 🌐 Works on any website: news sites, e-commerce stores, blogs, documentation, public web apps
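The recursion, gzip handling, and guardrails listed above can be sketched roughly as follows. This is an illustrative sketch only, not the Actor's actual implementation: the function names `decode` and `walk`, the `fetch` callback, and the default caps are all assumptions made for the example.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Dict, Iterator, Optional

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def decode(body: bytes) -> bytes:
    """Gunzip the payload if it starts with the gzip magic bytes."""
    return gzip.decompress(body) if body[:2] == b"\x1f\x8b" else body


def walk(
    root_url: str,
    fetch: Callable[[str], bytes],  # hypothetical: returns raw bytes for a URL
    max_urls: int = 1000,           # assumed default, not the Actor's
    max_sitemaps: int = 50,         # assumed default, not the Actor's
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one record per <url> entry, following sitemap indexes."""
    queue = deque([root_url])
    seen = set()
    emitted = 0
    while queue and len(seen) < max_sitemaps:
        sitemap_url = queue.popleft()
        if sitemap_url in seen:
            continue  # avoid fetching the same sitemap twice
        seen.add(sitemap_url)
        root = ET.fromstring(decode(fetch(sitemap_url)))
        if root.tag.endswith("sitemapindex"):
            # An index lists other sitemaps; queue each one for processing.
            for loc in root.iterfind("sm:sitemap/sm:loc", NS):
                queue.append((loc.text or "").strip())
            continue
        for node in root.iterfind("sm:url", NS):
            if emitted >= max_urls:
                return  # total URL cap reached
            yield {
                "url": node.findtext("sm:loc", default="", namespaces=NS).strip(),
                "lastModified": node.findtext("sm:lastmod", default=None, namespaces=NS),
                "sourceSitemap": sitemap_url,
                "sourceRoot": root_url,
            }
            emitted += 1
```

Passing `fetch` in as a callback keeps the sketch independent of any HTTP library; in practice it could wrap whatever client you already use.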

Use cases

  • SEO audits: list every URL a site declares in its sitemap, sort by lastmod, find stale content
  • Crawl seed lists: generate URL lists for downstream scrapers or archival tools
  • Content inventories: see what a competitor or partner site is actually publishing
  • Change monitoring: schedule it and detect newly added pages
  • Site migrations: get the full URL set before redirect mapping
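For the change-monitoring use case, two scheduled runs can be compared with a simple set difference. The sketch below assumes each run has been loaded as a list of records with `url` and `lastModified` keys (the output shape this page documents); the `diff_runs` helper is hypothetical and not part of the Actor.

```python
from typing import Dict, Iterable, List


def diff_runs(previous: Iterable[dict], current: Iterable[dict]) -> Dict[str, List[str]]:
    """Compare two runs and report added, removed, and updated URLs."""
    prev = {r["url"]: r.get("lastModified") for r in previous}
    curr = {r["url"]: r.get("lastModified") for r in current}
    return {
        "added": sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        # Present in both runs, but the lastModified value changed.
        "updated": sorted(u for u in curr.keys() & prev.keys() if curr[u] != prev[u]),
    }
```

Sites that omit lastmod only show up under "added" or "removed", since there is no date to compare.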

Input

Field                     Description
Sitemap URLs              One or more sitemap URLs (sitemap.xml, sitemap_index.xml, or .xml.gz).
Follow indexes            If a URL points to an index, recurse and process each nested sitemap.
Maximum URLs              Total URL cap across all sitemaps.
Maximum nested sitemaps   Cap on number of sitemap files fetched when following indexes.

Output

{
"url":"https://example.com/page1",
"lastModified":"2025-05-10",
"changeFrequency":"weekly",
"priority":"0.8",
"images":["https://example.com/img1.jpg"],
"sourceSitemap":"https://example.com/sitemap-products.xml",
"sourceRoot":"https://example.com/sitemap_index.xml"
}

Export to JSON, CSV, or Excel, or pull via the Apify API.
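Once the records are exported, a common next step is finding stale pages and flattening the result to CSV. The following is a minimal sketch that assumes records shaped like the output sample above; `stale_first` and `to_csv` are hypothetical helper names, and entries whose lastModified is missing or not a full date are skipped.

```python
import csv
import io
from datetime import date
from typing import Iterable, List, Optional, Sequence


def _parse_day(value: Optional[str]) -> Optional[date]:
    """Parse the leading YYYY-MM-DD of a lastmod value, else return None."""
    if not value:
        return None
    try:
        return date.fromisoformat(value[:10])
    except ValueError:
        return None  # reduced-precision values such as "2025-05" are skipped


def stale_first(records: Iterable[dict], cutoff: date) -> List[dict]:
    """Return records last modified before the cutoff, oldest first."""
    dated = [(_parse_day(r.get("lastModified")), r) for r in records]
    old = [(d, r) for d, r in dated if d is not None and d < cutoff]
    return [r for d, r in sorted(old, key=lambda pair: pair[0])]


def to_csv(
    records: Iterable[dict],
    fields: Sequence[str] = ("url", "lastModified", "changeFrequency", "priority"),
) -> str:
    """Serialize the chosen fields to CSV text, ignoring other keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

List-valued fields such as `images` are left out of the default columns because they do not map cleanly onto a single CSV cell.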

Notes

  • Supports the standard sitemaps.org protocol, sitemap index files, and the Google image-sitemap extension.
  • Always respects each site's robots.txt access policy. Please use responsibly.
  • Independent tool; sitemaps remain the property of their publishers.
