URL: https://apify.com/wisteria_banjo/sitemap-generator---creates-sitemap-xml-for-any-domain

Sitemap Generator - Creates sitemap.xml for any domain

Pricing

from $5.00 / 1,000 results

Rating

0.0 (0)

Developer

Chris Xavier (Maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 13
  • Monthly active users: 0
  • Last modified: 6 months ago

πŸ—ΊοΈ Sitemap Generator (Apify Actor)

Generate a clean, standards-compliant sitemap.xml for a website β€” automatically, reliably, and without manual cleanup.

This actor crawls a single website, discovers all indexable pages, and produces:

  • βœ… A ready-to-submit sitemap.xml (Google-compliant)
  • βœ… A structured JSON dataset of discovered URLs (for auditing, reporting, and billing)

Built for SEO professionals, agencies, and site owners who want accuracy, transparency, and results they can trust.

✅ What This Actor Does

  • Crawls one website per run (no mixed domains, no confusion)
  • Discovers internal pages by following links
  • Excludes junk/system URLs automatically (e.g. Cloudflare, admin endpoints)
  • Respects robots.txt (optional)
  • Removes duplicate URLs and URL fragments
  • Optionally strips query strings to prevent sitemap bloat
  • Extracts real <lastmod> dates when available:
    • From HTTP Last-Modified headers
    • From blog/article meta tags when headers are missing
  • Outputs a fully valid sitemap.xml
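
The fragment removal, optional query stripping, and de-duplication listed above can be sketched in a few lines of Python. This is an illustration only, not the actor's source code: the function names normalize_url and dedupe are hypothetical, and lowercasing the host is an extra assumption not stated in the listing.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str, strip_query: bool = True) -> str:
    """Drop the #fragment and, optionally, the ?query part of a URL."""
    parts = urlsplit(url)
    query = "" if strip_query else parts.query
    # Scheme and host are case-insensitive, so lowercase them (assumption);
    # the path is kept as-is, with an empty path treated as "/".
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )


def dedupe(urls, strip_query: bool = True) -> list[str]:
    """Return normalized URLs in first-seen order, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        key = normalize_url(url, strip_query)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


pages = [
    "https://example.com/blog#comments",
    "https://example.com/blog?utm_source=x",
    "https://Example.com/blog",
    "https://example.com",
]
print(dedupe(pages))
# ['https://example.com/blog', 'https://example.com/']
```

With strip_query=False, the second URL would be kept as a separate entry, which is the sitemap bloat the option is meant to prevent.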

📦 Outputs (Where to Find Your Files)

Run → Storage → Key-value store → sitemap.xml

This file is:

  • Ready to upload to Google Search Console
  • Ready to host at /sitemap.xml
  • Standards-compliant (no reconstruction required)
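
Before uploading, a downloaded sitemap.xml can be sanity-checked locally. The sketch below is one possible check, assuming the file follows the sitemaps.org 0.9 schema; check_sitemap is a hypothetical helper, not part of the actor, and it covers only the root element, the 50,000-URL per-file limit, and mixed hosts.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol


def check_sitemap(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element: {root.tag}")
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    if not locs:
        problems.append("no <loc> entries found")
    if len(locs) > MAX_URLS:
        problems.append(f"too many URLs: {len(locs)}")
    hosts = {urlsplit(loc).netloc.lower() for loc in locs}
    if len(hosts) > 1:
        problems.append(f"mixed hosts: {sorted(hosts)}")
    return problems


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/about</loc><lastmod>2024-01-15</lastmod></url>"
    "</urlset>"
)
print(check_sitemap(sample))  # []
```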

🟢 JSON Results (Dataset)

Every discovered page is also saved to the Dataset.

Each row includes:

  • url – discovered page URL
  • depth – crawl depth from the homepage
  • lastmod – modification date (when available)
  • lastmodSource – "header", "meta", or null

This dataset is useful for:

  • Auditing and QA
  • URL counts and reporting
  • Monetization and billing logic
  • Previewing results before download
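
As a rough illustration of the auditing and reporting use, the sketch below summarizes dataset rows by depth and by lastmodSource. The four field names come from the list above; the PageRow type, the summarize helper, the sample rows, and the date format shown are assumptions made for the example.

```python
from collections import Counter
from typing import Optional, TypedDict


class PageRow(TypedDict):
    url: str
    depth: int
    lastmod: Optional[str]
    lastmodSource: Optional[str]  # "header", "meta", or None


def summarize(rows: list[PageRow]) -> dict:
    """Count rows in total, per crawl depth, and per lastmod source."""
    return {
        "total": len(rows),
        "by_depth": dict(Counter(r["depth"] for r in rows)),
        # Rows without a lastmod source are grouped under "none".
        "by_lastmod_source": dict(Counter(r["lastmodSource"] or "none" for r in rows)),
    }


# Made-up rows for illustration; real rows come from the run's Dataset.
rows: list[PageRow] = [
    {"url": "https://example.com/", "depth": 0, "lastmod": "2024-01-15", "lastmodSource": "header"},
    {"url": "https://example.com/blog/post", "depth": 1, "lastmod": "2024-02-01", "lastmodSource": "meta"},
    {"url": "https://example.com/contact", "depth": 1, "lastmod": None, "lastmodSource": None},
]
print(summarize(rows))
```

A high share of rows under "none" would indicate that the site exposes few trustworthy modification dates.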

🔒 Important Design Decisions (On Purpose)

One Website per Run

This actor enforces a single start URL.

Why?

  • A sitemap must not mix domains
  • One site = one sitemap = one clean result
  • Prevents invalid or rejected sitemaps
  • Enables clear pricing per site

Honest <lastmod> Values

The actor does not fake modification dates.

  • Uses real server headers when available
  • Falls back to article metadata for blog posts
  • Omits <lastmod> when no trustworthy source exists

This avoids misleading search engines and protects SEO integrity.
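
The fallback order described above (header first, then article metadata, otherwise nothing) could look roughly like this. It is a sketch under stated assumptions, not the actor's implementation: resolve_lastmod is a hypothetical name, the meta tag is assumed to be article:modified_time, and the inputs are assumed to be plain dicts with exactly these keys.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def resolve_lastmod(headers: dict, meta: dict) -> tuple[Optional[str], Optional[str]]:
    """Return (lastmod, source), where source is "header", "meta", or None."""
    raw = headers.get("Last-Modified")
    if raw:
        try:
            # HTTP dates look like "Wed, 21 Oct 2015 07:28:00 GMT".
            parsed = parsedate_to_datetime(raw)
            return parsed.astimezone(timezone.utc).date().isoformat(), "header"
        except (TypeError, ValueError):
            pass  # unparseable header: fall through to the meta tag
    raw = meta.get("article:modified_time")  # assumed meta tag name
    if raw:
        try:
            parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
            return parsed.date().isoformat(), "meta"
        except ValueError:
            pass
    # No trustworthy source: omit <lastmod> rather than invent a date.
    return None, None


print(resolve_lastmod({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}, {}))
# ('2015-10-21', 'header')
print(resolve_lastmod({}, {}))
# (None, None)
```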

βš™οΈ Inputs

Required

  • Start URL
    The root URL of the website (example: https://example.com)

Optional

  • Max crawl depth
  • Max number of pages
  • Concurrency
  • Headless browser (for JavaScript-heavy sites)
  • Strip query strings
  • Respect robots.txt
  • Advanced include/exclude URL patterns (regex)

Most users can run the actor with just a Start URL.
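
To show how include/exclude regex patterns typically behave, here is a small sketch. The listing does not document the actor's matching rules, so the is_allowed helper, the exclude-before-include precedence, and the sample patterns are all assumptions for illustration.

```python
import re


def is_allowed(url: str, include=(), exclude=()) -> bool:
    """Apply exclude patterns first, then include patterns (assumed precedence)."""
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    if include:
        # When include patterns are given, the URL must match at least one.
        return any(re.search(pattern, url) for pattern in include)
    return True


# Hypothetical patterns: skip Cloudflare and admin paths, keep only the blog.
exclude = [r"/cdn-cgi/", r"/wp-admin/"]
include = [r"^https://example\.com/blog/"]

print(is_allowed("https://example.com/blog/post-1", include, exclude))      # True
print(is_allowed("https://example.com/cdn-cgi/l/email", include, exclude))  # False
print(is_allowed("https://example.com/pricing", include, exclude))          # False
```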

🧠 Who This Is For

  • SEO professionals
  • Agencies managing multiple client sites
  • Developers who need clean sitemaps programmatically
  • Site owners preparing for Google Search Console
  • AI-first websites optimizing crawlability

💡 Why Use This Actor Instead of Online Sitemap Tools?

  • No URL limits
  • No fake results
  • No mixed domains
  • No guessing which pages were included
  • Full transparency (XML + JSON)
  • Automation-ready and API-friendly

πŸ” PPE (Paid / Private / Enterprise)

This actor is designed for PPE use:

  • Consistent, auditable outputs
  • Dataset always populated (even if XML is downloaded)
  • Clear value per run
  • Suitable for client-facing and internal workflows

Run it. Download sitemap.xml. Submit. Done.

🟢 sitemap.xml (Primary Output)

Your sitemap is written as a real XML file.

Location in Apify UI: Run → Storage → Key-value store → sitemap.xml
