
URL: https://apify.com/andok/find-sitemap-from-url


XML Sitemap Finder & Extractor API

Pricing

from $1.00 / 1,000 sitemap lookups


Find and extract all XML sitemaps for any domain. Automatically parses robots.txt, scans HTML tags, and recursively follows indexes. Perfect for SEO & web scraping.



Developer: Andok (maintained by community)


Sitemap Finder

Discover all XML sitemaps for any website. Provide one or more URLs and the actor will systematically locate every sitemap by checking common file paths, parsing robots.txt, and scanning HTML content for sitemap references.

Features

  • Multi-source discovery: checks 15+ common sitemap paths, robots.txt directives, and HTML <a> / <link> tags
  • Batch processing: process multiple websites in a single run with configurable concurrency
  • Recursive index traversal: follows sitemap index files to discover all nested child sitemaps
  • Gzip support: handles .xml.gz compressed sitemaps automatically
  • XML validation: verifies sitemaps contain valid XML and classifies them as index or urlset
  • Rich metadata: reports URL count per sitemap, last modified date, discovery source, and validation status
  • Pay-per-event pricing: only pay for what you use at $0.001 per URL lookup
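
The gzip, validation, and metadata features can be pictured roughly as follows. This is a minimal sketch using only the Python standard library, not the actor's actual code. The helper name `inspect_sitemap` is hypothetical, and the returned keys are chosen only to mirror the output fields documented further down this page.

```python
import gzip
import xml.etree.ElementTree as ET
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{ns}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def inspect_sitemap(raw: bytes) -> dict:
    """Hypothetical helper: decompress if needed, validate, classify, count."""
    try:
        if raw[:2] == GZIP_MAGIC:
            raw = gzip.decompress(raw)
        root = ET.fromstring(raw)
    except (ET.ParseError, OSError, EOFError, zlib.error):
        return {"isValid": False, "type": "unknown", "urlCount": 0, "lastModified": None}

    kind = {"sitemapindex": "index", "urlset": "urlset"}.get(local_name(root.tag), "unknown")
    entry_tag = {"index": "sitemap", "urlset": "url"}.get(kind)
    entries = [el for el in root if local_name(el.tag) == entry_tag]
    lastmods = [
        child.text.strip()
        for el in entries
        for child in el
        if local_name(child.tag) == "lastmod" and child.text
    ]
    return {
        "isValid": True,
        "type": kind,
        "urlCount": len(entries),
        # Assumption: all dates share one ISO 8601 format, so string max is the latest.
        "lastModified": max(lastmods) if lastmods else None,
    }
```

Checking the gzip magic bytes rather than the file extension is a design choice of this sketch; how the actor itself detects compression is not stated on this page.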

Input

Field         | Type     | Default | Description
--------------|----------|---------|------------
urls          | string[] | -       | Website URLs to check (e.g., ["https://example.com"])
url           | string   | -       | Single URL for backward compatibility. Merged into urls if both are set.
findAll       | boolean  | true    | Find all sitemaps or stop after the first one
followIndexes | boolean  | true    | Recursively follow sitemap index files to discover child sitemaps
verify        | boolean  | true    | Verify sitemaps are valid XML and extract metadata
timeout       | integer  | 10      | HTTP request timeout in seconds
concurrency   | integer  | 3       | Max concurrent website processing (1-20)

Example Input

{
  "urls": ["https://example.com", "https://crawlee.dev"],
  "findAll": true,
  "followIndexes": true,
  "verify": true,
  "timeout": 10,
  "concurrency": 3
}
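
If you build the input programmatically, the rules from the table (merging `url` into `urls`, the documented defaults, the 1-20 concurrency range) can be applied on the client before starting a run. The sketch below is one way to do that. `build_input` is a hypothetical helper, and clamping out-of-range concurrency and skipping a duplicate `url` are assumptions of this sketch; the actor may validate or reject such input differently.

```python
def build_input(urls=None, url=None, *, find_all=True, follow_indexes=True,
                verify=True, timeout=10, concurrency=3) -> dict:
    """Hypothetical helper that assembles an input object for the actor."""
    merged = list(urls or [])
    # The legacy single `url` field is merged into `urls` (duplicate skipped here).
    if url and url not in merged:
        merged.append(url)
    if not merged:
        raise ValueError("provide at least one URL")
    return {
        "urls": merged,
        "findAll": find_all,
        "followIndexes": follow_indexes,
        "verify": verify,
        "timeout": timeout,  # seconds
        # Assumption: clamp to the documented 1-20 range instead of failing.
        "concurrency": max(1, min(20, concurrency)),
    }
```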

Output

Results are stored in the default dataset. Each record represents a discovered sitemap:

{
  "websiteUrl": "https://crawlee.dev",
  "sitemapUrl": "https://crawlee.dev/sitemap.xml",
  "type": "index",
  "urlCount": 4,
  "lastModified": "2024-12-15T10:30:00Z",
  "isValid": true,
  "source": "common-location"
}

Field        | Description
-------------|------------
websiteUrl   | The input website URL
sitemapUrl   | Full URL of the discovered sitemap
type         | Sitemap type: index (contains other sitemaps), urlset (contains page URLs), or unknown
urlCount     | Number of entries in the sitemap (child sitemaps for indexes, page URLs for urlsets)
lastModified | Most recent <lastmod> date found in the sitemap
isValid      | Whether the sitemap contains valid XML
source       | How the sitemap was discovered: common-location, robots.txt, html-content, or index:<parent-url>
error        | Error message if the lookup failed (only present on error records)

When no sitemaps are found for a URL, a single record is returned with sitemapUrl: null and an appropriate error message.
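
Because successful and failed lookups share one dataset, a consumer usually needs to split them. The following sketch groups records by website and collects the `sitemapUrl: null` records separately. The function name `summarize` is hypothetical, and it assumes that error records also carry `websiteUrl`, which this page implies but does not state outright.

```python
from collections import defaultdict


def summarize(records):
    """Split dataset items into found sitemaps and failed lookups, keyed by website."""
    found = defaultdict(list)
    failed = {}
    for rec in records:
        site = rec["websiteUrl"]  # assumption: present on error records too
        if rec.get("sitemapUrl") is None:
            failed[site] = rec.get("error", "no sitemap found")
        else:
            found[site].append(rec["sitemapUrl"])
    return dict(found), failed
```

A call such as `found, failed = summarize(items)` then gives one dictionary of sitemap URLs per site and one dictionary of error messages per site.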

API Usage

Call the actor via the API and retrieve results from the default dataset:

curl "https://api.apify.com/v2/acts/YOUR_USERNAME~find-sitemap-from-url/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"]}'
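
The same call can be made from Python without extra dependencies. This is a sketch that mirrors the curl command above using only the standard library. `build_request` and `find_sitemaps` are hypothetical helper names, and the 120-second client timeout is an arbitrary assumption, since a synchronous run can take longer than the per-request `timeout` input.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_request(actor_id: str, token: str, urls: list[str]) -> urllib.request.Request:
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    endpoint = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{query}"
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def find_sitemaps(actor_id: str, token: str, urls: list[str], timeout: float = 120):
    """Run the actor synchronously and return the parsed dataset items."""
    request = build_request(actor_id, token, urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `find_sitemaps("YOUR_USERNAME~find-sitemap-from-url", "YOUR_TOKEN", ["https://example.com"])` sends the same request as the curl command.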

Pricing

This actor uses pay-per-event (PPE) pricing:

Event          | Cost
---------------|-----
sitemap-lookup | $0.001 per URL processed

You are charged once per input URL, regardless of how many sitemaps are discovered for that URL. There are no additional platform fees beyond the per-event charge.
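
As a quick budgeting aid, the per-URL charge can be estimated before a run. This is a sketch based only on the $0.001 figure above. The listing says "from $1.00 / 1,000", so the actual rate may differ by plan, and whether duplicate input URLs are charged twice is not stated here, so the sketch counts every entry.

```python
from decimal import Decimal

PRICE_PER_LOOKUP = Decimal("0.001")  # USD per input URL, taken from the table above


def estimate_cost(input_urls: list[str]) -> Decimal:
    """Estimated charge in USD: one sitemap-lookup event per input URL."""
    return PRICE_PER_LOOKUP * len(input_urls)
```

`Decimal` is used instead of `float` so that small per-event amounts add up without rounding drift.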

Use Cases

  • SEO auditing: verify sitemap coverage and freshness across your sites
  • Web scraping: discover all available sitemaps before crawling to plan efficient scraping
  • Site monitoring: track sitemap changes, URL counts, and last modified dates over time
  • Competitor analysis: map out a competitor's site structure via their sitemaps
  • Migration validation: confirm sitemaps are correctly set up after a site migration
  • Content indexing: find all content endpoints for search engine optimization

Discovery Methods

The actor uses three complementary discovery strategies:

  1. Common paths: checks 15+ well-known sitemap file locations (/sitemap.xml, /wp-sitemap.xml, /sitemap_index.xml, etc.)
  2. robots.txt: parses Sitemap: directives from the site's robots.txt file
  3. HTML scanning: searches the homepage HTML for <a> and <link> tags referencing sitemaps
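
The three strategies above can be sketched on already-downloaded text as follows. This is an illustration, not the actor's implementation. All function names are hypothetical, `COMMON_PATHS` lists only the three paths named above (the full list of 15+ is not published here), and matching on the word "sitemap" in `href` or `rel` is an assumed heuristic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Only the paths named in the text; the actor's full list is not given.
COMMON_PATHS = ["/sitemap.xml", "/wp-sitemap.xml", "/sitemap_index.xml"]


def common_candidates(site_url: str) -> list[str]:
    """Strategy 1: candidate URLs to probe at well-known locations."""
    return [urljoin(site_url, path) for path in COMMON_PATHS]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Strategy 2: collect the values of 'Sitemap:' lines (case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


class _SitemapLinkParser(HTMLParser):
    """Collects href values of <a>/<link> tags that mention 'sitemap'."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        rel = attr.get("rel") or ""
        # Assumed heuristic: the word "sitemap" appears in href or rel.
        if href and ("sitemap" in href.lower() or "sitemap" in rel.lower()):
            self.links.append(urljoin(self.base_url, href))


def sitemaps_from_html(html: str, base_url: str) -> list[str]:
    """Strategy 3: scan homepage HTML for sitemap references."""
    parser = _SitemapLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Keeping each strategy as a pure function over text makes it possible to record which one produced a hit, which is what the `source` output field reports.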

When followIndexes is enabled, any discovered sitemap index is recursively expanded to reveal all child sitemaps.
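
Recursive expansion of an index can be sketched like this. It is an assumption-laden illustration rather than the actor's code: `expand_index` is a hypothetical name, `fetch` is any caller-supplied function that returns the XML text for a URL, and the `seen` set (which guards against indexes that reference each other) is a design choice of this sketch. The `index:<parent-url>` source label follows the output format documented above.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip an XML namespace: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def expand_index(sitemap_url, fetch, seen=None):
    """Return one record per child sitemap reachable from sitemap_url."""
    seen = set() if seen is None else seen
    seen.add(sitemap_url)
    try:
        root = ET.fromstring(fetch(sitemap_url))
    except ET.ParseError:
        return []
    if _local(root.tag) != "sitemapindex":
        return []  # a urlset (or anything else) has no child sitemaps

    results = []
    for element in root.iter():
        if _local(element.tag) != "loc" or not element.text:
            continue
        child = element.text.strip()
        if child in seen:
            continue  # already visited: avoids infinite loops on cyclic indexes
        results.append({"sitemapUrl": child, "source": f"index:{sitemap_url}"})
        results.extend(expand_index(child, fetch, seen))
    return results
```

Passing `fetch` in as a parameter keeps the traversal independent of any HTTP client, so it can be exercised against an in-memory dictionary of documents.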
