Sitemap Sniffer

Find sitemap files from website roots, domains, robots.txt, and direct sitemap URLs. Export sitemap metadata, URL counts, nested index depth, and optional URL inventory rows.

Pricing

from $0.90 / 1,000 discovered sitemap items

Developer

Maxime Dupré

Maintained by Community

Last modified

4 days ago

🗺️ Sitemap sniffer for SEO audits

Sitemap Sniffer finds public sitemap files for websites, domains, robots.txt files, direct sitemap URLs, and sitemap indexes. Use this sitemap sniffer when you need a quick SEO sitemap audit, a sitemap finder for multiple sites, or a sitemap URL extractor before a crawl.

Start with a public website such as apify.com, a bare domain such as example.com, or a known sitemap such as https://example.com/sitemap.xml. The Actor checks public sitemap sources, follows sitemap indexes when enabled, and saves clean output rows you can export from Apify or use through the API.

🔎 What this Actor does

Reads public robots.txt files and follows Sitemap: directives.
Checks common sitemap paths for website roots and bare domains.
Accepts direct sitemap, sitemap index, and robots.txt URLs.
Parses XML sitemap indexes, XML URL sets, plain-text sitemaps, and gzipped sitemap responses.
Follows nested sitemap indexes within your depth and output limits.
Saves one sitemap row per discovered sitemap file.
Optionally emits URL inventory rows from sitemap contents.
Adds one target summary row per submitted target, including no-sitemap outcomes.
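
The first discovery step above, reading Sitemap: directives from robots.txt, can be sketched roughly as follows. This is a minimal illustration and not the Actor's own code; the function name extract_sitemap_urls is hypothetical, and the sketch assumes sitemap URLs do not contain a literal # character.

```python
def extract_sitemap_urls(robots_txt: str) -> list[str]:
    """Return URLs named by Sitemap: directives, in order, without duplicates."""
    found: list[str] = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so the colon in "https://" survives.
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found


robots = """User-agent: *
Disallow: /admin
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news.xml # news section
"""
print(extract_sitemap_urls(robots))
```

Sitemap: lines are independent of any User-agent group, which is why the sketch scans the whole file instead of tracking groups.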

This Actor is focused on public sitemap discovery. It does not crawl arbitrary internal links, scrape page content, check broken links, submit sitemaps to search engines, or validate whether URLs are indexed.

📦 Data you get

Each run can return three output types.

Sitemap rows describe discovered sitemap files:

sitemap URL, canonical URL, parent sitemap URL, and index depth
target website, normalized origin, and domain host
sitemap type, HTTP status, content type, byte count, and compression flag
URL count, child sitemap count, first lastmod, and discovery source
all discovery sources when the same sitemap is found more than one way

URL inventory rows are optional. When enabled, they include each URL found inside parsed sitemaps, the source sitemap URL, lastmod, changefreq, priority, and hreflang alternates when the sitemap provides them.
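
To show where these fields come from, here is a minimal sketch of parsing one sitemap response body. It is an assumption about how such parsing could work, not the Actor's parser. The function names, the "entries" key, and the "urlset" and "text" type labels are hypothetical; only "sitemap_index" appears in the Actor's output example. hreflang alternates are left out for brevity.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(body: bytes) -> dict:
    # Gzipped payloads start with the magic bytes 0x1f 0x8b.
    is_compressed = body[:2] == b"\x1f\x8b"
    if is_compressed:
        body = gzip.decompress(body)
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML: treat it as a plain-text sitemap, one URL per line.
        lines = body.decode("utf-8", errors="replace").splitlines()
        entries = [{"url": s.strip()} for s in lines if s.strip()]
        return {"type": "text", "isCompressed": is_compressed, "entries": entries}

    is_index = local_name(root.tag) == "sitemapindex"
    entries = []
    for item in root:  # <sitemap> elements in an index, <url> elements in a URL set
        fields = {local_name(c.tag): (c.text or "").strip() for c in item}
        if "loc" not in fields:
            continue
        entry = {"url": fields["loc"]}
        # Copy optional metadata only when the source file provides it.
        for key in ("lastmod", "changefreq", "priority"):
            if key in fields:
                entry[key] = fields[key]
        entries.append(entry)
    return {
        "type": "sitemap_index" if is_index else "urlset",
        "isCompressed": is_compressed,
        "entries": entries,
    }
```

Missing lastmod, changefreq, or priority values are simply absent from the entry, which matches the Actor's stated behavior of not guessing metadata.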

Target summary rows make batch runs easier to filter. They report whether each target was completed, skipped, or produced no public sitemap files.

🚀 How to run it

1. Add one or more website or sitemap targets.
2. Keep sitemap index following enabled for normal SEO audits.
3. Leave URL inventory rows off for a fast sitemap-file audit.
4. Turn on URL inventory rows when you want the URLs listed inside the sitemaps.
5. Set sitemap and URL row limits to control output size and cost.
6. Run the Actor and open the dataset overview.

No cookies, login, source API key, or proxy settings are needed from you. The target must expose public sitemap assets over http or https.
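
The same steps can also be run through the API. The sketch below assumes the third-party apify-client Python package and an API token stored in an APIFY_TOKEN environment variable (the variable name is an assumption). The Actor ID comes from this Actor's Store URL, and the input keys mirror the input example on this page. The helper names are hypothetical.

```python
import os

ACTOR_ID = "maximedupre/sitemap-sniffer"  # taken from the Actor's Store URL


def build_run_input(targets: list[str], emit_url_rows: bool = False) -> dict:
    """Build an input object using the keys shown in the input example."""
    return {
        "targets": targets,
        "followSitemapIndexes": True,
        "maxIndexDepth": 1,
        "parseSitemapDetails": True,
        "emitUrlRows": emit_url_rows,
        "maxSitemapRows": 10,
        "maxUrlRows": 10000,
    }


def run_sitemap_sniffer(targets: list[str]) -> list[dict]:
    """Start a run, wait for it to finish, and return the dataset rows."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(targets))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example (needs network access and a valid token):
# rows = run_sitemap_sniffer(["https://apify.com", "example.com"])
```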

⚙️ Input example

{
  "targets": [
    "https://apify.com",
    "example.com",
    "https://example.com/sitemap.xml"
  ],
  "followSitemapIndexes": true,
  "maxIndexDepth": 1,
  "parseSitemapDetails": true,
  "emitUrlRows": false,
  "maxSitemapRows": 10,
  "maxUrlRows": 10000
}

Website or sitemap targets (targets) is the only required input. You can mix website roots, bare domains, robots.txt URLs, sitemap URLs, and sitemap index URLs in the same list.

Use Follow sitemap indexes (followSitemapIndexes) and Maximum sitemap index depth (maxIndexDepth) to control nested index expansion. Use Parse sitemap details (parseSitemapDetails) when you want counts, type, size, compression, and URL metadata. Use Emit URL inventory rows (emitUrlRows) only when you want individual URLs from the sitemaps in the dataset. The row limits (maxSitemapRows and maxUrlRows) cap how many sitemap rows and URL rows a run saves.
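
To make the interaction between index depth and row limits concrete, here is a minimal breadth-first sketch. It assumes that depth 0 is the sitemap found directly for a target and that an index is expanded only while its depth is below the maximum. The Actor's exact semantics may differ. fetch_children is a hypothetical callback that returns the child sitemap URLs of an index.

```python
from collections import deque
from typing import Callable, Iterable


def expand_indexes(
    roots: Iterable[str],
    fetch_children: Callable[[str], list[str]],
    max_index_depth: int,
    max_sitemap_rows: int,
) -> list[dict]:
    rows: list[dict] = []
    seen: set[str] = set()
    # Each queue item is (sitemap URL, parent sitemap URL, depth).
    queue = deque((url, None, 0) for url in roots)
    while queue and len(rows) < max_sitemap_rows:
        url, parent, depth = queue.popleft()
        if url in seen:
            continue  # the same sitemap can be reachable more than one way
        seen.add(url)
        rows.append({"url": url, "parentSitemapUrl": parent, "depth": depth})
        # Only descend while the depth budget allows it.
        if depth < max_index_depth:
            for child in fetch_children(url):
                queue.append((child, url, depth + 1))
    return rows


# In-memory stand-in for real sitemap indexes.
tree = {"root.xml": ["a.xml", "b.xml"], "a.xml": ["a1.xml"]}
rows = expand_indexes(["root.xml"], lambda u: tree.get(u, []), 1, 10)
print([r["url"] for r in rows])
```

With a maximum depth of 1 the nested index a.xml is listed but not expanded, so a1.xml never appears. Raising the depth to 2 would include it.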

🧾 Output example

{
  "recordType": "sitemap",
  "target": "https://apify.com",
  "targetIndex": 0,
  "normalizedOrigin": "https://apify.com",
  "domainHost": "apify.com",
  "url": "https://apify.com/sitemap.xml",
  "canonicalUrl": "https://apify.com/sitemap.xml",
  "type": "sitemap_index",
  "httpStatus": 200,
  "contentType": "application/xml",
  "byteCount": 1240,
  "urlCount": 0,
  "childSitemapCount": 8,
  "isCompressed": false,
  "lastmod": "2026-06-01",
  "discoveredVia": "robots.txt",
  "discoverySources": ["robots.txt"],
  "parentSitemapUrl": null,
  "depth": 0,
  "scrapedAt": "2026-06-15T12:00:00.000Z"
}

When URL inventory is enabled, URL rows use recordType: "url" and include url, sitemapUrl, lastmod, changefreq, priority, and hreflang when available.
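
Because one dataset mixes several row types, a small helper that splits rows by recordType can make exports easier to work with. This is a sketch with hypothetical function names. It relies only on the "sitemap" and "url" record types named on this page and groups any other value, such as target summaries, under whatever recordType those rows carry.

```python
from collections import defaultdict


def split_by_record_type(rows: list[dict]) -> dict[str, list[dict]]:
    """Group dataset rows by their recordType field."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row.get("recordType", "unknown")].append(row)
    return dict(groups)


def url_inventory(rows: list[dict]) -> list[str]:
    """Return the page URLs from URL inventory rows, in dataset order."""
    return [row["url"] for row in rows if row.get("recordType") == "url"]


rows = [
    {"recordType": "sitemap", "url": "https://example.com/sitemap.xml"},
    {"recordType": "url", "url": "https://example.com/",
     "sitemapUrl": "https://example.com/sitemap.xml"},
]
print(sorted(split_by_record_type(rows)))
print(url_inventory(rows))
```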

💳 Pricing

Sitemap Sniffer uses pay-per-event pricing. One charged event is one discovered sitemap item, URL inventory item, or target summary saved by the run.

Keep URL inventory rows off when you only need sitemap-file metadata. Turn them on when you need a larger URL export for crawl planning, migrations, RAG source lists, or SEO checks.
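
For budgeting, the event model can be turned into a quick estimate. The sketch below assumes every charged event costs the same and uses the listed "from" rate of $0.90 per 1,000 items. Both are assumptions: the rate that applies to your account may be higher, so treat the result as a lower bound. The function name is hypothetical.

```python
def estimate_cost_usd(
    sitemap_rows: int,
    url_rows: int,
    target_summaries: int,
    usd_per_1000: float = 0.90,  # listed "from" price; may differ for your plan
) -> float:
    """Estimate run cost from the number of rows saved to the dataset."""
    events = sitemap_rows + url_rows + target_summaries
    return events * usd_per_1000 / 1000


# Sitemap-file audit of 3 targets, 10 sitemap rows, URL inventory off.
print(round(estimate_cost_usd(10, 0, 3), 4))
# Same run with 10,000 URL inventory rows enabled.
print(round(estimate_cost_usd(10, 10_000, 3), 4))
```

The comparison shows why the URL row limit matters: URL inventory rows usually dominate the event count.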

⚠️ Limits and caveats

Sitemap files must be publicly reachable.
Some websites do not publish sitemap files, or publish them only for selected sections.
Very large sitemap indexes can create many child sitemap or URL rows, so use the row limits for predictable output.
Sitemap metadata is only as complete as the source file. Missing lastmod, changefreq, priority, or hreflang values are not guessed.
This Actor reports public sitemap assets. It does not prove that search engines have indexed the URLs.

❓ FAQ

🔐 Do I need login credentials or an API key?

No. This Actor reads public sitemap assets. You do not need to provide cookies, login credentials, a source API key, or proxy settings.

🧭 Can it crawl my whole website?

No. Use this Actor to discover sitemap files and, optionally, the URLs listed inside those sitemap files. For rendered page crawling and link maps, use Website URL Crawler.

🧩 Can I submit more than one website?

Yes. Add multiple targets to the same run. The output keeps target and targetIndex fields so you can filter each website separately.

📄 Why did I get a target summary but no sitemap rows?

That usually means the target did not expose a public sitemap through robots.txt, common sitemap paths, or the direct URL you submitted. The run still completes so you can audit batches without one empty target failing the whole job.
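
To list which submitted targets ended up without sitemap rows, you can compare the input list with the targetIndex values in the dataset. This sketch assumes targetIndex is the zero-based position of the target in the submitted list, as the output example suggests. The function name is hypothetical.

```python
def targets_without_sitemaps(targets: list[str], rows: list[dict]) -> list[str]:
    """Return submitted targets that produced no rows with recordType 'sitemap'."""
    covered = {
        row["targetIndex"]
        for row in rows
        if row.get("recordType") == "sitemap" and "targetIndex" in row
    }
    return [target for index, target in enumerate(targets) if index not in covered]


targets = ["https://apify.com", "example.com"]
rows = [{"recordType": "sitemap", "targetIndex": 0,
         "url": "https://apify.com/sitemap.xml"}]
print(targets_without_sitemaps(targets, rows))
```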

📝 Changelog

0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Robots.txt Generator - Generate deployable robots.txt files with sitemap directives and crawler rules.
Website URL Crawler - Crawl rendered website pages and export discovered links with source context.
Webpage Text Extractor - Extract clean text or Markdown from public webpages after you collect URLs.
Web Images Scraper - Extract image URLs and optional image files from public webpages.
RSS Feed Reader - Read public RSS, Atom, RDF, and JSON Feed URLs into clean dataset rows.

Made with ❤️ by Maxime Dupré

URL: https://apify.com/maximedupre/sitemap-sniffer
