URL: https://apify.com/maximedupre/sitemap-sniffer

Sitemap Sniffer for SEO Audits and URL Lists · Apify


Pricing: from $0.90 / 1,000 discovered sitemap items

Find sitemap files from website roots, domains, robots.txt, and direct sitemap URLs. Export sitemap metadata, URL counts, nested index depth, and optional URL inventory rows.

Rating: 0.0 (0)

Developer: Maxime Dupré

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago

🗺️ Sitemap sniffer for SEO audits

Sitemap Sniffer finds public sitemap files starting from a website root, a bare domain, a robots.txt file, a direct sitemap URL, or a sitemap index. Use it for a quick SEO sitemap audit, to find sitemaps across multiple sites, or to extract sitemap URLs before a crawl.

Start with a public website such as apify.com, a bare domain such as example.com, or a known sitemap such as https://example.com/sitemap.xml. The Actor checks public sitemap sources, follows sitemap indexes when enabled, and saves clean output rows you can export from Apify or use through the API.

🔎 What this Actor does

  • Reads public robots.txt files and follows Sitemap: directives.
  • Checks common sitemap paths for website roots and bare domains.
  • Accepts direct sitemap, sitemap index, and robots.txt URLs.
  • Parses XML sitemap indexes, XML URL sets, plain-text sitemaps, and gzipped sitemap responses.
  • Follows nested sitemap indexes within your depth and output limits.
  • Saves one sitemap row per discovered sitemap file.
  • Optionally emits URL inventory rows from sitemap contents.
  • Adds one target summary row per submitted target, including no-sitemap outcomes.
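The first two discovery steps in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the Actor's actual code: the function names and the list of common paths are assumptions made for the example, since the page does not say which paths the Actor probes.

```python
from urllib.parse import urljoin

# Hypothetical list of well-known locations; the Actor's real list is not documented.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap.txt"]


def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return URLs named by `Sitemap:` directives, in file order, without duplicates."""
    found: list[str] = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found


def candidate_sitemap_urls(origin: str) -> list[str]:
    """Build the fallback URLs to probe when robots.txt names no sitemap."""
    return [urljoin(origin, path) for path in COMMON_SITEMAP_PATHS]
```

The directive name is matched case-insensitively, so `sitemap:` and `Sitemap:` are treated the same.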

This Actor is focused on public sitemap discovery. It does not crawl arbitrary internal links, scrape page content, check broken links, submit sitemaps to search engines, or validate whether URLs are indexed.

📦 Data you get

Each run can return three output types.

Sitemap rows describe discovered sitemap files:

  • sitemap URL, canonical URL, parent sitemap URL, and index depth
  • target website, normalized origin, and domain host
  • sitemap type, HTTP status, content type, byte count, and compression flag
  • URL count, child sitemap count, first lastmod, and discovery source
  • all discovery sources when the same sitemap is found more than one way
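Several of these fields can be derived from the raw response body alone. The sketch below shows one way to do that with the Python standard library. It is an assumption-laden illustration rather than the Actor's implementation: only the type label "sitemap_index" appears in this page's output example, so "urlset", "text", and "unknown" are made-up labels, and byteCount is assumed to mean the size as downloaded.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_sitemap(body: bytes) -> dict:
    """Classify a raw sitemap response and count its entries."""
    is_compressed = body.startswith(GZIP_MAGIC)
    raw = gzip.decompress(body) if is_compressed else body
    info = {
        "type": "unknown",          # hypothetical label
        "byteCount": len(body),     # assumption: size before decompression
        "isCompressed": is_compressed,
        "urlCount": 0,
        "childSitemapCount": 0,
    }
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        # Not XML: treat it as a plain-text sitemap with one URL per line.
        lines = raw.decode("utf-8", errors="replace").splitlines()
        urls = [ln.strip() for ln in lines if ln.strip().startswith(("http://", "https://"))]
        if urls:
            info["type"] = "text"   # hypothetical label
            info["urlCount"] = len(urls)
        return info

    children = [_local_name(child.tag) for child in root]
    if _local_name(root.tag) == "sitemapindex":
        info["type"] = "sitemap_index"
        info["childSitemapCount"] = children.count("sitemap")
    elif _local_name(root.tag) == "urlset":
        info["type"] = "urlset"     # hypothetical label
        info["urlCount"] = children.count("url")
    return info
```

Checking the gzip magic bytes instead of the file extension also catches compressed responses served from a plain `.xml` URL.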

URL inventory rows are optional. When enabled, they include each URL found inside parsed sitemaps, the source sitemap URL, lastmod, changefreq, priority, and hreflang alternates when the sitemap provides them.
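To make the shape of a URL row concrete, here is a rough sketch that turns one XML URL set into rows with those fields. The field names follow this page; the function name and the structure of the hreflang value (a list of hreflang/href pairs) are assumptions, because the page does not show a URL row in full.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the sitemap protocol and by hreflang alternates.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def url_rows(xml_bytes: bytes, sitemap_url: str) -> list[dict]:
    """Return one row per <url> entry; missing values stay None rather than guessed."""
    rows = []
    root = ET.fromstring(xml_bytes)
    for entry in root.findall("sm:url", NS):
        alternates = [
            {"hreflang": link.get("hreflang"), "href": link.get("href")}
            for link in entry.findall("xhtml:link", NS)
            if link.get("rel") == "alternate"
        ]
        rows.append({
            "recordType": "url",
            "url": (entry.findtext("sm:loc", default="", namespaces=NS) or "").strip(),
            "sitemapUrl": sitemap_url,
            "lastmod": entry.findtext("sm:lastmod", namespaces=NS),
            "changefreq": entry.findtext("sm:changefreq", namespaces=NS),
            "priority": entry.findtext("sm:priority", namespaces=NS),
            "hreflang": alternates,  # assumed shape
        })
    return rows
```

Values such as priority are kept as the strings found in the file, so nothing is reinterpreted on the way through.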

Target summary rows make batch runs easier to filter. They report whether each target was completed, skipped, or produced no public sitemap files.

🚀 How to run it

  1. Add one or more website or sitemap targets.
  2. Keep sitemap index following enabled for normal SEO audits.
  3. Leave URL inventory rows off for a fast sitemap-file audit.
  4. Turn on URL inventory rows when you want the URLs listed inside the sitemaps.
  5. Set sitemap and URL row limits to control output size and cost.
  6. Run the Actor and open the dataset overview.

No cookies, login, source API key, or proxy settings are needed from you. The target must expose public sitemap assets over http or https.
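If you would rather start runs from code, the same steps map onto the Apify API. The sketch below uses the apify-client Python package and assumes your token is stored in an APIFY_TOKEN environment variable. The helper names are illustrative, the Actor ID is taken from this page's URL, the input keys mirror the input example on this page, and the code assumes a client version where call() returns the finished run as a dictionary.

```python
import os


def build_run_input(targets, emit_url_rows=False):
    """Build an input object using the keys from this page's input example."""
    if not targets:
        raise ValueError("at least one target is required")
    return {
        "targets": list(targets),       # step 1
        "followSitemapIndexes": True,   # step 2: keep index following on
        "maxIndexDepth": 1,
        "parseSitemapDetails": True,
        "emitUrlRows": emit_url_rows,   # steps 3 and 4
        "maxSitemapRows": 10,           # step 5: row limits
        "maxUrlRows": 10000,
    }


def run_sitemap_sniffer(run_input):
    """Start a run, wait for it to finish, and return the dataset rows."""
    # Imported here so build_run_input works even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed environment variable
    run = client.actor("maximedupre/sitemap-sniffer").call(run_input=run_input)
    if run is None:
        raise RuntimeError("the run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `rows = run_sitemap_sniffer(build_run_input(["example.com"]))`, which needs network access and a valid token.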

⚙️ Input example

{
  "targets": [
    "https://apify.com",
    "example.com",
    "https://example.com/sitemap.xml"
  ],
  "followSitemapIndexes": true,
  "maxIndexDepth": 1,
  "parseSitemapDetails": true,
  "emitUrlRows": false,
  "maxSitemapRows": 10,
  "maxUrlRows": 10000
}

Website or sitemap targets (targets) is the only required input. You can paste roots, bare domains, robots.txt URLs, sitemap URLs, or sitemap index URLs in the same list.

Use Follow sitemap indexes (followSitemapIndexes) and Maximum sitemap index depth (maxIndexDepth) to control nested index expansion. Use Parse sitemap details (parseSitemapDetails) when you want counts, type, size, compression, and URL metadata. Use Emit URL inventory rows (emitUrlRows) only when you want individual URLs from the sitemaps in the dataset. The sitemap and URL row limits (maxSitemapRows and maxUrlRows) cap output size and cost.
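The effect of the depth and row limits is easiest to see in a small traversal sketch. The code below is an illustration under stated assumptions, not the Actor's algorithm: it assumes the starting sitemap sits at depth 0 (as in this page's output example), that children are followed only while the current depth is below the maximum, and that the row limit applies to the whole walk. The fetch_children callback is hypothetical and stands in for downloading and parsing a sitemap index.

```python
from collections import deque


def expand_indexes(start_urls, fetch_children, max_index_depth=1, max_sitemap_rows=10):
    """Breadth-first walk over sitemap indexes.

    fetch_children(url) must return the child sitemap URLs of an index,
    or an empty list for a sitemap that only lists pages.
    """
    seen = set()
    rows = []
    queue = deque((url, None, 0) for url in start_urls)
    while queue and len(rows) < max_sitemap_rows:
        url, parent, depth = queue.popleft()
        if url in seen:
            continue  # the same sitemap can be reachable more than one way
        seen.add(url)
        rows.append({"url": url, "parentSitemapUrl": parent, "depth": depth})
        if depth < max_index_depth:
            for child in fetch_children(url):
                queue.append((child, url, depth + 1))
    return rows
```

Breadth-first order means that when the row limit is hit, shallow sitemaps are kept in preference to deeply nested ones.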

🧾 Output example

{
  "recordType": "sitemap",
  "target": "https://apify.com",
  "targetIndex": 0,
  "normalizedOrigin": "https://apify.com",
  "domainHost": "apify.com",
  "url": "https://apify.com/sitemap.xml",
  "canonicalUrl": "https://apify.com/sitemap.xml",
  "type": "sitemap_index",
  "httpStatus": 200,
  "contentType": "application/xml",
  "byteCount": 1240,
  "urlCount": 0,
  "childSitemapCount": 8,
  "isCompressed": false,
  "lastmod": "2026-06-01",
  "discoveredVia": "robots.txt",
  "discoverySources": ["robots.txt"],
  "parentSitemapUrl": null,
  "depth": 0,
  "scrapedAt": "2026-06-15T12:00:00.000Z"
}

When URL inventory is enabled, URL rows use recordType: "url" and include url, sitemapUrl, lastmod, changefreq, priority, and hreflang when available.
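Because sitemap rows and URL rows land in the same dataset, a small helper can separate them after export. This is a sketch that assumes rows are plain dictionaries shaped like the output example; the function names are illustrative.

```python
from collections import defaultdict


def split_by_record_type(rows):
    """Group dataset rows by their recordType value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get("recordType", "unknown")].append(row)
    return dict(groups)


def urls_by_sitemap(rows):
    """Map each sitemap URL to the page URLs listed in it, using URL rows only."""
    grouped = defaultdict(list)
    for row in rows:
        if row.get("recordType") == "url":
            grouped[row["sitemapUrl"]].append(row["url"])
    return dict(grouped)
```

The second helper joins URL rows back to their source sitemap through the sitemapUrl field.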

💳 Pricing

Sitemap Sniffer uses pay-per-event pricing. One charged event is one discovered sitemap item, URL inventory item, or target summary saved by the run.

Keep URL inventory rows off when you only need sitemap-file metadata. Turn them on when you need a larger URL export for crawl planning, migrations, RAG source lists, or SEO checks.
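For budgeting, a rough upper bound on cost follows from the row limits. The sketch below assumes every charged event costs the listed starting rate of $0.90 per 1,000, that the row limits apply to the whole run, and that each target produces exactly one summary row. The page only states a "from" price, so treat the result as an estimate rather than a quote.

```python
from decimal import Decimal

# Assumption: the listed starting rate applies to every charged event.
PRICE_PER_1000_EVENTS = Decimal("0.90")


def estimate_max_cost(target_count, max_sitemap_rows, max_url_rows, emit_url_rows):
    """Upper bound in USD: one event per summary row, sitemap row, and URL row."""
    events = target_count + max_sitemap_rows
    if emit_url_rows:
        events += max_url_rows
    return events * PRICE_PER_1000_EVENTS / 1000
```

With the input example on this page (three targets, maxSitemapRows 10, maxUrlRows 10000), the bound is about $0.01 with URL inventory rows off and about $9.01 with them on.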

⚠️ Limits and caveats

  • Sitemap files must be publicly reachable.
  • Some websites do not publish sitemap files, or publish them only for selected sections.
  • Very large sitemap indexes can create many child sitemap or URL rows, so use the row limits for predictable output.
  • Sitemap metadata is only as complete as the source file. Missing lastmod, changefreq, priority, or hreflang values are not guessed.
  • This Actor reports public sitemap assets. It does not prove that search engines have indexed the URLs.

❓ FAQ

🔐 Do I need login credentials or an API key?

No. This Actor reads public sitemap assets. You do not need to provide cookies, login credentials, a source API key, or proxy settings.

🧭 Can it crawl my whole website?

No. Use this Actor to discover sitemap files and, optionally, the URLs listed inside those sitemap files. For rendered page crawling and link maps, use Website URL Crawler.

🧩 Can I submit more than one website?

Yes. Add multiple targets to the same run. The output keeps target and targetIndex fields so you can filter each website separately.

📄 Why did I get a target summary but no sitemap rows?

That usually means the target did not expose a public sitemap through robots.txt, common sitemap paths, or the direct URL you submitted. The run still completes so you can audit batches without one empty target failing the whole job.
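In a batch run you can list those empty targets by comparing the submitted targets with the sitemap rows. This sketch assumes targetIndex is the zero-based position of the target in the input list, which matches the output example on this page but is not stated outright; the function name is illustrative.

```python
def targets_without_sitemaps(submitted_targets, rows):
    """Return the submitted targets that produced no sitemap rows."""
    seen = {
        row["targetIndex"]
        for row in rows
        if row.get("recordType") == "sitemap"
    }
    return [
        target
        for index, target in enumerate(submitted_targets)
        if index not in seen
    ]
```

Only rows with recordType "sitemap" count here, so a target that has just a summary row is reported as empty.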

📝 Changelog

  • 0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Made with ❤️ by Maxime Dupré
