Pricing
from $0.90 / 1,000 discovered sitemap items
Sitemap Sniffer
Find sitemap files from website roots, domains, robots.txt, and direct sitemap URLs. Export sitemap metadata, URL counts, nested index depth, and optional URL inventory rows.
Sitemap sniffer for SEO audits
Sitemap Sniffer finds public sitemap files for websites, domains, robots.txt files, direct sitemap URLs, and sitemap indexes. Use this sitemap sniffer when you need a quick SEO sitemap audit, a sitemap finder for multiple sites, or a sitemap URL extractor before a crawl.
Start with a public website such as apify.com, a bare domain such as example.com, or a known sitemap such as https://example.com/sitemap.xml. The Actor checks public sitemap sources, follows sitemap indexes when enabled, and saves clean output rows you can export from Apify or use through the API.
What this Actor does
- Reads public robots.txt files and follows Sitemap: directives.
- Checks common sitemap paths for website roots and bare domains.
- Accepts direct sitemap, sitemap index, and robots.txt URLs.
- Parses XML sitemap indexes, XML URL sets, plain-text sitemaps, and gzipped sitemap responses.
- Follows nested sitemap indexes within your depth and output limits.
- Saves one sitemap row per discovered sitemap file.
- Optionally emits URL inventory rows from sitemap contents.
- Adds one target summary row per submitted target, including no-sitemap outcomes.
This Actor is focused on public sitemap discovery. It does not crawl arbitrary internal links, scrape page content, check broken links, submit sitemaps to search engines, or validate whether URLs are indexed.
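The robots.txt and common-path checks listed above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the helper names, the `COMMON_SITEMAP_PATHS` list, and the choice to default bare domains to https are all assumptions made for the example.

```python
from urllib.parse import urljoin, urlsplit

# Assumed list of frequently used sitemap paths; the Actor's real list is not documented.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap.txt"]


def normalize_origin(target: str) -> str:
    """Turn a bare domain or full URL into an origin such as https://example.com."""
    if "://" not in target:
        target = "https://" + target  # assumption: bare domains default to https
    parts = urlsplit(target)
    return f"{parts.scheme}://{parts.netloc}"


def sitemaps_from_robots(robots_txt: str, base_url: str) -> list:
    """Collect URLs from `Sitemap:` lines, matching the field name case-insensitively."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            url = urljoin(base_url, value.strip())  # resolves relative values too
            if url not in found:
                found.append(url)
    return found


def candidate_urls(target: str) -> list:
    """Build the common-path URLs to probe for a website root or bare domain."""
    origin = normalize_origin(target)
    return [origin + path for path in COMMON_SITEMAP_PATHS]
```

For example, `candidate_urls("example.com")` starts with `https://example.com/sitemap.xml`, and `sitemaps_from_robots` returns each declared sitemap once, in the order it appears in the file.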
Data you get
Each run can return three output types.
Sitemap rows describe discovered sitemap files:
- sitemap URL, canonical URL, parent sitemap URL, and index depth
- target website, normalized origin, and domain host
- sitemap type, HTTP status, content type, byte count, and compression flag
- URL count, child sitemap count, first lastmod, and discovery source
- all discovery sources when the same sitemap is found more than one way
URL inventory rows are optional. When enabled, they include each URL found inside parsed sitemaps, the source sitemap URL, lastmod, changefreq, priority, and hreflang alternates when the sitemap provides them.
Target summary rows make batch runs easier to filter. They report whether each target was completed, skipped, or produced no public sitemap files.
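To make the sitemap-row fields more concrete, here is a hedged sketch of how a sitemap response body could be turned into a type, counts, and a compression flag using only the Python standard library. The function names are hypothetical, and the type labels other than `sitemap_index` (which appears in the output example) are assumptions.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def local_name(tag: str) -> str:
    """Strip the XML namespace: '{ns}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def count_entries(root, entry_tag: str) -> int:
    """Count direct children named entry_tag that hold a non-empty <loc>."""
    total = 0
    for entry in root:
        if local_name(entry.tag) != entry_tag:
            continue
        if any(local_name(c.tag) == "loc" and (c.text or "").strip() for c in entry):
            total += 1
    return total


def describe_sitemap(body: bytes) -> dict:
    is_compressed = body.startswith(GZIP_MAGIC)
    if is_compressed:
        body = gzip.decompress(body)
    info = {"type": "unknown", "isCompressed": is_compressed,
            "urlCount": 0, "childSitemapCount": 0}
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML: treat it as a plain-text sitemap with one URL per line.
        text = body.decode("utf-8", errors="replace")
        urls = [ln.strip() for ln in text.splitlines()
                if ln.strip().startswith(("http://", "https://"))]
        if urls:
            info.update(type="text", urlCount=len(urls))  # "text" label is assumed
        return info
    root_name = local_name(root.tag)
    if root_name == "sitemapindex":
        info.update(type="sitemap_index",
                    childSitemapCount=count_entries(root, "sitemap"))
    elif root_name == "urlset":
        info.update(type="urlset", urlCount=count_entries(root, "url"))  # label assumed
    return info
```

Counting only direct `url` or `sitemap` children keeps extension elements such as image entries from inflating the totals.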
How to run it
- Add one or more website or sitemap targets.
- Keep sitemap index following enabled for normal SEO audits.
- Leave URL inventory rows off for a fast sitemap-file audit.
- Turn on URL inventory rows when you want the URLs listed inside the sitemaps.
- Set sitemap and URL row limits to control output size and cost.
- Run the Actor and open the dataset overview.
No cookies, login, source API key, or proxy settings are needed from you. The target must expose public sitemap assets over http or https.
Input example
    {
      "targets": [
        "https://apify.com",
        "example.com",
        "https://example.com/sitemap.xml"
      ],
      "followSitemapIndexes": true,
      "maxIndexDepth": 1,
      "parseSitemapDetails": true,
      "emitUrlRows": false,
      "maxSitemapRows": 10,
      "maxUrlRows": 10000
    }
Website or sitemap targets is the only required input. You can paste roots, bare domains, robots.txt URLs, sitemap URLs, or sitemap index URLs in the same list.
Use Follow sitemap indexes and Maximum sitemap index depth to control nested index expansion. Use Parse sitemap details when you want counts, type, size, compression, and URL metadata. Use Emit URL inventory rows only when you want individual URLs from the sitemaps in the dataset.
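If you prefer to start runs from code, the input above can be assembled and submitted with the official `apify-client` Python package. The sketch below is illustrative only: `build_run_input` and `run_sitemap_sniffer` are hypothetical helper names, the `actor_id` argument is a placeholder you must replace with this Actor's real ID, and reading the token from an `APIFY_TOKEN` environment variable is an assumption.

```python
import os


def build_run_input(targets, emit_url_rows=False, max_index_depth=1,
                    max_sitemap_rows=10, max_url_rows=10000):
    """Assemble the input object using the field names from the input example."""
    if not targets:
        raise ValueError("targets is the only required input and must not be empty")
    return {
        "targets": list(targets),
        "followSitemapIndexes": True,
        "maxIndexDepth": max_index_depth,
        "parseSitemapDetails": True,
        "emitUrlRows": emit_url_rows,
        "maxSitemapRows": max_sitemap_rows,
        "maxUrlRows": max_url_rows,
    }


def run_sitemap_sniffer(actor_id, run_input):
    """Start a run, wait for it to finish, and return its dataset items."""
    # Requires `pip install apify-client`; imported lazily so that
    # build_run_input stays usable without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `run_sitemap_sniffer("<actor-id>", build_run_input(["example.com"]))`, leaving URL inventory rows off for a quick sitemap-file audit.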
Output example
    {
      "recordType": "sitemap",
      "target": "https://apify.com",
      "targetIndex": 0,
      "normalizedOrigin": "https://apify.com",
      "domainHost": "apify.com",
      "url": "https://apify.com/sitemap.xml",
      "canonicalUrl": "https://apify.com/sitemap.xml",
      "type": "sitemap_index",
      "httpStatus": 200,
      "contentType": "application/xml",
      "byteCount": 1240,
      "urlCount": 0,
      "childSitemapCount": 8,
      "isCompressed": false,
      "lastmod": "2026-06-01",
      "discoveredVia": "robots.txt",
      "discoverySources": ["robots.txt"],
      "parentSitemapUrl": null,
      "depth": 0,
      "scrapedAt": "2026-06-15T12:00:00.000Z"
    }
When URL inventory is enabled, URL rows use recordType: "url" and include url, sitemapUrl, lastmod, changefreq, priority, and hreflang when available.
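Because one dataset can mix several row types, it is often convenient to split the exported items before analysis. The following sketch assumes the rows are already loaded as Python dictionaries. The helper names are hypothetical, and only the `recordType`, `targetIndex`, and `url` fields shown in this README are relied on.

```python
from collections import defaultdict


def split_by_record_type(rows):
    """Group dataset rows by their recordType value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get("recordType", "unknown")].append(row)
    return dict(groups)


def rows_for_target(rows, target_index):
    """Keep only the rows that belong to one submitted target."""
    return [row for row in rows if row.get("targetIndex") == target_index]


sample = [
    {"recordType": "sitemap", "targetIndex": 0, "url": "https://apify.com/sitemap.xml"},
    {"recordType": "url", "targetIndex": 0, "url": "https://apify.com/store"},
    {"recordType": "sitemap", "targetIndex": 1, "url": "https://example.com/sitemap.xml"},
]
groups = split_by_record_type(sample)
print(len(groups["sitemap"]), len(groups["url"]))
```

Grouping by whatever `recordType` values are present avoids hard-coding a label for target summary rows, which this README does not spell out.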
Pricing
Sitemap Sniffer uses pay-per-event pricing. One charged event is one discovered sitemap item, URL inventory item, or target summary saved by the run.
Keep URL inventory rows off when you only need sitemap-file metadata. Turn them on when you need a larger URL export for crawl planning, migrations, RAG source lists, or SEO checks.
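A rough cost estimate follows directly from the number of saved rows. The sketch below assumes a flat rate of $0.90 per 1,000 events for every event type. The listing says "from $0.90", so treat this as an assumption and check your actual rate before relying on the result.

```python
from decimal import Decimal

# Assumed flat rate taken from the "from $0.90 / 1,000" listing.
PRICE_PER_1000_EVENTS = Decimal("0.90")


def estimate_cost(sitemap_rows, url_rows, target_summaries):
    """Estimate the run cost in USD: one charged event per saved row."""
    events = sitemap_rows + url_rows + target_summaries
    return PRICE_PER_1000_EVENTS * events / 1000


# Sitemap-file audit only: 10 sitemap rows and 3 target summaries.
print(estimate_cost(10, 0, 3))
# Same run with 10,000 URL inventory rows switched on.
print(estimate_cost(10, 10000, 3))
```

Under these assumptions the first run costs about $0.01 and the second about $9.01, which is why leaving URL inventory rows off matters for quick audits.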
Limits and caveats
- Sitemap files must be publicly reachable.
- Some websites do not publish sitemap files, or publish them only for selected sections.
- Very large sitemap indexes can create many child sitemap or URL rows, so use the row limits for predictable output.
- Sitemap metadata is only as complete as the source file. Missing lastmod, changefreq, priority, or hreflang values are not guessed.
- This Actor reports public sitemap assets. It does not prove that search engines have indexed the URLs.
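To see why depth and row limits keep large sitemap indexes predictable, consider this simplified breadth-first expansion. It is a sketch under stated assumptions rather than the Actor's implementation: `expand_indexes` and `children_of` are hypothetical names, the submitted sitemap is treated as depth 0, and children are emitted only while their depth stays within the maximum.

```python
from collections import deque


def expand_indexes(roots, children_of, max_index_depth, max_sitemap_rows):
    """Walk sitemap indexes breadth-first, stopping at the depth and row limits.

    children_of(url) must return the child sitemap URLs listed in that sitemap
    (an empty list for a plain URL set).
    """
    rows = []
    seen = set()
    queue = deque((url, None, 0) for url in roots)
    while queue and len(rows) < max_sitemap_rows:
        url, parent, depth = queue.popleft()
        if url in seen:
            continue  # the same sitemap can be reachable more than one way
        seen.add(url)
        rows.append({"url": url, "parentSitemapUrl": parent, "depth": depth})
        if depth < max_index_depth:
            for child in children_of(url):
                queue.append((child, url, depth + 1))
    return rows


# In-memory stand-in for fetched sitemap indexes.
tree = {"root.xml": ["a.xml", "b.xml"], "a.xml": ["a1.xml"]}
rows = expand_indexes(["root.xml"], lambda u: tree.get(u, []), 1, 10)
print([row["url"] for row in rows])
```

With a maximum depth of 1 the walk stops before `a1.xml`, and lowering the row limit truncates the output even when more child sitemaps exist.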
FAQ
Do I need login credentials or an API key?
No. This Actor reads public sitemap assets. You do not need to provide cookies, login credentials, a source API key, or proxy settings.
Can it crawl my whole website?
No. Use this Actor to discover sitemap files and, optionally, the URLs listed inside those sitemap files. For rendered page crawling and link maps, use Website URL Crawler.
Can I submit more than one website?
Yes. Add multiple targets to the same run. The output keeps target and targetIndex fields so you can filter each website separately.
Why did I get a target summary but no sitemap rows?
That usually means the target did not expose a public sitemap through robots.txt, common sitemap paths, or the direct URL you submitted. The run still completes so you can audit batches without one empty target failing the whole job.
Changelog
- 0.1: Initial release.
Support
For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡
Other actors
- Robots.txt Generator: Generate deployable robots.txt files with sitemap directives and crawler rules.
- Website URL Crawler: Crawl rendered website pages and export discovered links with source context.
- Webpage Text Extractor: Extract clean text or Markdown from public webpages after you collect URLs.
- Web Images Scraper: Extract image URLs and optional image files from public webpages.
- RSS Feed Reader: Read public RSS, Atom, RDF, and JSON Feed URLs into clean dataset rows.
Made with ❤️ by Maxime Dupré
