Robots.txt Auditor & Sitemap Finder

Pricing

from $1.00 / 1,000 domains audited

Scan robots.txt files in bulk to extract sitemap URLs and verify crawler directives for technical SEO compliance.

Rating

0.0

(0)

Developer

Andok

Maintained by Community

Actor stats

Last modified: 3 months ago

Robots.txt Auditor

Audit robots.txt files across hundreds of domains to catch crawl-blocking mistakes that silently hurt SEO. A single misconfigured Disallow rule can block search engine crawlers from entire site sections. This actor fetches, parses, and reports on every robots.txt in bulk. Run it against your own sites or competitor domains to extract sitemap declarations, user-agent rules, and crawl directives in one pass.

Features

Bulk auditing — process hundreds of domains in a single run with configurable concurrency
Sitemap discovery — extracts all Sitemap: directives declared in each robots.txt
User-agent analysis — identifies every crawler-specific rule block in the file
Status reporting — captures HTTP status codes, file size, and fetch errors
Flexible input — accepts full URLs or bare domains (auto-resolves to /robots.txt)
Error resilience — reports failures per domain without stopping the run
Timestamp tracking — records when each domain was checked for audit trails
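
The sitemap discovery and user-agent analysis features can be pictured with a minimal Python sketch. This is not the actor's actual code: `parse_robots` is a hypothetical helper, and it assumes a simple line-by-line reading of `field: value` pairs with `#` comments stripped.

```python
def parse_robots(text: str) -> dict:
    """Collect Sitemap URLs and unique User-agent values from robots.txt text."""
    sitemaps: list[str] = []
    user_agents: list[str] = []
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so "https://..." values stay intact.
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if not value:
            continue
        if field == "sitemap":
            sitemaps.append(value)
        elif field == "user-agent" and value not in user_agents:
            user_agents.append(value)
    return {
        "sitemapCount": len(sitemaps),
        "sitemaps": sitemaps,
        "userAgents": user_agents,
    }
```

Field names are matched case-insensitively here, so `user-agent: *` and `User-agent: *` count as the same value.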

Input

Field	Type	Required	Default	Description
`urls`	`array`	Yes	—	List of URLs or domains to audit (e.g. `example.com` or `https://example.com`)
`url`	`string`	No	—	Single URL for backward compatibility. Merged into `urls` if both are provided.
`timeoutSeconds`	`integer`	No	`15`	HTTP timeout in seconds for each robots.txt fetch
`concurrency`	`integer`	No	`10`	Number of domains to process in parallel (1-50)
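
As a rough illustration of how these fields might be normalized before a run, the sketch below merges `url` into `urls`, applies the documented defaults, and resolves each entry to a robots.txt URL. The helper names are hypothetical, and two behaviors are assumptions rather than documented facts: bare domains default to HTTPS, and out-of-range `concurrency` values are clamped to 1-50 instead of being rejected.

```python
from urllib.parse import urlsplit


def resolve_robots_url(entry: str) -> str:
    """Turn a bare domain or full URL into that host's /robots.txt URL."""
    entry = entry.strip()
    # Assumption: bare domains are fetched over HTTPS.
    if "://" not in entry:
        entry = "https://" + entry
    parts = urlsplit(entry)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def normalize_input(raw: dict) -> dict:
    """Merge `url` into `urls` and fill in the documented defaults."""
    urls = list(raw.get("urls") or [])
    single = raw.get("url")
    if single and single not in urls:
        urls.append(single)
    # Assumption: values outside 1-50 are clamped, not rejected.
    concurrency = min(50, max(1, int(raw.get("concurrency", 10))))
    return {
        "urls": urls,
        "timeoutSeconds": int(raw.get("timeoutSeconds", 15)),
        "concurrency": concurrency,
    }
```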

Input Example

{
  "urls": ["https://crawlee.dev", "https://apify.com", "https://example.com"],
  "timeoutSeconds": 15,
  "concurrency": 10
}

Output

Each domain produces one dataset item with the robots.txt status, discovered sitemaps, and user-agent blocks.

inputUrl (string) — the original URL or domain you provided
robotsUrl (string | null) — the resolved robots.txt URL
status (number | null) — HTTP status code (200, 404, etc.)
contentLength (number) — file size in bytes
sitemapCount (number) — number of Sitemap: directives found
sitemaps (string[]) — list of sitemap URLs declared in the file
userAgents (string[]) — list of unique User-agent values
error (string | null) — error message if the fetch failed
checkedAt (string) — ISO timestamp of when the check ran
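
One way to get one item per domain without letting a single failure stop the run is bounded concurrency with per-task error capture. The sketch below shows that pattern with `asyncio`. It is an assumption about the general approach, not the actor's implementation. `fetch` is a hypothetical injected coroutine returning `(status, body)`, so any HTTP client can be plugged in.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable

# Hypothetical fetcher type: takes a robots.txt URL, returns (status, body).
Fetcher = Callable[[str], Awaitable[tuple[int, str]]]


async def audit_all(robots_urls: list[str], fetch: Fetcher, concurrency: int = 10) -> list[dict]:
    """Audit every URL with at most `concurrency` fetches in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def audit_one(url: str) -> dict:
        record = {
            "robotsUrl": url,
            "status": None,
            "contentLength": 0,
            "error": None,
            "checkedAt": datetime.now(timezone.utc).isoformat(),
        }
        async with semaphore:
            try:
                status, body = await fetch(url)
                record["status"] = status
                record["contentLength"] = len(body.encode("utf-8"))
            except Exception as exc:
                # Record the failure for this domain and keep going.
                record["error"] = str(exc)
        return record

    # gather preserves input order, so results line up with robots_urls.
    return await asyncio.gather(*(audit_one(u) for u in robots_urls))
```

Because each task catches its own exception, a timeout on one domain yields a record with `error` set while the remaining domains are still processed.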

Output Example

{
  "inputUrl": "https://crawlee.dev",
  "robotsUrl": "https://crawlee.dev/robots.txt",
  "status": 200,
  "contentLength": 342,
  "sitemapCount": 2,
  "sitemaps": [
    "https://crawlee.dev/sitemap.xml",
    "https://crawlee.dev/sitemap-blog.xml"
  ],
  "userAgents": ["*", "Googlebot", "AhrefsBot"],
  "error": null,
  "checkedAt": "2025-11-20T14:30:00.000Z"
}
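
Once a run finishes, items shaped like the example above can be triaged with a few lines of Python. The sketch assumes the dataset has already been loaded as a list of dicts. `triage` and its bucket names are hypothetical, chosen only for illustration.

```python
def triage(items: list[dict]) -> dict[str, list[str]]:
    """Group audited domains by the first problem found in each item."""
    report: dict[str, list[str]] = {"fetch_failed": [], "non_200": [], "no_sitemaps": []}
    for item in items:
        url = item["inputUrl"]
        if item.get("error"):
            report["fetch_failed"].append(url)
        elif item.get("status") != 200:
            report["non_200"].append(url)
        elif item.get("sitemapCount", 0) == 0:
            report["no_sitemaps"].append(url)
    return report
```

Each domain lands in at most one bucket, so a fetch error is not also counted as a missing sitemap.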

Pricing

Event	Cost
Domain Audited	$0.001 per domain

You are charged per domain audited. Platform usage fees apply separately.
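
For budgeting, the per-domain event charge is simple multiplication. The sketch below uses `Decimal` to avoid floating-point rounding. `estimate_cost` is a hypothetical helper that covers only the $0.001 event price listed above, not the separate platform usage fees.

```python
from decimal import Decimal

# Per-domain event price from the pricing table; platform fees are not included.
PRICE_PER_DOMAIN = Decimal("0.001")


def estimate_cost(domain_count: int) -> Decimal:
    """Return the event charge in USD for auditing `domain_count` domains."""
    return PRICE_PER_DOMAIN * domain_count
```

Under this assumption, 1,000 domains come to $1.00 in event charges, which matches the headline price.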

Use Cases

SEO audits — check whether robots.txt accidentally blocks important pages or crawlers
Sitemap discovery — extract all declared sitemap URLs across a portfolio of domains
Competitor intelligence — see which crawlers competitors specifically block or allow
Migration validation — verify robots.txt is correctly configured after a domain migration
Agency reporting — audit robots.txt across all client domains in a single scheduled run
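
For the SEO audit and migration use cases, you may also want to test whether a specific URL is blocked for a given crawler. The output fields listed earlier do not include per-path rules, so the sketch below assumes you have fetched the robots.txt text separately. It uses Python's standard `urllib.robotparser`, and `is_blocked` is a hypothetical wrapper.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may not fetch `url` under these rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, not a single string.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/\n\nUser-agent: AhrefsBot\nDisallow: /\n"
blocked_for_googlebot = is_blocked(rules, "Googlebot", "https://example.com/private/page")
```

The standard-library parser applies the first matching rule in a group, which can differ from the longest-match behavior some search engines document, so treat the result as a first-pass check rather than a definitive verdict.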

Related Actors

Actor	What it adds
XML Sitemap URL Extractor	Extract all URLs from the sitemaps discovered in robots.txt
Broken Links Checker	Crawl your site to find broken links that robots.txt might be masking
Tech Stack Analyzer	Detect the CMS and frameworks behind the domains you audit


URL: https://apify.com/andok/robotstxt-auditor