URL: https://apify.com/andok/robotstxt-auditor

Robots.txt Auditor & Sitemap Finder

Pricing

from $1.00 / 1,000 domains audited

Scan robots.txt files in bulk to extract sitemap URLs and verify crawler directives for technical SEO compliance.

Rating: 0.0 (0)

Developer: Andok (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 months ago

Robots.txt Auditor

Audit robots.txt files across hundreds of domains to catch crawl-blocking mistakes that silently hurt SEO. A single misconfigured Disallow rule can cut search crawlers off from entire site sections. This actor fetches, parses, and reports on every robots.txt in bulk. Run it against your own sites or competitor domains to extract sitemap declarations, user-agent rules, and crawl directives in one pass.

Features

  • Bulk auditing: process hundreds of domains in a single run with configurable concurrency
  • Sitemap discovery: extracts all Sitemap: directives declared in each robots.txt
  • User-agent analysis: identifies every crawler-specific rule block in the file
  • Status reporting: captures HTTP status codes, file size, and fetch errors
  • Flexible input: accepts full URLs or bare domains (auto-resolves to /robots.txt)
  • Error resilience: reports failures per domain without stopping the run
  • Timestamp tracking: records when each domain was checked for audit trails

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | array | Yes | - | List of URLs or domains to audit (e.g. example.com or https://example.com) |
| url | string | No | - | Single URL for backward compatibility. Merged into urls if both are provided. |
| timeoutSeconds | integer | No | 15 | HTTP timeout in seconds for each robots.txt fetch |
| concurrency | integer | No | 10 | Number of domains to process in parallel (1-50) |

Input Example

{
  "urls": ["https://crawlee.dev", "https://apify.com", "https://example.com"],
  "timeoutSeconds": 15,
  "concurrency": 10
}
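The input rules above (merging `url` into `urls`, defaults of 15 and 10, the 1-50 concurrency range, and bare domains resolving to /robots.txt) can be sketched in Python. This is only an illustration of the documented behaviour, not the actor's actual code: the function names `normalize_input` and `resolve_robots_url` are hypothetical, and defaulting bare domains to HTTPS and clamping out-of-range concurrency (rather than rejecting it) are assumptions.

```python
from urllib.parse import urlsplit


def resolve_robots_url(entry: str) -> str:
    """Map a full URL or bare domain to its /robots.txt URL."""
    entry = entry.strip()
    # Assumption: bare domains are fetched over HTTPS.
    if "://" not in entry:
        entry = "https://" + entry
    parts = urlsplit(entry)
    # Any path on the input is dropped; robots.txt lives at the host root.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def normalize_input(raw: dict) -> dict:
    """Merge `url` into `urls`, apply documented defaults, bound concurrency."""
    urls = list(raw.get("urls") or [])
    single = raw.get("url")
    if single and single not in urls:
        urls.append(single)
    concurrency = int(raw.get("concurrency", 10))
    return {
        "urls": urls,
        "timeoutSeconds": int(raw.get("timeoutSeconds", 15)),
        # Assumption: values outside 1-50 are clamped, not rejected.
        "concurrency": max(1, min(50, concurrency)),
    }


normalized = normalize_input(
    {"urls": ["example.com"], "url": "https://apify.com/store", "concurrency": 80}
)
robots_urls = [resolve_robots_url(u) for u in normalized["urls"]]
```

Here `robots_urls` holds one robots.txt URL per entry, which is the kind of value the actor reports in its `robotsUrl` output field.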

Output

Each domain produces one dataset item with the robots.txt status, discovered sitemaps, and user-agent blocks.

  • inputUrl (string): the original URL or domain you provided
  • robotsUrl (string | null): the resolved robots.txt URL
  • status (number | null): HTTP status code (200, 404, etc.)
  • contentLength (number): file size in bytes
  • sitemapCount (number): number of Sitemap: directives found
  • sitemaps (string[]): list of sitemap URLs declared in the file
  • userAgents (string[]): list of unique User-agent values
  • error (string | null): error message if the fetch failed
  • checkedAt (string): ISO timestamp of when the check ran

Output Example

{
  "inputUrl": "https://crawlee.dev",
  "robotsUrl": "https://crawlee.dev/robots.txt",
  "status": 200,
  "contentLength": 342,
  "sitemapCount": 2,
  "sitemaps": [
    "https://crawlee.dev/sitemap.xml",
    "https://crawlee.dev/sitemap-blog.xml"
  ],
  "userAgents": ["*", "Googlebot", "AhrefsBot"],
  "error": null,
  "checkedAt": "2025-11-20T14:30:00.000Z"
}
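To make the `sitemaps`, `sitemapCount`, `userAgents`, and `contentLength` fields concrete, here is a minimal Python sketch of how such values could be derived from the text of a robots.txt file. It is not the actor's parser: `summarize_robots` is a hypothetical name, the sample file is invented for illustration (it is not the real crawlee.dev robots.txt), and stripping `#` comments and counting repeated Sitemap lines separately are assumptions.

```python
def summarize_robots(text: str) -> dict:
    """Collect Sitemap URLs and unique User-agent values from robots.txt text."""
    sitemaps: list[str] = []
    user_agents: list[str] = []
    for line in text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if not value:
            continue
        if field == "sitemap":
            sitemaps.append(value)
        elif field == "user-agent" and value not in user_agents:
            user_agents.append(value)  # keep first-seen order, no duplicates
    return {
        "contentLength": len(text.encode("utf-8")),  # size in bytes
        "sitemapCount": len(sitemaps),
        "sitemaps": sitemaps,
        "userAgents": user_agents,
    }


# Invented sample file, for illustration only.
sample = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Allow: /

# Keep this SEO crawler out
User-agent: AhrefsBot
Disallow: /

Sitemap: https://crawlee.dev/sitemap.xml
Sitemap: https://crawlee.dev/sitemap-blog.xml
"""
summary = summarize_robots(sample)
```

Field names are matched case-insensitively because robots.txt files in the wild mix `Sitemap:`, `sitemap:`, and `SITEMAP:`.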

Pricing

| Event | Cost |
| --- | --- |
| Domain Audited | $0.001 per domain |

You are charged per domain audited. Platform usage fees apply separately.
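As a quick worked example of the per-domain rate, the Python sketch below estimates the event charge for a run. It covers only the $0.001 per-domain event. Platform usage fees are not modelled, and whether domains that fail to fetch are billed is not stated here, so the sketch assumes every submitted domain counts. `estimate_event_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Rate taken from the pricing table: $0.001 per domain audited.
PRICE_PER_DOMAIN = Decimal("0.001")


def estimate_event_cost(domain_count: int) -> Decimal:
    """Return the estimated per-domain event charge in USD."""
    if domain_count < 0:
        raise ValueError("domain_count must be non-negative")
    # Assumption: every submitted domain is billed once.
    return PRICE_PER_DOMAIN * domain_count


cost_1000 = estimate_event_cost(1000)
cost_250 = estimate_event_cost(250)
```

With this rate, 1,000 domains come to $1.00 and 250 domains to $0.25, matching the "from $1.00 / 1,000" headline price.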

Use Cases

  • SEO audits: check whether robots.txt accidentally blocks important pages or crawlers
  • Sitemap discovery: extract all declared sitemap URLs across a portfolio of domains
  • Competitor intelligence: see which crawlers competitors specifically block or allow
  • Migration validation: verify robots.txt is correctly configured after a domain migration
  • Agency reporting: audit robots.txt across all client domains in a single scheduled run
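For audit and reporting use cases like these, the dataset items can be post-processed into a short list of findings. The Python sketch below works on items shaped like the output example. The `flag_issues` helper and its specific checks (fetch error, non-200 status, no sitemap, no user-agent block) are illustrative assumptions, not features of the actor, and the sample items are invented.

```python
def flag_issues(items: list[dict]) -> list[dict]:
    """Return one finding per dataset item that looks problematic."""
    findings = []
    for item in items:
        problems = []
        if item.get("error"):
            problems.append(f"fetch failed: {item['error']}")
        elif item.get("status") != 200:
            problems.append(f"unexpected HTTP status {item.get('status')}")
        else:
            # Content checks only make sense when the file was fetched.
            if item.get("sitemapCount", 0) == 0:
                problems.append("no Sitemap directive declared")
            if not item.get("userAgents"):
                problems.append("no User-agent block found")
        if problems:
            findings.append({"inputUrl": item["inputUrl"], "problems": problems})
    return findings


# Invented items, shaped like the actor's output.
items = [
    {"inputUrl": "https://crawlee.dev", "status": 200, "sitemapCount": 2,
     "userAgents": ["*"], "error": None},
    {"inputUrl": "https://example.com", "status": 404, "sitemapCount": 0,
     "userAgents": [], "error": None},
    {"inputUrl": "client-a.test", "status": None, "sitemapCount": 0,
     "userAgents": [], "error": "timeout"},
    {"inputUrl": "client-b.test", "status": 200, "sitemapCount": 0,
     "userAgents": ["*"], "error": None},
]
findings = flag_issues(items)
```

A scheduled run could feed its dataset through a check like this and report only the domains that appear in `findings`.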

Related Actors

| Actor | What it adds |
| --- | --- |
| XML Sitemap URL Extractor | Extract all URLs from the sitemaps discovered in robots.txt |
| Broken Links Checker | Crawl your site to find broken links that robots.txt might be masking |
| Tech Stack Analyzer | Detect the CMS and frameworks behind the domains you audit |
