URL: https://apify.com/pink_comic/robots-txt-validator

Robots.txt Validator - Crawl Rules Analyzer

Pricing

from $1.00 / 1,000 results

Analyze robots.txt files for any domain. Extract crawl rules, sitemaps, blocked paths, and crawl-delay settings. Validate configuration and identify SEO issues in bulk.

Rating: 0.0 (0)

Developer: Ava Torres (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 2 months ago

robots.txt Validator & Analyzer

Fetch, parse, and analyze robots.txt files for any domain in bulk. Built for SEO professionals, developers, and crawler operators who need to audit site access rules at scale.

What It Does

For each domain you supply, the actor:

  1. Fetches /robots.txt from the domain root over HTTPS (falls back gracefully on 404 or network errors)
  2. Parses all User-agent, Allow, Disallow, Crawl-delay, and Sitemap directives
  3. Reports structured rules grouped by user-agent
  4. Optionally checks whether specific paths are allowed or blocked for your chosen user-agent
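The parsing and grouping steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actor's own code: the function name parse_robots_txt is hypothetical, and the returned shape only mirrors the userAgentRules and sitemapUrls fields described under Output. Crawl-delay values are stored under the "path" key purely to keep every rule in the same {directive, path} form.

```python
def parse_robots_txt(text):
    """Group rules by user-agent and collect Sitemap URLs from raw robots.txt text."""
    blocks = {}           # user-agent -> list of {"directive", "path"} entries
    sitemaps = []
    agents = []           # user-agents the current group applies to
    in_agent_run = False  # True while reading consecutive User-agent lines

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()

        if field == "sitemap":
            sitemaps.append(value)           # sitemaps are not tied to a group
        elif field == "user-agent":
            if not in_agent_run:
                agents = []                  # a new group starts here
            agents.append(value)
            blocks.setdefault(value, [])
            in_agent_run = True
        elif field in ("allow", "disallow", "crawl-delay"):
            in_agent_run = False
            for agent in agents:
                blocks[agent].append({"directive": field, "path": value})

    return {
        "userAgentRules": [
            {"userAgent": ua, "rules": rules} for ua, rules in blocks.items()
        ],
        "sitemapUrls": sitemaps,
    }
```

Consecutive User-agent lines share one group, so a rule that follows them is recorded for each listed agent.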

Input

Field       Type      Required  Description
urls        string[]  Yes       Domains or full URLs (e.g. google.com, https://openai.com/blog)
userAgent   string    No        User-agent to evaluate rules for. Defaults to *
checkPaths  string[]  No        Specific paths to test for allow/disallow (e.g. /admin, /api/)
maxResults  integer   No        Cap on domains to process. Defaults to 100
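As an illustration, an input using the four fields above might look like the dictionary below. The helper robots_txt_url is hypothetical: it shows one plausible way to turn a bare domain or a full URL into the HTTPS robots.txt location at the domain root. How the actor normalizes entries internally is not documented here, so treat this as an assumption.

```python
import json
from urllib.parse import urlsplit

# Example input built from the documented fields and their example values.
actor_input = {
    "urls": ["google.com", "https://openai.com/blog"],
    "userAgent": "*",
    "checkPaths": ["/admin", "/api/"],
    "maxResults": 100,
}

def robots_txt_url(entry):
    """Map a bare domain or full URL to https://<host>/robots.txt (assumed behavior)."""
    if "://" not in entry:
        entry = "https://" + entry       # bare domains get a scheme so they parse
    host = urlsplit(entry).netloc        # keep only the host, drop path and query
    return f"https://{host}/robots.txt"

# Apply the maxResults cap, then derive one robots.txt URL per entry.
targets = [robots_txt_url(u) for u in actor_input["urls"][: actor_input["maxResults"]]]

print(json.dumps(actor_input, indent=2))
print(targets)
```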

Output

One record per domain:

Field             Description
domain            Domain name
robotsTxtUrl      Full URL of the fetched robots.txt
robotsTxtFound    true if HTTP 200 was returned
robotsTxtContent  Raw robots.txt text
userAgentRules    Parsed rule blocks, each with userAgent and rules array of {directive, path}
sitemapUrls       All Sitemap URLs declared in the file
crawlDelay        Crawl-delay in seconds for the requested user-agent (null if not set)
analyzedPaths     Per-path results: {path, allowed} for each path in checkPaths
fetchError        Error message if the file could not be fetched
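To make the crawlDelay and analyzedPaths fields concrete, the sketch below derives similar values from raw robots.txt text with Python's standard urllib.robotparser module. The function name analyze_paths is hypothetical and the actor may resolve rules differently: Python's parser applies the first matching rule in file order, which can disagree with longest-match implementations when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

def analyze_paths(robots_txt, user_agent="*", check_paths=()):
    """Return crawl-delay and per-path allow/block results for one user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network request
    return {
        # crawl_delay() returns None when no Crawl-delay applies to the agent
        "crawlDelay": parser.crawl_delay(user_agent),
        "analyzedPaths": [
            {"path": path, "allowed": parser.can_fetch(user_agent, path)}
            for path in check_paths
        ],
    }

example = "User-agent: *\nCrawl-delay: 5\nDisallow: /admin\n"
print(analyze_paths(example, "*", ["/admin", "/blog"]))
```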

Example Use Cases

  • SEO audit: Check which bots can access which parts of your site
  • Crawler compliance: Verify your spider respects Disallow rules before running at scale
  • Competitive research: Understand which paths competitors block from crawling
  • Security review: Identify paths hidden from crawlers (admin panels, staging URLs)
  • Sitemap discovery: Extract all declared sitemap URLs without manual inspection

Pricing

$0.10 per 1,000 domains checked. A typical run of 100 domains costs less than $0.02.
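The arithmetic behind that estimate, using the $0.10 per 1,000 rate quoted in this section, can be checked with a short sketch. The function name run_cost is hypothetical, and the calculation ignores any platform or compute charges that may apply separately.

```python
def run_cost(domains, price_per_thousand=0.10):
    """Estimated cost in USD for a run, at a flat per-1,000-domains rate."""
    return domains / 1000 * price_per_thousand

# 100 domains at $0.10 per 1,000 comes to about one cent.
print(f"${run_cost(100):.2f}")
```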

You might also like

Robots.txt Generator

automation-lab/robots-txt-generator

Generate valid robots.txt files from structured rules. Apply presets (block AI bots, SEO-friendly), add custom per-bot rules, sitemaps, and crawl-delay. Zero-proxy, instant output.

๐Ÿ‘ User avatar

Stas Persiianenko

4

Robots Txt Analyzer

zerobreak/robots-txt-analyzer

Robots txt analyzer that fetches and parses crawl rules from any website in bulk, so SEO teams and developers can audit blocked paths, user agents, and sitemap locations across hundreds of domains without manual work.

Robots.txt & Sitemap Analyzer

automation-lab/robots-sitemap-analyzer

This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive...

๐Ÿ‘ User avatar

Stas Persiianenko

16

Robots.txt Generator

maximedupre/robots-txt-generator

Generate deployable robots.txt files from presets, custom bot rules, sitemap URLs, and host directives. Create one file or batch files for multiple sites, then export raw text plus validation data.

๐Ÿ‘ User avatar

Maxime Dupré

2

robots.txt Parser & AI Crawler Block Checker

taroyamada/robotstxt-ai-checker

robots.txt parser that audits AI crawler block rules (GPTBot, ClaudeBot, anthropic-ai, PerplexityBot) across thousands of websites in one run. Returns per-bot allow/disallow disposition and crawl-delay.