
URL: https://apify.com/scrappy_garden/robots-txt-validator

Robots.txt Validator · Apify


๐Ÿ‘ Robots.txt Validator - Check Rules, Sitemaps & Crawl Directives avatar

Robots.txt Validator - Check Rules, Sitemaps & Crawl Directives

Pricing: $4.99/month + usage



Validate robots.txt for one or more websites: fetches /robots.txt per host, parses directive groups (User-agent/Allow/Disallow/Crawl-delay/Sitemap), reports common errors and warnings, and can test URLs against the chosen User-Agent.


Rating: 0.0 (0 ratings)

Developer: Bikram Adhikari (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 0
  • Last modified: 5 months ago

Robots.txt Validator (SEO + Crawling Rules Checker)

Validate robots.txt for one or more websites.

This Actor:

  • Fetches /robots.txt for each unique host derived from startUrls
  • Parses directive groups (User-agent, Allow, Disallow, Crawl-delay) and extracts Sitemap URLs
  • Reports common errors/warnings (invalid lines, unknown directives, rules before User-agent, invalid sitemap URLs, etc.)
  • Optionally tests a list of URLs against the selected User-Agent
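The parsing and reporting steps above can be sketched in Python as follows. This is a minimal illustration assuming RFC 9309 grouping rules (consecutive User-agent lines share one group; Sitemap lines are global and belong to no group). The names `parse_robots_txt`, `Group`, `Issue` and the issue codes such as `RULE_BEFORE_USER_AGENT` are hypothetical; they are not the Actor's documented output codes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Directives this sketch recognizes; anything else is reported as a warning.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


@dataclass
class Group:
    user_agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # ("allow" | "disallow", path)
    crawl_delay: float | None = None


@dataclass
class Issue:
    code: str     # hypothetical issue code
    message: str
    line: int     # 1-based line number in robots.txt


def parse_robots_txt(text: str):
    groups: list[Group] = []
    sitemaps: list[str] = []
    errors: list[Issue] = []
    warnings: list[Issue] = []
    current: Group | None = None
    last_was_agent = False

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if ":" not in line:
            errors.append(Issue("INVALID_LINE", f"missing ':' in {raw!r}", lineno))
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key not in KNOWN_DIRECTIVES:
            warnings.append(Issue("UNKNOWN_DIRECTIVE", f"unknown directive {key!r}", lineno))
            continue
        if key == "sitemap":
            # Sitemap lines are global: they are collected, not attached to a group.
            if value.startswith(("http://", "https://")):
                sitemaps.append(value)
            else:
                errors.append(Issue("INVALID_SITEMAP_URL", f"not an absolute URL: {value!r}", lineno))
            continue
        if key == "user-agent":
            # Consecutive User-agent lines share a group; one that follows a rule starts a new group.
            if current is None or not last_was_agent:
                current = Group()
                groups.append(current)
            current.user_agents.append(value)
            last_was_agent = True
            continue
        last_was_agent = False
        if current is None:
            warnings.append(Issue("RULE_BEFORE_USER_AGENT", f"{key} appears before any User-agent", lineno))
        elif key == "crawl-delay":
            try:
                current.crawl_delay = float(value)
            except ValueError:
                errors.append(Issue("INVALID_CRAWL_DELAY", f"not a number: {value!r}", lineno))
        else:
            current.rules.append((key, value))

    return groups, sitemaps, errors, warnings
```

In this sketch a Crawl-delay that is not a number is reported as an error rather than silently ignored, and a relative Sitemap value is rejected because sitemap locations are expected to be absolute URLs.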

Typical use cases

  • SEO audits: verify Sitemap: entries and robots configuration
  • QA checks: catch malformed directives before a production release
  • Crawl planning: see whether important URLs are blocked for a given bot

Input

  • startUrls (required): any URLs on the target site(s)
  • userAgent (default *): used to choose the best matching group
  • testUrls (optional): URLs to evaluate as allowed/disallowed for the chosen userAgent
  • requestTimeoutSecs (default 15): request timeout, in seconds
  • maxRobotsTxtBytes (default 500000): maximum robots.txt size to read, in bytes
  • fallbackToHttp (default true)
  • saveRawRobotsTxt (default false): stores robots-<hostname>.txt in the key-value store
  • proxyConfiguration (optional)
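How userAgent could select a group and how testUrls could be evaluated is sketched below, assuming RFC 9309 semantics: all groups naming the agent (case-insensitively) are merged, the `*` group is the fallback, and the longest matching Allow/Disallow pattern wins, with Allow winning ties. The Actor's exact "best matching group" logic is not documented, so `select_rules` and `is_allowed` are hypothetical helpers; the caller is assumed to pass a bare product token such as `googlebot`, and percent-encoding normalization is left out.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit


def select_rules(groups: list[tuple[list[str], list[tuple[str, str]]]],
                 user_agent: str) -> list[tuple[str, str]]:
    """Merge the rules of every group naming user_agent; fall back to '*' groups."""
    token = user_agent.strip().lower()

    def rules_for(name: str) -> list[list[tuple[str, str]]]:
        return [rules for agents, rules in groups if name in {a.strip().lower() for a in agents}]

    matched = rules_for(token) or rules_for("*")
    return [rule for rules in matched for rule in rules]


def _pattern_to_regex(path: str) -> re.Pattern[str]:
    """'*' matches any character sequence; a trailing '$' anchors the end of the path."""
    anchored = path.endswith("$")
    body = path[:-1] if anchored else path
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url: str) -> bool:
    parts = urlsplit(url)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, path in rules:
        if not path:  # an empty value such as "Disallow:" matches nothing
            continue
        if _pattern_to_regex(path).match(target):
            # The longest (most specific) pattern wins; Allow wins a tie.
            if len(path) > best_len or (len(path) == best_len and directive == "allow"):
                best_len, allowed = len(path), directive == "allow"
    return allowed
```

For example, with `Disallow: /private` and `Allow: /private/public` in the `*` group, `/private/x` is blocked while `/private/public/page` is allowed, because the longer Allow pattern is more specific.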

Output

Dataset items (one per host)

Each item includes:

  • hostname, robotsTxtUrl, statusCode, hasRobotsTxt, contentType, bytes, sha256
  • selectedGroupUserAgents, crawlDelaySeconds, sitemapUrls
  • errors[] and warnings[] (with code, message, line)
  • testedUrls[] (if provided)
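A rough sketch of how the per-host identity fields could be derived. Assumptions, none of them documented by the Actor: hosts are de-duplicated by hostname (the first URL seen for a host wins), hasRobotsTxt means any 2xx response, and the body is cut at maxRobotsTxtBytes before bytes and sha256 are computed. `robots_urls` and `host_item` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from urllib.parse import urlsplit


def robots_urls(start_urls: list[str]) -> dict[str, str]:
    """Map each unique hostname to the /robots.txt URL at its origin (first URL seen wins)."""
    out: dict[str, str] = {}
    for url in start_urls:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc.lower()}"
        out.setdefault(parts.hostname or parts.netloc.lower(), f"{origin}/robots.txt")
    return out


def host_item(hostname: str, robots_url: str, status_code: int,
              content_type: str | None, body: bytes, max_bytes: int = 500_000) -> dict:
    """Build the identity fields of one dataset item (hypothetical helper)."""
    body = body[:max_bytes]  # assumption: oversized files are truncated, then hashed
    return {
        "hostname": hostname,
        "robotsTxtUrl": robots_url,
        "statusCode": status_code,
        "hasRobotsTxt": 200 <= status_code < 300,  # assumption: any 2xx counts
        "contentType": content_type,
        "bytes": len(body),
        "sha256": hashlib.sha256(body).hexdigest(),
    }
```

Hashing the exact bytes read makes sha256 a cheap change detector: two runs that return the same hash served the same robots.txt content.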

Key-value store

  • REPORT (JSON): full per-host report array
  • SUMMARY (JSON): run summary and counts
  • robots-<hostname>.txt (text, optional): raw robots.txt

Notes

  • If /robots.txt returns 404, it is treated as allow-all (with a warning)
  • This Actor is designed for validation and QA checks (not a full crawler)
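The 404 behavior matches RFC 9309, which treats any 4xx response as "unavailable" (crawling allowed) and a 5xx or network failure as "unreachable" (assume complete disallow). The sketch below follows the RFC; how this Actor handles status codes other than 404 is not documented, so `interpret_status` and its warning codes are hypothetical.

```python
from __future__ import annotations


def interpret_status(status_code: int | None) -> tuple[str, str | None]:
    """Map a robots.txt fetch result to (policy, warning code), following RFC 9309.

    policy is "parse", "allow-all" or "disallow-all"; warning codes are hypothetical.
    status_code is None when the request failed (timeout, DNS or connection error).
    """
    if status_code is None:
        return "disallow-all", "ROBOTS_UNREACHABLE"  # unreachable: assume complete disallow
    if 200 <= status_code < 300:
        return "parse", None
    if 300 <= status_code < 400:
        # Redirects are normally followed by the HTTP client; RFC 9309 lets a crawler
        # treat a file still unresolved after 5 redirects as unavailable.
        return "allow-all", "ROBOTS_TOO_MANY_REDIRECTS"
    if 400 <= status_code < 500:
        # 4xx means "unavailable": crawling is allowed (this covers the 404 case).
        return "allow-all", "ROBOTS_NOT_FOUND" if status_code == 404 else "ROBOTS_UNAVAILABLE"
    return "disallow-all", "ROBOTS_SERVER_ERROR"  # 5xx and anything unexpected
```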

SEO keywords

robots.txt validator, robots.txt checker, validate robots.txt, robots rules tester, sitemap directive checker, crawl-delay validator, allow disallow rules

Quick start

Store page: https://apify.com/scrappy_garden/robots-txt-validator

Paste this into Input and click Run:

```json
{
  "startUrls": [
    { "url": "https://example.com/" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
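To start a run from code instead of the Console, here is a sketch using the `apify-client` Python package (1.x API: `ApifyClient`, `actor().call()`, `dataset().iterate_items()`, `key_value_store().get_record()`). The helpers `build_run_input` and `run_validator` are hypothetical, and passing testUrls as plain URL strings is an assumption; check the Actor's input schema before relying on it. The API token is read from an `APIFY_TOKEN` environment variable.

```python
from __future__ import annotations

import os

ACTOR_ID = "scrappy_garden/robots-txt-validator"


def build_run_input(start_urls: list[str], user_agent: str = "*",
                    test_urls: list[str] | None = None) -> dict:
    """Build an Actor input from the documented fields (hypothetical helper)."""
    run_input = {
        "startUrls": [{"url": url} for url in start_urls],
        "userAgent": user_agent,
        "proxyConfiguration": {"useApifyProxy": False},
    }
    if test_urls:
        run_input["testUrls"] = list(test_urls)  # assumption: plain URL strings
    return run_input


def run_validator(run_input: dict, token: str):
    """Start the Actor, wait for it to finish, and return (dataset items, SUMMARY value)."""
    from apify_client import ApifyClient  # imported lazily: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor call returned no run object")
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("SUMMARY")
    return items, (record or {}).get("value")


if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    items, summary = run_validator(
        build_run_input(["https://example.com/"], test_urls=["https://example.com/private/"]),
        os.environ["APIFY_TOKEN"],
    )
    for item in items:
        print(item["hostname"], item["statusCode"], len(item["errors"]), "error(s)")
    print(summary)
```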

Outputs (what you get)

  • Dataset: one item per host, with fields such as hostname, robotsTxtUrl, statusCode, hasRobotsTxt, crawlDelaySeconds, sitemapUrls, errors, warnings.
  • Key-value store: REPORT, SUMMARY

Tips (trust + predictable results)

  • Start with 1 to 3 URLs to validate behavior, then scale up.
  • If a target blocks requests, enable a proxy via proxyConfiguration.
  • Use the SUMMARY / REPORT keys (when present) for automation pipelines and monitoring.
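For monitoring, one option is to compare each run's dataset items with the previous run. The sketch below assumes the previous sha256 values are persisted somewhere between runs (a plain dict here); `check_report` and `snapshot` are hypothetical helpers that read only the documented fields hostname, hasRobotsTxt, statusCode, errors and sha256.

```python
from __future__ import annotations


def check_report(items: list[dict], previous_hashes: dict[str, str]) -> list[str]:
    """Turn dataset items into alert strings for a monitoring pipeline (hypothetical helper)."""
    alerts: list[str] = []
    for item in items:
        host = item["hostname"]
        if item.get("errors"):
            alerts.append(f"{host}: {len(item['errors'])} robots.txt error(s)")
        if not item.get("hasRobotsTxt"):
            alerts.append(f"{host}: no robots.txt (status {item.get('statusCode')})")
        old = previous_hashes.get(host)
        if old is not None and old != item.get("sha256"):
            alerts.append(f"{host}: robots.txt changed since last run")
    return alerts


def snapshot(items: list[dict]) -> dict[str, str]:
    """Hashes to persist for the next comparison (storage is left to the pipeline)."""
    return {item["hostname"]: item["sha256"] for item in items}
```

Persisting `snapshot(items)` after each run and passing it back in next time turns the per-host sha256 into a simple change alert.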

Related actors

Search keywords

robots txt validator, robots.txt validator - check rules, sitemaps & crawl directives, website audit, seo, robots.txt

You might also like

Robots.txt & Sitemap Analyzer

automation-lab/robots-sitemap-analyzer

This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive...

๐Ÿ‘ User avatar

Stas Persiianenko

16

Robots Txt Analyzer

zerobreak/robots-txt-analyzer

Robots txt analyzer that fetches and parses crawl rules from any website in bulk, so SEO teams and developers can audit blocked paths, user agents, and sitemap locations across hundreds of domains without manual work.

Robots.txt Generator

automation-lab/robots-txt-generator

Generate valid robots.txt files from structured rules. Apply presets (block AI bots, SEO-friendly), add custom per-bot rules, sitemaps, and crawl-delay. Zero-proxy, instant output.

๐Ÿ‘ User avatar

Stas Persiianenko

4

Robots.txt Generator

maximedupre/robots-txt-generator

Generate deployable robots.txt files from presets, custom bot rules, sitemap URLs, and host directives. Create one file or batch files for multiple sites, then export raw text plus validation data.

๐Ÿ‘ User avatar

Maxime Dupré

2

robots.txt Parser & AI Crawler Block Checker

taroyamada/robotstxt-ai-checker

robots.txt parser that audits AI crawler block rules (GPTBot, ClaudeBot, anthropic-ai, PerplexityBot) across thousands of websites in one run. Returns per-bot allow/disallow disposition and crawl-delay.