Robots.txt Validator - Check Rules, Sitemaps & Crawl Directives

Pricing

$4.99/month + usage

Validate robots.txt for one or more websites: fetches /robots.txt per host, parses directive groups (User-agent/Allow/Disallow/Crawl-delay/Sitemap), reports common errors and warnings, and can test URLs against the chosen User-Agent.

Rating

0.0

(0)

Developer

Bikram Adhikari

Maintained by Community

Actor stats

Last modified: 5 months ago

Robots.txt Validator (SEO + Crawling Rules Checker)

Validate robots.txt for one or more websites.

This Actor:

Fetches /robots.txt for each unique host derived from startUrls
Parses directive groups (User-agent, Allow, Disallow, Crawl-delay) and extracts Sitemap URLs
Reports common errors/warnings (invalid lines, unknown directives, rules before User-agent, invalid sitemap URLs, etc.)
Optionally tests a list of URLs against the selected User-Agent
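
The parsing and reporting steps above can be sketched as follows. This is a minimal illustration, not the Actor's actual implementation: the function name parse_robots_txt, the Group class, and the issue codes (invalid-line, rule-before-user-agent, invalid-sitemap-url, unknown-directive) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Directives that belong to a User-agent group (per the list above).
GROUP_DIRECTIVES = {"allow", "disallow", "crawl-delay"}


@dataclass
class Group:
    user_agents: list = field(default_factory=list)
    rules: list = field(default_factory=list)  # (directive, value) pairs


def parse_robots_txt(text):
    """Return (groups, sitemaps, issues) for a robots.txt body.

    issues is a list of (line_number, code); the codes are hypothetical.
    """
    groups, sitemaps, issues = [], [], []
    current = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            issues.append((number, "invalid-line"))
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.user_agents.append(value)
        elif name in GROUP_DIRECTIVES:
            if current is None:
                issues.append((number, "rule-before-user-agent"))
            else:
                current.rules.append((name, value))
        elif name == "sitemap":
            # Sitemap lines are global, not tied to a group.
            if value.startswith(("http://", "https://")):
                sitemaps.append(value)
            else:
                issues.append((number, "invalid-sitemap-url"))
        else:
            issues.append((number, "unknown-directive"))
    return groups, sitemaps, issues
```

Calling parse_robots_txt on the raw file body yields the groups, the Sitemap URLs, and a list of line-numbered issues, which roughly mirrors the sitemapUrls, errors, and warnings fields described under Output.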

Typical use cases

SEO audits: verify Sitemap: entries and robots configuration
QA checks: catch malformed directives before a production release
Crawl planning: see whether important URLs are blocked for a given bot

Input

startUrls (required): any URLs on the target site(s)
userAgent (default *): the crawler name used to choose the best-matching User-agent group
testUrls (optional): URLs to evaluate as allowed/disallowed for the chosen userAgent
requestTimeoutSecs (default 15)
maxRobotsTxtBytes (default 500000)
fallbackToHttp (default true)
saveRawRobotsTxt (default false): stores robots-<hostname>.txt in key-value store
proxyConfiguration (optional)
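
To illustrate what userAgent and testUrls control, the sketch below evaluates URLs with Python's standard-library urllib.robotparser. It is a stand-in for the idea, not the Actor's own matching logic: the helper name evaluate_urls is hypothetical, and the standard-library parser applies the first matching rule in file order, which may differ from how the Actor resolves overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser


def evaluate_urls(robots_txt, user_agent, urls):
    """Map each URL to True (allowed) or False (disallowed).

    robots_txt is the raw file body; user_agent plays the role of the
    userAgent input and urls the role of testUrls.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    body = "User-agent: *\nAllow: /private/public\nDisallow: /private\n"
    for url, allowed in evaluate_urls(
        body, "*", ["https://example.com/", "https://example.com/private/x"]
    ).items():
        print(url, "allowed" if allowed else "disallowed")
```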

Output

Dataset items (one per host)

Each item includes:

hostname, robotsTxtUrl, statusCode, hasRobotsTxt, contentType, bytes, sha256
selectedGroupUserAgents, crawlDelaySeconds, sitemapUrls
errors[] and warnings[] (with code, message, line)
testedUrls[] (if provided)
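
A small sketch of how downstream code might consume these items, for example to list the hosts that reported errors. The field names hostname, errors, and code come from the list above; the helper name hosts_with_errors is hypothetical, and the exact value types are an assumption.

```python
def hosts_with_errors(items):
    """Return {hostname: [error codes]} for items with at least one error.

    items is a list of dataset items shaped as described above; missing
    fields are tolerated because the exact schema is an assumption here.
    """
    report = {}
    for item in items:
        codes = [error.get("code") for error in item.get("errors", [])]
        if codes:
            report[item.get("hostname")] = codes
    return report
```

Hosts with an empty errors[] array are left out of the result, so an empty dictionary means no host reported an error.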

Key-value store

REPORT (JSON): full per-host report array
SUMMARY (JSON): run summary and counts
robots-<hostname>.txt (text, optional): raw robots.txt

Notes

If /robots.txt returns 404, it is treated as allow-all (with a warning)
This Actor is designed for validation and QA checks (not a full crawler)
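
The 404 rule above can be expressed as a small decision function. This is only a sketch: the name interpret_status, the allowAll key, and the warning texts are hypothetical, and the handling of statuses other than 404 and 2xx is deliberately left undecided because the notes do not describe it.

```python
def interpret_status(status_code):
    """Translate the /robots.txt HTTP status into a validation outcome."""
    if status_code == 404:
        # Per the note above: a missing robots.txt means allow-all, with a warning.
        return {
            "hasRobotsTxt": False,
            "allowAll": True,
            "warnings": ["robots.txt not found (404); treated as allow-all"],
        }
    if 200 <= status_code < 300:
        # The file exists, so its rules decide what is allowed.
        return {"hasRobotsTxt": True, "allowAll": False, "warnings": []}
    # Any other status is not covered by the notes; flag it and decide nothing.
    return {
        "hasRobotsTxt": False,
        "allowAll": None,
        "warnings": [f"unexpected status {status_code}; outcome undetermined"],
    }
```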

SEO keywords

robots.txt validator, robots.txt checker, validate robots.txt, robots rules tester, sitemap directive checker, crawl-delay validator, allow disallow rules

Quick start

Store page: https://apify.com/scrappy_garden/robots-txt-validator

Paste this into Input and click Run:

{
  "startUrls": [
    {
      "url": "https://example.com/"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
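
When validating several sites, the same input can be generated instead of typed by hand. The sketch below builds the structure shown above from a list of URLs; the helper name build_input is hypothetical, and passing userAgent as a plain string is an assumption based on the Input section.

```python
import json


def build_input(urls, user_agent="*"):
    """Build an Actor input in the Quick start shape for one or more sites."""
    return {
        "startUrls": [{"url": url} for url in urls],
        "userAgent": user_agent,
        "proxyConfiguration": {"useApifyProxy": False},
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the Input editor.
    print(json.dumps(build_input(["https://example.com/"]), indent=2))
```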

Outputs (what you get)

Dataset: one item per host, typically including hostname, robotsTxtUrl, statusCode, hasRobotsTxt, crawlDelaySeconds, sitemapUrls, errors, and warnings.
Key-value store: REPORT, SUMMARY

Tips (trust + predictable results)

Start with 1–3 URLs to validate behavior, then scale up.
If a target blocks requests, enable a proxy through proxyConfiguration in Input.
Use the SUMMARY / REPORT keys (when present) for automation pipelines and monitoring.

Related actors

sitemap-generator (https://apify.com/scrappy_garden/sitemap-generator)
canonical-url-checker (https://apify.com/scrappy_garden/canonical-url-checker)
broken-link-checker (https://apify.com/scrappy_garden/broken-link-checker)
security-headers-checker (https://apify.com/scrappy_garden/security-headers-checker)

Search keywords

robots txt validator, robots.txt validator - check rules, sitemaps & crawl directives, website audit, seo, robots.txt
