Pricing
from $1.00 / 1,000 domains audited
Robots.txt Auditor & Sitemap Finder
Scan robots.txt files in bulk to extract sitemap URLs and verify crawler directives for technical SEO compliance.
Robots.txt Auditor
Audit robots.txt files across hundreds of domains to catch crawl-blocking mistakes that silently hurt SEO. A single misconfigured Disallow rule can deindex entire site sections. This actor fetches, parses, and reports on every robots.txt in bulk. Run it against your own sites or competitor domains to extract sitemap declarations, user-agent rules, and crawl directives in one pass.
Features
- Bulk auditing: process hundreds of domains in a single run with configurable concurrency
- Sitemap discovery: extracts all `Sitemap:` directives declared in each robots.txt
- User-agent analysis: identifies every crawler-specific rule block in the file
- Status reporting: captures HTTP status codes, file size, and fetch errors
- Flexible input: accepts full URLs or bare domains (auto-resolves to `/robots.txt`)
- Error resilience: reports failures per domain without stopping the run
- Timestamp tracking: records when each domain was checked for audit trails
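The sitemap discovery and user-agent analysis described above can be illustrated with a small parser. This is a minimal sketch, not the actor's actual code: the function name `parse_robots_txt` is hypothetical, and it assumes that everything after a `#` is a comment, that every `Sitemap:` directive is counted (duplicates included), and that `User-agent` values are deduplicated in order of first appearance.

```python
def parse_robots_txt(text: str) -> dict:
    """Collect Sitemap URLs and unique User-agent values from robots.txt text."""
    sitemaps: list[str] = []
    user_agents: list[str] = []
    for raw_line in text.splitlines():
        # Assumption: "#" starts a comment, so drop it and trim whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs keep their "https:" part.
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if not value:
            continue
        if field == "sitemap":
            sitemaps.append(value)
        elif field == "user-agent" and value not in user_agents:
            user_agents.append(value)
    return {
        "sitemapCount": len(sitemaps),
        "sitemaps": sitemaps,
        "userAgents": user_agents,
    }
```

Field names are matched case-insensitively because real-world files mix `Sitemap:`, `sitemap:`, and `SITEMAP:`. The returned keys mirror the output fields documented for this actor.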
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | array | Yes | - | List of URLs or domains to audit (e.g. `example.com` or `https://example.com`) |
| `url` | string | No | - | Single URL for backward compatibility. Merged into `urls` if both are provided. |
| `timeoutSeconds` | integer | No | 15 | HTTP timeout in seconds for each robots.txt fetch |
| `concurrency` | integer | No | 10 | Number of domains to process in parallel (1-50) |
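To make the `concurrency` and `timeoutSeconds` settings concrete, here is a rough sketch of bounded parallel fetching with per-domain error capture. It is an illustration under assumptions, not the actor's implementation: `audit_all` is a hypothetical name, and `fetch` is a caller-supplied coroutine assumed to return a `(status, body)` tuple.

```python
import asyncio
from datetime import datetime, timezone


async def audit_all(targets, fetch, concurrency=10, timeout_seconds=15):
    """Run fetch(url) for every target with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def audit_one(url):
        item = {
            "robotsUrl": url,
            "status": None,
            "contentLength": 0,
            "error": None,
            "checkedAt": datetime.now(timezone.utc).isoformat(),
        }
        async with semaphore:
            try:
                # Hypothetical fetcher: any coroutine returning (status, body).
                status, body = await asyncio.wait_for(fetch(url), timeout_seconds)
                item["status"] = status
                item["contentLength"] = len(body.encode("utf-8"))
            except asyncio.TimeoutError:
                item["error"] = f"Timed out after {timeout_seconds}s"
            except Exception as exc:
                # A failure is recorded on this item only; other domains continue.
                item["error"] = str(exc)
        return item

    # gather preserves input order, so results line up with `targets`.
    return await asyncio.gather(*(audit_one(url) for url in targets))
```

The semaphore caps how many fetches run at once, and catching exceptions inside `audit_one` is what keeps one unreachable domain from aborting the whole run.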
Input Example
{"urls":["https://crawlee.dev","https://apify.com","https://example.com"],"timeoutSeconds":15,"concurrency":10}
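The way bare domains and the legacy `url` field might be folded into a list of robots.txt targets can be sketched as follows. The helper names `to_robots_url` and `collect_targets` are hypothetical, and two behaviors are assumptions rather than documented facts: bare domains default to `https://`, and duplicate targets are dropped.

```python
from urllib.parse import urlsplit


def to_robots_url(entry: str) -> str:
    """Resolve a full URL or bare domain to its site-root robots.txt URL."""
    entry = entry.strip()
    if "://" not in entry:
        # Assumption: bare domains are fetched over HTTPS.
        entry = "https://" + entry
    parts = urlsplit(entry)
    # robots.txt always lives at the root, so any path in the input is ignored.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def collect_targets(run_input: dict) -> list[str]:
    """Merge `url` into `urls` and return de-duplicated robots.txt URLs."""
    entries = list(run_input.get("urls") or [])
    single = run_input.get("url")
    if single:
        entries.append(single)
    seen: set[str] = set()
    targets: list[str] = []
    for entry in entries:
        robots_url = to_robots_url(entry)
        if robots_url not in seen:
            seen.add(robots_url)
            targets.append(robots_url)
    return targets
```

With the input example above, `collect_targets` would yield one robots.txt URL per listed site, in the order given.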
Output
Each domain produces one dataset item with the robots.txt status, discovered sitemaps, and user-agent blocks.
- `inputUrl` (string): the original URL or domain you provided
- `robotsUrl` (string | null): the resolved robots.txt URL
- `status` (number | null): HTTP status code (200, 404, etc.)
- `contentLength` (number): file size in bytes
- `sitemapCount` (number): number of `Sitemap:` directives found
- `sitemaps` (string[]): list of sitemap URLs declared in the file
- `userAgents` (string[]): list of unique `User-agent` values
- `error` (string | null): error message if the fetch failed
- `checkedAt` (string): ISO timestamp of when the check ran
Output Example
{"inputUrl":"https://crawlee.dev","robotsUrl":"https://crawlee.dev/robots.txt","status":200,"contentLength":342,"sitemapCount":2,"sitemaps":["https://crawlee.dev/sitemap.xml","https://crawlee.dev/sitemap-blog.xml"],"userAgents":["*","Googlebot","AhrefsBot"],"error":null,"checkedAt":"2025-11-20T14:30:00.000Z"}
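Once the dataset is exported, items shaped like the example above can be sorted into buckets for follow-up. The sketch below is one possible triage, with a hypothetical function name and bucket labels that are not part of the actor's output.

```python
def triage(item: dict) -> str:
    """Classify one dataset item by the most urgent problem it shows."""
    if item.get("error"):
        return "fetch-error"   # network failure, timeout, DNS error, etc.
    status = item.get("status")
    if status != 200:
        return f"http-{status}"  # e.g. "http-404" when no robots.txt exists
    if item.get("sitemapCount", 0) == 0:
        return "no-sitemap"    # file exists but declares no Sitemap: directive
    return "ok"
```

Checking `error` before `status` matters because a failed fetch leaves `status` as null.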
Pricing
| Event | Cost |
|---|---|
| Domain Audited | $0.001 per domain |
You are charged per domain audited. Platform usage fees apply separately.
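For budgeting, the per-domain charge can be estimated with simple arithmetic. This sketch covers only the "Domain Audited" event at the listed $0.001 rate; platform usage fees are not included, and the function name is hypothetical.

```python
from decimal import Decimal

# Listed price for one "Domain Audited" event, in US dollars.
PRICE_PER_DOMAIN = Decimal("0.001")


def estimate_event_cost(domain_count: int) -> Decimal:
    """Return the event charge in dollars, excluding platform usage fees."""
    return PRICE_PER_DOMAIN * domain_count
```

`Decimal` is used instead of floats so that small per-domain amounts add up exactly; 1,000 domains comes to $1.00, matching the headline price.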
Use Cases
- SEO audits: check whether robots.txt accidentally blocks important pages or crawlers
- Sitemap discovery: extract all declared sitemap URLs across a portfolio of domains
- Competitor intelligence: see which crawlers competitors specifically block or allow
- Migration validation: verify robots.txt is correctly configured after a domain migration
- Agency reporting: audit robots.txt across all client domains in a single scheduled run
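For the SEO audit and migration validation cases, a follow-up check on specific pages can be done with Python's standard `urllib.robotparser`. This is a separate illustration, not something the actor does for you: the helper name `blocked_urls` is hypothetical, it assumes you already have the robots.txt body as text, and the standard library's matching rules may differ from how a given search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_text: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that the given crawler is not allowed to fetch."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Running this against a short list of must-be-crawlable pages after a migration gives a quick signal if a stray `Disallow` rule slipped in.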
Related Actors
| Actor | What it adds |
|---|---|
| XML Sitemap URL Extractor | Extract all URLs from the sitemaps discovered in robots.txt |
| Broken Links Checker | Crawl your site to find broken links that robots.txt might be masking |
| Tech Stack Analyzer | Detect the CMS and frameworks behind the domains you audit |
