URL: https://apify.com/predictable_function/my-actor-3

Robots.txt Validator · Apify


Pricing

Pay per usage

Robots.txt Validator

List of website base URLs whose robots.txt files will be validated

Rating

5.0

(1)

Developer

πŸ‘ riya rawat

riya rawat

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 66
Monthly active users: 2
Last modified: 5 months ago

JavaScript Website Scraper (Crawlee + CheerioCrawler)

A lightweight and efficient Apify Actor for scraping static and server-rendered websites using Crawlee's CheerioCrawler, which parses the raw HTML response and does not execute JavaScript. This Actor extracts page titles and URLs from the provided start pages and stores them in an Apify dataset.

Designed to be fast, easy to customize, and fully compliant with Apify Actor Store rules and the Apify $1 Million Challenge.


Key Features

  • Fast HTML parsing using Cheerio (no browser required)
  • Built on Crawlee’s CheerioCrawler
  • Supports Apify Proxy rotation to reduce blocking
  • Clearly defined input and output
  • Stores structured data in Apify Datasets
  • Easy to extend for custom scraping use cases

Quick Start

Run the Actor locally with the Apify CLI from the project directory:

$ apify run

You might also like

Robots.txt Checker - CMS-Aware Analysis with AI Recommendations

alizarin_refrigerator-owner/robots-txt-checker

The Robots.txt Checker provides comprehensive analysis of your robots.txt file:
  • Syntax validation
  • CMS detection: identifies WordPress, Shopify, Drupal, and 6+ other CMS platforms
  • Best-practice checks
  • Companion file checks: sitemap.xml, llms.txt, security.txt
  • AI recommendations: CMS-specific suggestions

Robots.txt Generator

maximedupre/robots-txt-generator

Generate deployable robots.txt files from presets, custom bot rules, sitemap URLs, and host directives. Create one file or batch files for multiple sites, then export raw text plus validation data.

πŸ‘ User avatar

Maxime Dupré

2

Robots Txt Analyzer

zerobreak/robots-txt-analyzer

A robots.txt analyzer that fetches and parses crawl rules from any website in bulk, so SEO teams and developers can audit blocked paths, user agents, and sitemap locations across hundreds of domains without manual work.

Robots.txt Generator

automation-lab/robots-txt-generator

Generate valid robots.txt files from structured rules. Apply presets (block AI bots, SEO-friendly), add custom per-bot rules, sitemaps, and crawl-delay. Zero-proxy, instant output.

πŸ‘ User avatar

Stas Persiianenko

4

Robots.txt Monitor

datawinder/robots-txt-monitor

Stateful robots.txt monitoring with baseline awareness and severity-classified alerts. Detects meaningful policy changes over time, not noisy diffs.

πŸ‘ User avatar

DatawinderLabs

2

Indexability Audit

zerobreak/indexability-audit

Indexability audit tool that checks robots.txt, meta robots tags, X-Robots-Tag headers, and canonical URLs for any list of pages, so SEO teams know which ones Google can actually crawl and index.