
URL: https://apify.com/accurate_pouch/broken-link-checker

Broken Link Checker - Recursive Site Crawler · Apify


๐Ÿ‘ Broken Link Checker โ€” Recursive Site Crawler avatar

Broken Link Checker - Recursive Site Crawler

Pricing

$5.00 / 1,000 links checked



Recursively crawl your website and find every broken link, 404, redirect, and timeout. Checks internal and external links with configurable depth. 100 links free per run.


Rating

0.0

(0)

Developer

๐Ÿ‘ Manchitt Sanan

Manchitt Sanan

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 3
Monthly active users: 1
Last modified: a month ago


Find every broken link on your website. Recursively crawl from any start URL and report all 404 errors, bad redirects, timeouts, and server errors, with the exact page and anchor text where each broken link was found.


Why this exists

Broken links hurt your SEO rankings, frustrate visitors, and make your site look unmaintained. Manually checking links on a 500-page site takes hours. This actor crawls your entire site in minutes, checks every internal and external link, and gives you a structured report.

  • Recursive crawling: follows internal links up to a configurable depth, not just one page
  • External link checking: lightweight HEAD requests to verify links to other domains
  • Status categorization: every link classified as broken (404/410/5xx), redirect (301/302), timeout, or OK
  • Severity levels: critical (404, 5xx), warning (redirects, timeouts), info (working links)
  • Context: shows which page the broken link was found on and what the anchor text says
  • 100 links free per run: try it on your site with zero risk

Quick start

{
  "startUrl": "https://example.com",
  "maxDepth": 3,
  "maxPages": 500,
  "checkExternalLinks": true
}

Hit Start and get a full report in minutes.


Feature comparison

| Feature | HTTP Status Checker | parseforge | This actor |
| --- | --- | --- | --- |
| Single URL check | Yes | Yes | Yes |
| Recursive site crawl | No | Yes | Yes |
| External link checking | No | Yes | Yes |
| Status categorization | No | Basic | 404/301/302/500/timeout |
| Severity classification | No | No | critical / warning / info |
| Anchor text context | No | No | Yes |
| Source page tracking | No | Yes | Yes |
| Configurable depth | No | Yes | Yes |
| Configurable max pages | No | Yes | Yes |
| Respect robots.txt | No | No | Yes (configurable) |
| URL pattern exclusion | No | No | Yes (glob patterns) |
| Dry run mode | No | No | Yes |
| Free tier | No | No | 100 links free |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| startUrl | string | (required) | URL to start crawling from |
| maxDepth | integer | 3 | Maximum link depth to follow (1–10) |
| maxPages | integer | 500 | Maximum pages to crawl (1–10,000) |
| checkExternalLinks | boolean | true | Check links pointing to other domains |
| respectRobotsTxt | boolean | true | Skip pages disallowed by robots.txt |
| ignoredPatterns | array | [] | URL patterns to skip (glob-style: *logout*, *admin*) |
| outputFormat | enum | broken-only | broken-only or all |
| sitemapUrl | string | (auto-detect) | URL to sitemap.xml. If not set, auto-checks /sitemap.xml and /sitemap_index.xml |
| webhookUrl | string | (optional) | POST full JSON results to this URL when audit completes |
| googleSheetsId | string | (optional) | Export broken links to this Google Sheet (spreadsheet ID) |
| googleServiceAccountKey | string | (optional) | Google Service Account JSON key for Sheets export |
| dryRun | boolean | false | Preview what would be crawled; no charges |
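
The defaults and ranges in the table above can be checked locally before a run is started. The following is a minimal sketch, not the actor's own validation: the names `DEFAULTS`, `build_input`, and `is_ignored` are hypothetical, and the use of `fnmatchcase` to approximate the glob-style `ignoredPatterns` is an assumption, since the listing does not say how patterns are matched.

```python
from fnmatch import fnmatchcase

# Defaults copied from the Input table. Optional fields without a concrete
# default (sitemapUrl, webhookUrl, ...) are simply left out.
DEFAULTS = {
    "maxDepth": 3,
    "maxPages": 500,
    "checkExternalLinks": True,
    "respectRobotsTxt": True,
    "ignoredPatterns": [],
    "outputFormat": "broken-only",
    "dryRun": False,
}


def build_input(start_url, **overrides):
    """Merge overrides into the documented defaults and check the documented ranges."""
    if not start_url:
        raise ValueError("startUrl is required")
    run_input = {"startUrl": start_url, **DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    run_input["ignoredPatterns"] = list(run_input["ignoredPatterns"])
    if not 1 <= run_input["maxDepth"] <= 10:
        raise ValueError("maxDepth must be between 1 and 10")
    if not 1 <= run_input["maxPages"] <= 10_000:
        raise ValueError("maxPages must be between 1 and 10,000")
    if run_input["outputFormat"] not in ("broken-only", "all"):
        raise ValueError("outputFormat must be 'broken-only' or 'all'")
    return run_input


def is_ignored(url, patterns):
    """Local approximation of glob-style matching (assumed, not confirmed)."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


run_input = build_input(
    "https://example.com",
    maxDepth=2,
    ignoredPatterns=["*logout*", "*admin*"],
)
print(run_input["maxPages"])
print(is_ignored("https://example.com/admin/users", run_input["ignoredPatterns"]))
```

Previewing patterns this way may help catch an overly broad glob such as `*a*` before it silently excludes most of a site.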

Output

{
  "status": "success",
  "startUrl": "https://example.com",
  "summary": {
    "pagesChecked": 142,
    "linksChecked": 1847,
    "brokenLinks": 12,
    "redirects": 34,
    "errors": 3
  },
  "brokenLinks": [
    {
      "url": "https://example.com/old-page",
      "statusCode": 404,
      "statusCategory": "broken",
      "severity": "critical",
      "foundOn": "https://example.com/blog/post-1",
      "anchorText": "Learn more",
      "lastChecked": "2026-04-13T10:30:00Z",
      "error": null
    }
  ]
}
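
Because each entry carries `foundOn` and `anchorText`, the report can be regrouped by source page to produce a per-page fix list. A minimal sketch, assuming the output has exactly the shape of the sample above; the helper name `group_by_source_page` is hypothetical and the embedded JSON is a trimmed copy of that sample.

```python
import json
from collections import defaultdict

# Trimmed copy of the sample output shown above.
report = json.loads("""
{
  "status": "success",
  "startUrl": "https://example.com",
  "brokenLinks": [
    {
      "url": "https://example.com/old-page",
      "statusCode": 404,
      "statusCategory": "broken",
      "severity": "critical",
      "foundOn": "https://example.com/blog/post-1",
      "anchorText": "Learn more",
      "error": null
    }
  ]
}
""")


def group_by_source_page(report):
    """Map each page to the list of broken links that were found on it."""
    pages = defaultdict(list)
    for link in report["brokenLinks"]:
        pages[link["foundOn"]].append(link)
    return dict(pages)


for page, links in group_by_source_page(report).items():
    print(page)
    for link in links:
        print(f'  [{link["severity"]}] {link["statusCode"]} {link["url"]} '
              f'(anchor: "{link["anchorText"]}")')
```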

Status categories

| Category | HTTP codes | Severity | Meaning |
| --- | --- | --- | --- |
| broken | 404, 410, 5xx | critical | Link target is dead or server is failing |
| redirect | 301, 302, 303, 307, 308 | warning | Link works but goes through a redirect; consider updating |
| timeout | - | warning | Server did not respond within 10 seconds |
| error | - | critical | Network error, DNS failure, or connection refused |
| ok | 2xx | info | Link is working (only shown in all output mode) |
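
The table above translates directly into a small lookup. This is a sketch of the documented mapping only, not the actor's source code; the function name `classify` is hypothetical, and status codes the table does not list (for example 403) return `None` here because the listing does not say how they are treated.

```python
from typing import Optional, Tuple

REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify(status_code: Optional[int], timed_out: bool = False) -> Optional[Tuple[str, str]]:
    """Return (category, severity) per the Status categories table.

    status_code is None when no HTTP response was received at all.
    """
    if timed_out:
        return ("timeout", "warning")
    if status_code is None:
        # Network error, DNS failure, or connection refused.
        return ("error", "critical")
    if status_code in (404, 410) or 500 <= status_code <= 599:
        return ("broken", "critical")
    if status_code in REDIRECT_CODES:
        return ("redirect", "warning")
    if 200 <= status_code <= 299:
        return ("ok", "info")
    return None  # Not covered by the table, e.g. 403.


for code in (404, 503, 308, 200, 403):
    print(code, classify(code))
```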

Pricing

$0.003 per link checked (pay-per-event pricing).

  • Only charged on successful runs: errors and dry runs are never charged.
  • 500 links = $1.50
  • 2,000 links = $6.00
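
The two examples above follow from multiplying the link count by the per-link rate. A minimal sketch of that arithmetic, using the $0.003 rate stated in this section; the function name `estimate_cost` is hypothetical, and the 100 free links per run are deliberately not modeled because the listing does not say how they are deducted.

```python
from decimal import Decimal

# Per-link rate as stated in this Pricing section.
PRICE_PER_LINK = Decimal("0.003")


def estimate_cost(links_checked: int) -> Decimal:
    """Estimated charge in USD, rounded to cents."""
    if links_checked < 0:
        raise ValueError("links_checked must not be negative")
    return (PRICE_PER_LINK * links_checked).quantize(Decimal("0.01"))


print(estimate_cost(500))    # 1.50
print(estimate_cost(2000))   # 6.00
```

`Decimal` is used instead of `float` so that the cent values match the listed figures exactly.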

Performance

  • Uses CheerioCrawler (pure HTTP): no headless browser overhead
  • Default concurrency handled by Crawlee's built-in request queue
  • External links checked with parallel HEAD requests (batches of 20)
  • Typical: 200–500 links/minute depending on target site response time
  • 10-second timeout per request, 1 retry on failure
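
The batching and retry behavior listed above can be illustrated with a thread pool. This is a sketch under stated assumptions, not the actor's implementation (which is built on Crawlee): the names `batched`, `check_with_retry`, and `check_in_batches` are hypothetical, and `check` stands for any callable that performs one request, such as a HEAD request with a 10-second timeout.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def check_with_retry(url: str, check: Callable[[str], object], retries: int = 1) -> object:
    """Call check(url); after `retries` extra failed attempts, report "error"."""
    for attempt in range(retries + 1):
        try:
            return check(url)
        except Exception:
            if attempt == retries:
                return "error"


def check_in_batches(urls: List[str], check: Callable[[str], object],
                     batch_size: int = 20, retries: int = 1) -> Dict[str, object]:
    """Check URLs batch by batch, running each batch in parallel."""
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for batch in batched(urls, batch_size):
            # pool.map preserves input order, so zip pairs each URL with its outcome.
            outcomes = pool.map(lambda u: check_with_retry(u, check, retries), batch)
            results.update(zip(batch, outcomes))
    return results


# Usage with a stand-in checker; a real one would issue the HTTP request.
urls = [f"https://example.com/page-{i}" for i in range(45)]
results = check_in_batches(urls, check=lambda url: 200)
print(len(results))
```

Passing the checker in as a parameter keeps the batching logic independent of any particular HTTP library.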

Limitations

  • JavaScript-rendered links are not detected. This actor uses HTTP requests only (CheerioCrawler), not a headless browser. Links injected by JavaScript after page load will be missed.
  • Some sites aggressively block crawlers. If you see many timeouts, try reducing maxConcurrency or disabling checkExternalLinks.
  • External links are checked with HEAD requests only. Some servers respond differently to HEAD vs GET: a HEAD 404 does not always mean GET would also 404.
  • Maximum 10,000 pages per run to prevent runaway costs.
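
Given the HEAD-only limitation above, an external link reported as broken can be re-verified with a GET before it is treated as dead. The following is a sketch of such a follow-up check run on your own machine, not a feature of the actor; `http_status` and `verify_with_get` are hypothetical names. Note that `urllib` follows redirects automatically, so the status returned is that of the final response.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(method: str, url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for one request; network errors propagate."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is still available.
        return exc.code


def verify_with_get(url: str, fetch: Callable[[str, str], int] = http_status) -> int:
    """Return the HEAD status, or the GET status when HEAD reports 4xx/5xx."""
    head_status = fetch("HEAD", url)
    if head_status < 400:
        return head_status
    return fetch("GET", url)


# Usage with a stand-in fetcher that rejects HEAD but serves GET.
def fake_fetch(method: str, url: str) -> int:
    return 404 if method == "HEAD" else 200


print(verify_with_get("https://example.com/report.pdf", fetch=fake_fetch))
```

The GET is only issued when HEAD fails, which keeps the extra traffic limited to the links already flagged.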

Related actors in this suite

Other tools by accurate_pouch for web intelligence + automation:

  • TheCrawler: Web scraper + LLM-powered structured extraction. AGPL-3.0, also on npm (thecrawler@0.1.1). $0.005/page.
  • Sitemap Analyzer: Parse and validate XML sitemaps, status-check every URL, handle sitemap indexes. $0.004/sitemap.
  • Website Change Monitor: Track page changes via text, hash, or CSS selector; diff + webhook on change. $0.005/page.
  • Lighthouse Auditor: PageSpeed Insights API, Core Web Vitals, deltas, competitor comparison, Sheets export. $0.005/audit.
  • Tech Stack Detector: 7,517 signatures across 105 categories, implies chains. $0.02/URL.

Run on Apify

๐Ÿ‘ Run on Apify

No setup needed. Click above to run in the cloud. $0.003 per operation.
