Find Broken Links

Pricing

from $1.00 / 1,000 results

Crawl a website (start URL + same-host pages up to a configurable depth) and report every link that returns a 4xx / 5xx status, times out, or has a DNS error. HTTP-only: no browser needed, and no proxy for the main crawl (an optional proxy retry is used only to verify suspected anti-bot blocks).

Rating

0.0

(0)

Developer

Crawler Bros

Maintained by Community

Last modified

8 days ago

What it does

You give it a start URL; the actor crawls the start page (and optionally same-host internal links up to a depth N), gathers every <a href>, and probes each one with HEAD (falling back to GET when servers reject HEAD). Records are emitted only for links that fail.
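The discovery step described above can be sketched with the standard library alone. This is a minimal illustration, not the actor's actual code: the `AnchorCollector` class and `extract_links` function are hypothetical names, and the real actor may parse HTML differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects (absolute_url, anchor_text) pairs from <a href> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative hrefs against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return every <a href> on the page as (absolute_url, anchor_text)."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an `href` attribute are skipped, which matches the idea that only real links are probed.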

The dataset is never empty — even a perfectly-clean site gets a final summary record with run statistics.

Input

Field	Type	Default	Description
`startUrl`	string (required)	`https://apify.com`	Page to start crawling from. Must be `http://` or `https://`.
`maxCrawlDepth`	integer	`1` (0–5)	0 = check links on the start URL only; 1 = also follow internal links one level deep and check the links on those pages; higher values follow internal links up to that many levels.
`maxPages`	integer	`50` (1–5000)	Hard cap on pages crawled.
`checkExternalLinks`	boolean	`true`	Also probe links that leave the start URL's host.
`verifyWithProxy`	boolean	`true`	When a link returns `401 / 403 / 405 / 429 / 451` (typical anti-bot signals), retry once via Apify residential proxy. If the proxy retry succeeds the link is treated as OK — eliminates false positives from sites that block datacenter IPs (G2, Capterra, etc.). Turn off to skip the retry pass.
`maxConcurrency`	integer	`10` (1–50)	Concurrent HEAD/GET requests during the check phase.
`userAgent`	string (optional)	(Chrome 131)	Override only if a target server filters by UA.
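As a rough aid for assembling a valid input, the constraints in the table can be checked before a run is started. The helper below is a hypothetical sketch: `build_input` is not part of the actor, and the defaults and ranges are simply copied from the table above.

```python
from urllib.parse import urlsplit

# Defaults and ranges copied from the input table; names here are illustrative.
DEFAULTS = {
    "maxCrawlDepth": 1,
    "maxPages": 50,
    "checkExternalLinks": True,
    "verifyWithProxy": True,
    "maxConcurrency": 10,
}
RANGES = {"maxCrawlDepth": (0, 5), "maxPages": (1, 5000), "maxConcurrency": (1, 50)}


def build_input(start_url, **overrides):
    """Merge overrides into the documented defaults and validate the result."""
    if urlsplit(start_url).scheme not in ("http", "https"):
        raise ValueError("startUrl must be http:// or https://")
    unknown = set(overrides) - set(DEFAULTS) - {"userAgent"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    run_input = {"startUrl": start_url, **DEFAULTS, **overrides}
    for field, (low, high) in RANGES.items():
        if not low <= run_input[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    return run_input
```

For example, `build_input("https://apify.com", maxCrawlDepth=0)` yields a start-page-only audit with every other field at its default.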

Example input

{
  "startUrl": "https://apify.com",
  "maxCrawlDepth": 1,
  "maxPages": 50,
  "checkExternalLinks": true,
  "maxConcurrency": 10
}

Output

Broken-link record (one per failure)

{
  "url": "https://example.com/old-blog-post",
  "sourcePage": "https://apify.com/blog/index",
  "anchorText": "Read more",
  "linkType": "external",
  "linkDomain": "example.com",
  "isExternalLink": true,
  "httpStatus": 404,
  "errorReason": "not_found",
  "proxyRecheckStatus": 404,
  "scrapedAt": "2024-12-16T14:23:11+00:00"
}

Summary record (always emitted last)

{
  "_recordType": "summary",
  "startUrl": "https://apify.com",
  "pagesCrawled": 12,
  "linksDiscovered": 480,
  "linksChecked": 480,
  "brokenCount": 3,
  "okCount": 477,
  "breakdown": {"not_found": 2, "server_error": 1},
  "maxCrawlDepth": 1,
  "checkExternalLinks": true,
  "scrapedAt": "2024-12-16T14:23:18+00:00"
}
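Because the dataset mixes two record shapes, a consumer usually wants to pull the summary out first. The sketch below assumes the dataset items have already been loaded as a list of dicts; `split_dataset` is a hypothetical helper name, and it relies only on the `_recordType`, `sourcePage`, and `url` fields shown above.

```python
from collections import defaultdict


def split_dataset(items):
    """Separate the summary record from broken-link records and group the
    broken URLs by the page they were found on."""
    summary = None
    by_page = defaultdict(list)
    for item in items:
        if item.get("_recordType") == "summary":
            summary = item
        else:
            by_page[item["sourcePage"]].append(item["url"])
    return summary, dict(by_page)
```

Grouping by `sourcePage` gives an editor a per-page fix list rather than a flat list of dead URLs.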

Output fields

url — the broken link's absolute URL.
sourcePage — page where the link was first discovered.
anchorText — visible text of the <a> element (when present).
linkType — "internal" (same host as start URL) or "external".
linkDomain — derived hostname of the broken url (lowercase, includes any port).
isExternalLink — derived boolean: true when the broken link's host differs from sourcePage's host.
httpStatus — HTTP status code (omitted for network errors / timeouts).
errorReason — one of:
- not_found (404), gone (410), forbidden (403), unauthorized (401), server_error (5xx), client_error_<NNN> (other 4xx)
- timeout, dns_error, connection_refused, tls_error, redirect_loop, network_error
proxyRecheckStatus — only present when verifyWithProxy: true triggered a retry. Shows the status returned via residential proxy (use this to distinguish real broken links from anti-bot blocks).
scrapedAt — ISO-8601 timestamp.
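The status-based `errorReason` values listed above follow a simple mapping, sketched here for readers who want to reproduce the labels on their own data. The function name `classify_status` is hypothetical; network-level reasons such as `timeout` or `dns_error` have no status code and are outside this sketch.

```python
def classify_status(status):
    """Map an HTTP status code to an errorReason label, or None when the
    status is not an error."""
    named = {401: "unauthorized", 403: "forbidden", 404: "not_found", 410: "gone"}
    if status in named:
        return named[status]
    if 500 <= status <= 599:
        return "server_error"
    if 400 <= status <= 499:
        return f"client_error_{status}"  # any other 4xx
    return None
```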

Use cases

SEO audits — every broken link costs link equity and damages user trust.
Site migration validation — after a CMS move, find the URLs that didn't get redirected.
Editorial QA — catch dead links in blog content, reference pages, footer navigation.
Internal-tools health — spot broken links to deprecated wikis, retired tools, expired SSO redirects.

FAQ

Does it need a proxy? For the bulk crawl, no — the actor uses curl_cffi with a Chrome User-Agent from a datacenter IP. Optionally, when verifyWithProxy: true (default), any link that returns 401 / 403 / 405 / 429 / 451 is retried once via Apify residential proxy. If that retry succeeds, the link is treated as OK — this eliminates the false positives that used to surface from sites like G2, Capterra, or rate-limited APIs. The retried status is surfaced as proxyRecheckStatus so you can see both checks.
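The retry rule can be summarised as a small decision function. This is a sketch under stated assumptions: `final_verdict` and the `recheck` callable are hypothetical, and "succeeds" is taken to mean any status below 400, which the description above does not spell out.

```python
# Statuses that trigger the proxy retry, as listed in the FAQ answer above.
RECHECK_STATUSES = {401, 403, 405, 429, 451}


def final_verdict(direct_status, recheck):
    """Return (is_broken, proxy_recheck_status).

    `recheck` is a hypothetical zero-argument callable that repeats the
    request through a proxy and returns the status code it saw. It is only
    called for statuses that look like anti-bot blocks.
    """
    if direct_status < 400:
        return False, None
    if direct_status not in RECHECK_STATUSES:
        return True, None  # e.g. 404 or 500: reported without a retry
    proxy_status = recheck()
    # Assumption: a retry "succeeds" when the proxy sees a non-error status.
    return proxy_status >= 400, proxy_status
```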

HEAD vs GET — which is used? HEAD first (saves bandwidth). If a server returns 405 or 501, the actor falls back to GET and uses that status instead.
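In code, the fallback amounts to a second request with a different method. The sketch below keeps the HTTP client abstract: `probe` is a hypothetical name, and `request(method, url)` stands in for whatever client call returns a status code.

```python
def probe(url, request):
    """Return the status for `url`, trying HEAD first.

    `request(method, url)` is a hypothetical callable returning an integer
    status code; any HTTP client can be wrapped to fit it.
    """
    status = request("HEAD", url)
    if status in (405, 501):
        # The server does not support HEAD here; GET's status is used instead.
        status = request("GET", url)
    return status
```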

Will it follow redirects? Yes — allow_redirects=True for both HEAD and GET. The final status is what gets recorded.

Can I limit it to internal links only? Set checkExternalLinks: false. The actor still walks the same-host graph for discovery but only probes internal links.

Why is the dataset never empty? Even when no broken links are found, a _recordType: "summary" record is emitted with run stats. This keeps Apify's daily-test happy and gives you a quick health pulse for the site.

My start URL has thousands of pages — will this finish in time? Use maxPages and maxCrawlDepth to keep runs bounded. For large sites, consider running with maxCrawlDepth: 0 first to audit the start page's links, then expand outward.
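To see how the two limits interact, here is a minimal breadth-first sketch over an in-memory link graph. It is an illustration only: `crawl_order` and `links_of` are hypothetical names, and the actor's real scheduling may differ.

```python
from collections import deque


def crawl_order(start, links_of, max_depth, max_pages):
    """Breadth-first walk bounded by depth and page count.

    `links_of(page)` is a hypothetical callable returning the same-host links
    found on `page`. Returns the pages crawled, in visit order.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        page, depth = queue.popleft()
        visited.append(page)
        if depth >= max_depth:
            continue  # links on this page are checked but not followed
        for link in links_of(page):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With `max_depth=0` only the start page is crawled, and `max_pages` stops the walk even when the depth limit would allow more.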

The summary says brokenCount: 0 but I know some links are dead.

The link may use a non-HTTP scheme (mailto, javascript:, data:) — those aren't checkable.
The link may be JS-rendered (this scraper sees only server-rendered HTML).
The target may return a different status to a regular browser than to a generic crawler; try overriding the User-Agent via userAgent.
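The first point in the list above can be checked locally before blaming the crawler. This is a small sketch with a hypothetical helper, `is_checkable`, that resolves an `href` against its page and tests whether the result is an `http` or `https` URL.

```python
from urllib.parse import urljoin, urlsplit


def is_checkable(href, base_url):
    """True when `href` resolves to an http(s) URL that can be probed."""
    absolute = urljoin(base_url, href.strip())
    return urlsplit(absolute).scheme in ("http", "https")
```

Links such as `mailto:` or `javascript:` return `False` here, so they would never appear in a broken-link report.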

URL: https://apify.com/crawlerbros/find-broken-links