Website URL Crawler & Link Extractor

Pricing

from $10.80 / 1,000 discovered website links

Crawl JavaScript-rendered websites and export a URL link map. Get source pages, depth, anchor text, link type, HTTP metadata, and crawl status.

Developer

Maxime Dupré

Maintained by Community

Last modified

20 days ago

🔗 Website URL crawler for rendered pages

Website URL Crawler crawls JavaScript-rendered public websites and exports a clean link map. Add one or more website URLs or domains, and the Actor opens pages in a browser, reads the rendered links, follows the pages you allow, and saves one dataset item per discovered link.

Use it for SEO audits, website migrations, QA checks, broken-link investigation, internal linking reviews, and RAG source inventories. It works well when you need more than a raw list of URLs: each link keeps its source page, parent URL, depth, anchor text, link type, crawl status, and optional HTTP metadata.

For a quick first run, keep the prefilled IANA reserved domains page. It is small, public, and gives you a readable website link map without needing your own test site.

🧭 What this Actor does

Website URL Crawler starts from your submitted websites and discovers links from rendered HTML pages. That means links added by common client-side JavaScript can be included in the crawl output after the page loads.

The Actor can crawl within the same host, within the same registrable domain, or emit external links as discovered-only rows. Only internal page links are followed further. Document and media links can be included or skipped depending on the asset setting you choose.

Each run is designed for link extraction and crawl mapping, not full content scraping. The output helps you answer practical questions such as:

Which pages does this website link to?
Where was each URL found?
What anchor text points to each link?
How deep is the link from the start page?
Is the link internal, external, a document, or an asset?
Which crawled pages returned HTTP status and content type metadata?

📦 Data you get

Every saved item represents one crawled or discovered website link. Fields include:

startUrl - the original website URL this crawl started from
url and normalizedUrl - the discovered link and its normalized version
sourceUrl - the rendered page where the link was found
parentUrl - the page that led to a crawled URL, when available
depth - crawl depth from the start URL
anchorText - visible link text when present
linkType - page, document, asset, or external
crawlStatus - crawled or discovered
httpStatusCode, finalUrl, and contentType - when HTTP status checks are enabled and the page is crawled
isInternal, isExternal, isAsset, and isDuplicate - booleans for filtering and audits
rawHref, foundOnTitle, sourceIndex, and discoveredAt - source evidence and scrape metadata

You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML, or consume it through the Apify API, schedules, webhooks, and integrations.
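As a rough sketch of the API route, the helper below builds a dataset items URL for each export format listed above. The function name and the `abc123` dataset ID are hypothetical placeholders; the URL shape follows Apify's public dataset items endpoint, and private datasets additionally need an API token.

```python
from urllib.parse import urlencode

# Formats named in this README, mapped to the `format` query values
# accepted by the Apify dataset items endpoint.
EXPORT_FORMATS = {
    "JSON": "json",
    "CSV": "csv",
    "Excel": "xlsx",
    "XML": "xml",
    "RSS": "rss",
    "HTML": "html",
}


def dataset_items_url(dataset_id: str, export_format: str = "JSON") -> str:
    """Build the download URL for a dataset in the requested format.

    `dataset_id` is whatever ID your run reports for its default dataset;
    the value used in the example below is a placeholder.
    """
    query = urlencode({"format": EXPORT_FORMATS[export_format]})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


if __name__ == "__main__":
    print(dataset_items_url("abc123", "CSV"))
```

Keeping the format names in one mapping means an unsupported format fails fast with a `KeyError` instead of producing a URL the API would reject.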

⚙️ How to run it

Add one or more website URLs or domains.
Choose how many pages to crawl per website.
Set the crawl depth and maximum links per page.
Pick whether to stay on the same host, same domain, or include external links as discovered-only rows.
Choose whether to include document links, all asset links, or pages only.
Run the Actor and open the dataset.

Domains such as example.com are accepted and normalized to HTTPS. Full URLs such as https://example.com/docs are also accepted.
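The normalization described above can be pictured with a small sketch. This is an assumption about the behavior, not the Actor's actual code: bare domains get an `https://` prefix and full URLs pass through unchanged. The function name is hypothetical.

```python
def normalize_start_url(value: str) -> str:
    """Turn a bare domain into an HTTPS URL; leave full URLs untouched.

    Assumption: anything containing "://" is already a full URL.
    """
    value = value.strip()
    if "://" in value:
        return value
    return f"https://{value}"


if __name__ == "__main__":
    for candidate in ("example.com", "https://example.com/docs"):
        print(candidate, "->", normalize_start_url(candidate))
```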

🧾 Input options

Website URLs is the only required input. Add the sites you want to crawl.

Max pages per website controls how many HTML pages are opened for each start URL. Discovered links can still be emitted before the page cap is reached.

Max crawl depth controls how many levels of links the Actor follows from the start page. Use 0 when you only want links from the submitted page itself.

Max links per page limits how many rendered links are emitted and considered from each crawled page.
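To see how the three limits above interact, here is a minimal breadth-first sketch over an in-memory link graph. It is an illustration under stated assumptions, not the Actor's implementation: the `graph` dictionary stands in for rendered pages, and the function and parameter names are hypothetical.

```python
from collections import deque


def crawl(graph, start, max_pages, max_depth, max_links_per_page):
    """Breadth-first crawl over `graph` (page -> list of links on that page).

    Emits one row per link found on an opened page. A link is followed only
    while the page it was found on is above `max_depth`.
    """
    rows = []
    opened = set()
    queued = {start}
    queue = deque([(start, 0)])
    while queue and len(opened) < max_pages:
        page, depth = queue.popleft()
        opened.add(page)
        # Only the first `max_links_per_page` links are emitted and considered.
        for link in graph.get(page, [])[:max_links_per_page]:
            rows.append({"url": link, "sourceUrl": page, "depth": depth + 1})
            if depth < max_depth and link not in queued:
                queued.add(link)
                queue.append((link, depth + 1))
    return rows


if __name__ == "__main__":
    site = {"a": ["b", "c", "d"], "b": ["e"], "c": ["f"]}
    # Depth 0: only links from the submitted page itself.
    print(crawl(site, "a", max_pages=10, max_depth=0, max_links_per_page=10))
```

In this sketch a page cap of 1 still emits every link found on the start page, which is one way to read the note that discovered links can be emitted before the cap is reached.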

Crawl scope controls which internal links can be followed. External links are never crawled further; they can be emitted as discovered-only rows when your settings allow it.
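The difference between same-host and same-domain scope can be sketched as follows. The scope names `"same-host"` and `"same-domain"` are hypothetical labels, and the registrable-domain helper is a deliberate simplification that keeps only the last two host labels; a real implementation would consult the Public Suffix List to handle suffixes such as `co.uk`.

```python
from urllib.parse import urlsplit


def naive_registrable_domain(host: str) -> str:
    """Simplification: treat the last two labels as the registrable domain."""
    return ".".join(host.lower().split(".")[-2:])


def in_scope(start_url: str, link_url: str, scope: str) -> bool:
    """Decide whether a link may be followed under the given scope."""
    start_host = (urlsplit(start_url).hostname or "").lower()
    link_host = (urlsplit(link_url).hostname or "").lower()
    if scope == "same-host":
        return link_host == start_host
    if scope == "same-domain":
        return naive_registrable_domain(link_host) == naive_registrable_domain(start_host)
    raise ValueError(f"unknown scope: {scope}")


if __name__ == "__main__":
    start = "https://www.iana.org/domains/reserved"
    print(in_scope(start, "https://data.iana.org/files", "same-host"))    # different host
    print(in_scope(start, "https://data.iana.org/files", "same-domain"))  # same domain
```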

Asset links controls whether the dataset includes only HTML page links, document links such as PDFs and spreadsheets, or all links including media assets.

Ignored extensions lets you skip common file types unless you choose to include all links.
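One way to picture the page, document, and asset distinction is an extension-based classifier like the sketch below. The extension sets are illustrative assumptions, not the Actor's actual lists, and the rule that every non-internal link is labeled `external` is also an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative extension sets; the Actor's real lists may differ.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".csv"}
ASSET_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".mp3", ".mp4"}


def classify_link(url: str, is_internal: bool) -> str:
    """Return one of the linkType values: page, document, asset, or external."""
    if not is_internal:
        return "external"
    # Look only at the URL path so query strings do not affect the extension.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    if suffix in ASSET_EXTENSIONS:
        return "asset"
    return "page"


if __name__ == "__main__":
    print(classify_link("https://example.com/files/report.PDF?v=2", True))
```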

Check HTTP status adds status code, final URL, and content type for crawled pages.

🧪 Output example

{
  "startUrl": "https://www.iana.org/domains/reserved",
  "url": "https://www.iana.org/domains/root",
  "normalizedUrl": "https://www.iana.org/domains/root",
  "sourceUrl": "https://www.iana.org/domains/reserved",
  "parentUrl": "https://www.iana.org/domains/reserved",
  "depth": 1,
  "anchorText": "Root Zone Management",
  "linkType": "page",
  "crawlStatus": "discovered",
  "isInternal": true,
  "isExternal": false,
  "isAsset": false,
  "isDuplicate": false,
  "rawHref": "/domains/root",
  "foundOnTitle": "IANA-managed Reserved Domains",
  "sourceIndex": 24,
  "discoveredAt": "2026-05-26T00:00:00.000Z"
}
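Once items like the one above are exported, a few lines of Python are enough to turn them into a link map. This sketch assumes the items are already loaded as a list of dictionaries with the field names shown in the example; the `summarize` function is a hypothetical helper.

```python
from collections import Counter, defaultdict


def summarize(items):
    """Count links per linkType and group discovered URLs by source page."""
    by_type = Counter(item["linkType"] for item in items)
    by_source = defaultdict(list)
    for item in items:
        by_source[item["sourceUrl"]].append(item["url"])
    return by_type, dict(by_source)


if __name__ == "__main__":
    sample = [
        {
            "url": "https://www.iana.org/domains/root",
            "sourceUrl": "https://www.iana.org/domains/reserved",
            "linkType": "page",
        },
    ]
    counts, link_map = summarize(sample)
    print(counts)
    print(link_map)
```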

💳 Pricing

This Actor uses pay-per-event pricing. You are charged for each saved website link item. The pricing event is called Discovered website link.
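For budgeting, the listed price of "from $10.80 / 1,000 discovered website links" can be turned into a rough estimate. The sketch below assumes that starting rate applies to every saved item; the actual charge may differ, so treat the result as a lower-bound estimate rather than a quote. The function name is hypothetical.

```python
from decimal import Decimal

# Listed starting price per 1,000 saved link items; actual rates may differ.
PRICE_PER_1000_LINKS = Decimal("10.80")


def estimate_cost(saved_links: int) -> Decimal:
    """Estimate the charge in USD for a number of saved link items."""
    cost = PRICE_PER_1000_LINKS * saved_links / 1000
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_cost(2500))
```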

Use a small Max pages per website value for your first run, then increase the limit once the output shape looks right.

⚠️ Limits and caveats

Website URL Crawler renders every page in a browser, so it favors coverage of JavaScript-generated links over the lowest possible runtime cost. Large sites can produce many links quickly; start with a small page limit and expand from there.

The Actor reads links from public rendered pages. It does not log in, submit forms, click through interactive menus, or guarantee that every route in a single-page app is discoverable from normal anchor links.

HTTP status, final URL, and content type are available for crawled pages. Links that are only discovered but not crawled are still useful for mapping, but they may not have those HTTP fields.

❓ FAQ

🌐 Does this crawl JavaScript-rendered websites?

Yes. Pages are opened in a browser and links are extracted from the rendered page, not only the initial HTML response.

🌍 Will it crawl external websites too?

No. External links can be saved as discovered links, but the crawler only follows internal page links within the scope you choose.

📄 Can I crawl only one page?

Yes. Set Max crawl depth to 0 when you want links from the submitted page without following deeper links.

🧯 Is this a broken link checker?

It can help with broken-link workflows by exporting discovered links and HTTP metadata for crawled pages, but the core output is a website URL crawl map.
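A broken-link pass over the exported items might look like the sketch below. It assumes Check HTTP status was enabled, that items are loaded as dictionaries, and that a status of 400 or higher counts as broken; discovered-only rows without `httpStatusCode` are skipped rather than flagged. The function name is hypothetical.

```python
def broken_links(items, threshold=400):
    """Return crawled items whose HTTP status is at or above `threshold`."""
    return [
        item
        for item in items
        if isinstance(item.get("httpStatusCode"), int)
        and item["httpStatusCode"] >= threshold
    ]


if __name__ == "__main__":
    sample = [
        {"url": "https://example.com/ok", "httpStatusCode": 200},
        {"url": "https://example.com/missing", "httpStatusCode": 404},
        {"url": "https://example.com/unchecked"},  # discovered only, no status
    ]
    for item in broken_links(sample):
        print(item["url"], item["httpStatusCode"])
```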

📝 Changelog

0.0: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Website Emails Scraper - Find public email addresses on the websites you already crawl.
Font Detector - Audit fonts, font files, and typography metadata from public pages.
Business Address Scraper - Extract physical business addresses from company websites.
Product Hunt Scraper - Build startup lead lists and enrich launches with website details.
LinkedIn Company Scraper - Export public company profile data for lead and market research.

Made with ❤️ by Maxime Dupré

URL: https://apify.com/maximedupre/website-url-crawler
