URL: https://apify.com/maximedupre/website-url-crawler

Website URL Crawler & Link Extractor · Apify

Pricing

from $10.80 / 1,000 discovered website links

Crawl JavaScript-rendered websites and export a URL link map. Get source pages, depth, anchor text, link type, HTTP metadata, and crawl status.

Rating

0.0 (0)

Developer

Maxime Dupré

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 4

Monthly active users: 1

Last modified: 20 days ago

🔗 Website URL crawler for rendered pages

Website URL Crawler crawls JavaScript-rendered public websites and exports a clean link map. Add one or more website URLs or domains, and the Actor opens pages in a browser, reads the rendered links, follows the pages you allow, and saves one dataset item per discovered link.

Use it for SEO audits, website migrations, QA checks, broken-link investigation, internal linking reviews, and RAG source inventories. It works well when you need more than a raw list of URLs: each link keeps its source page, parent URL, depth, anchor text, link type, crawl status, and optional HTTP metadata.

For a quick first run, keep the prefilled IANA reserved domains page. It is small, public, and gives you a readable website link map without needing your own test site.

🧭 What this Actor does

Website URL Crawler starts from your submitted websites and discovers links from rendered HTML pages. That means links added by common client-side JavaScript can be included in the crawl output after the page loads.

The Actor can crawl within the same host, within the same registrable domain, or emit external links as discovered-only rows. Only internal page links are followed further. Document and media links can be included or skipped depending on the asset setting you choose.

Each run is designed for link extraction and crawl mapping, not full content scraping. The output helps you answer practical questions such as:

  • Which pages does this website link to?
  • Where was each URL found?
  • What anchor text points to each link?
  • How deep is the link from the start page?
  • Is the link internal, external, a document, or an asset?
  • Which crawled pages returned HTTP status and content type metadata?
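
The questions above map onto fields of the exported dataset. As a minimal sketch, assuming the items have already been downloaded as a list of dictionaries that use the field names documented for this Actor (the sample rows and the `links_by_source` helper are made up for illustration), links can be grouped by the page where they were found:

```python
from collections import defaultdict

# Made-up sample rows; only the field names come from the Actor's documentation.
items = [
    {"sourceUrl": "https://example.com/", "url": "https://example.com/docs",
     "anchorText": "Docs", "depth": 1, "linkType": "page"},
    {"sourceUrl": "https://example.com/", "url": "https://example.org/",
     "anchorText": "Partner", "depth": 1, "linkType": "external"},
    {"sourceUrl": "https://example.com/docs", "url": "https://example.com/docs/guide.pdf",
     "anchorText": "Guide", "depth": 2, "linkType": "document"},
]


def links_by_source(rows):
    """Group (url, anchorText, linkType) tuples by the page they were found on."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sourceUrl"]].append(
            (row["url"], row.get("anchorText", ""), row.get("linkType", ""))
        )
    return dict(grouped)


link_map = links_by_source(items)
for source, links in link_map.items():
    print(source)
    for url, anchor, link_type in links:
        print(f"  {link_type:9} {anchor!r} -> {url}")
```

The same grouping keyed on `url` instead of `sourceUrl` would answer the reverse question: which pages point at a given link.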

📦 Data you get

Every saved item represents one crawled or discovered website link. Fields include:

  • startUrl - the original website URL this crawl started from
  • url and normalizedUrl - the discovered link and its normalized version
  • sourceUrl - the rendered page where the link was found
  • parentUrl - the page that led to a crawled URL, when available
  • depth - crawl depth from the start URL
  • anchorText - visible link text when present
  • linkType - page, document, asset, or external
  • crawlStatus - crawled or discovered
  • httpStatusCode, finalUrl, and contentType - when HTTP status checks are enabled and the page is crawled
  • isInternal, isExternal, isAsset, and isDuplicate - booleans for filtering and audits
  • rawHref, foundOnTitle, sourceIndex, and discoveredAt - source evidence and scrape metadata

You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML, or consume it through the Apify API, schedules, webhooks, and integrations.

โš™๏ธ How to run it

  1. Add one or more website URLs or domains.
  2. Choose how many pages to crawl per website.
  3. Set the crawl depth and maximum links per page.
  4. Pick whether to stay on the same host, same domain, or include external links as discovered-only rows.
  5. Choose whether to include document links, all asset links, or pages only.
  6. Run the Actor and open the dataset.

Domains such as example.com are accepted and normalized to HTTPS. Full URLs such as https://example.com/docs are also accepted.
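
To prepare a list of start URLs before submitting it, the behaviour described above can be approximated locally. This is only a sketch: the Actor's exact normalization rules are not documented here, and `normalize_start_url` is a hypothetical helper, not part of the Actor.

```python
from urllib.parse import urlsplit


def normalize_start_url(value: str) -> str:
    """Return a full URL, defaulting bare domains to HTTPS (assumed behaviour)."""
    value = value.strip()
    if "://" not in value:
        # A bare domain such as "example.com" gets an https:// prefix.
        value = "https://" + value
    if not urlsplit(value).netloc:
        raise ValueError(f"not a usable website URL: {value!r}")
    return value


print(normalize_start_url("example.com"))
print(normalize_start_url("https://example.com/docs"))
```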

🧾 Input options

Website URLs is the only required input. Add the sites you want to crawl.

Max pages per website controls how many HTML pages are opened for each start URL. Discovered links can still be emitted before the page cap is reached.

Max crawl depth controls how many levels of links the Actor follows from the start page. Use 0 when you only want links from the submitted page itself.

Max links per page limits how many rendered links are emitted and considered from each crawled page.
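
To see how the three limits above interact, here is a small breadth-first walk over an in-memory link graph. It is an illustration under stated assumptions, not the Actor's implementation: the `crawl` function, its parameter names, the queue order, and the way depth is counted (links found on the start page get depth 1) are all assumptions chosen to match the descriptions above.

```python
from collections import deque


def crawl(graph, start, max_pages=10, max_depth=1, max_links_per_page=100):
    """Breadth-first walk over `graph` (page -> list of linked pages).

    Returns (rows, opened): rows are (source, target, depth) tuples for every
    emitted link, opened is the list of pages that were "opened".
    """
    rows, opened = [], []
    queued = {start}
    queue = deque([(start, 0)])
    while queue and len(opened) < max_pages:
        page, depth = queue.popleft()
        opened.append(page)
        # Only the first max_links_per_page links are emitted and considered.
        for target in graph.get(page, [])[:max_links_per_page]:
            rows.append((page, target, depth + 1))
            # Follow a link only if that stays within the depth limit.
            if depth + 1 <= max_depth and target not in queued:
                queued.add(target)
                queue.append((target, depth + 1))
    return rows, opened


# Hypothetical four-page site.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/a"], "/c": []}

rows, opened = crawl(site, "/", max_depth=0)
print(opened)  # only the start page is opened
print(rows)    # its links are still emitted
```

With `max_depth=0` only the start page is opened, yet its links are still emitted, which mirrors the "links from the submitted page itself" case.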

Crawl scope controls which internal links can be followed. External links are never crawled further; they can be emitted as discovered-only rows when your settings allow it.
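
The difference between staying on the same host and staying on the same registrable domain can be sketched as follows. The scope names `"same-host"` and `"same-domain"` are hypothetical labels, not the Actor's input values, and `registrable_domain` is a deliberately naive stand-in: real crawlers usually consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def registrable_domain(host: str) -> str:
    """Naive approximation: the last two labels of the host name.

    This is wrong for suffixes such as co.uk; it only keeps the sketch short.
    """
    return ".".join(host.lower().split(".")[-2:])


def in_scope(start_url: str, link_url: str, scope: str) -> bool:
    """Decide whether a link may be followed under a hypothetical scope setting."""
    start_host = urlsplit(start_url).hostname or ""
    link_host = urlsplit(link_url).hostname or ""
    if not start_host or not link_host:
        return False
    if scope == "same-host":
        return start_host == link_host
    if scope == "same-domain":
        return registrable_domain(start_host) == registrable_domain(link_host)
    raise ValueError(f"unknown scope: {scope!r}")


start = "https://www.iana.org/domains/reserved"
print(in_scope(start, "https://www.iana.org/domains/root", "same-host"))
print(in_scope(start, "https://data.iana.org/", "same-host"))
print(in_scope(start, "https://data.iana.org/", "same-domain"))
```

A link that fails this check would be an external link: never followed, and at most emitted as a discovered-only row.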

Asset links controls whether the dataset includes only HTML page links, document links such as PDFs and spreadsheets, or all links including media assets.

Ignored extensions lets you skip common file types unless you choose to include all links.

Check HTTP status adds status code, final URL, and content type for crawled pages.

🧪 Output example

{
  "startUrl": "https://www.iana.org/domains/reserved",
  "url": "https://www.iana.org/domains/root",
  "normalizedUrl": "https://www.iana.org/domains/root",
  "sourceUrl": "https://www.iana.org/domains/reserved",
  "parentUrl": "https://www.iana.org/domains/reserved",
  "depth": 1,
  "anchorText": "Root Zone Management",
  "linkType": "page",
  "crawlStatus": "discovered",
  "isInternal": true,
  "isExternal": false,
  "isAsset": false,
  "isDuplicate": false,
  "rawHref": "/domains/root",
  "foundOnTitle": "IANA-managed Reserved Domains",
  "sourceIndex": 24,
  "discoveredAt": "2026-05-26T00:00:00.000Z"
}

💳 Pricing

This Actor uses pay-per-event pricing. You are charged for each saved website link item. The pricing event is called Discovered website link.

Use a small Max pages per website value for your first run, then increase the limit once the output shape looks right.
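
Because every saved link item is a billable event, a rough budget can be worked out before a larger run. The sketch below assumes the listed "from $10.80 / 1,000 discovered website links" rate applies as a flat rate; the rate on your plan may differ, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Listed "from" price per 1,000 saved links; treat it as an assumption.
PRICE_PER_1000_LINKS = Decimal("10.80")


def estimate_cost(saved_links: int, price_per_1000: Decimal = PRICE_PER_1000_LINKS) -> Decimal:
    """Estimate the charge in USD for a given number of saved link items."""
    cost = Decimal(saved_links) * price_per_1000 / Decimal(1000)
    return cost.quantize(Decimal("0.01"))


print(estimate_cost(250))   # 2.70
print(estimate_cost(5000))  # 54.00
```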

โš ๏ธ Limits and caveats

Website URL Crawler is browser-rendered, so it is designed for capability over minimum runtime cost. Large sites can produce many links quickly; start with a small page limit and expand from there.

The Actor reads links from public rendered pages. It does not log in, submit forms, click through interactive menus, or guarantee that every route in a single-page app is discoverable from normal anchor links.

HTTP status, final URL, and content type are available for crawled pages. Links that are only discovered but not crawled are still useful for mapping, but they may not have those HTTP fields.

โ“ FAQ

๐ŸŒ Does this crawl JavaScript-rendered websites?

Yes. Pages are opened in a browser and links are extracted from the rendered page, not only the initial HTML response.

๐ŸŒ Will it crawl external websites too?

No. External links can be saved as discovered links, but the crawler only follows internal page links within the scope you choose.

📄 Can I crawl only one page?

Yes. Set Max crawl depth to 0 when you want links from the submitted page without following deeper links.

🧯 Is this a broken link checker?

It can help with broken-link workflows by exporting discovered links and HTTP metadata for crawled pages, but the core output is a website URL crawl map.
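
As one possible broken-link workflow, the exported rows can be filtered for crawled pages that returned an error status. This is a sketch over made-up sample rows that reuse the documented field names; it assumes HTTP status checks were enabled for the run and that discovered-only rows may lack `httpStatusCode`.

```python
def broken_links(rows):
    """Yield crawled rows whose HTTP status code is 400 or higher."""
    for row in rows:
        status = row.get("httpStatusCode")
        # Discovered-only rows may have no HTTP fields, so they are skipped.
        if row.get("crawlStatus") == "crawled" and isinstance(status, int) and status >= 400:
            yield row


# Made-up sample rows for illustration.
sample = [
    {"url": "https://example.com/ok", "crawlStatus": "crawled", "httpStatusCode": 200},
    {"url": "https://example.com/gone", "crawlStatus": "crawled", "httpStatusCode": 404,
     "sourceUrl": "https://example.com/", "anchorText": "Old page"},
    {"url": "https://example.org/", "crawlStatus": "discovered"},
]

found = list(broken_links(sample))
for row in found:
    print(row["httpStatusCode"], row["url"], "linked from", row.get("sourceUrl"))
```

Links that were only discovered and never crawled are not classified either way by this filter, so they would need a separate check.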

๐Ÿ“ Changelog

  • 0.0: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Made with ❤️ by Maxime Dupré
