URL: https://apify.com/boring_code/get-urls-from-link

Get URLs from link · Apify


Pricing

$2.95/month + usage

Get URLs from link

Extracts URLs from a sitemap or webpage with intuitive path matching. Use comma-separated patterns to include or exclude URL paths with smart matching: '/tags/' for exact paths, '/product' for paths starting with that prefix, or plain text for substring matches.

Rating

5.0

(2)

Developer

Audrius L.

Maintained by Community

Actor stats

Bookmarked: 10

Total users: 261

Monthly active users: 0

Last modified: 4 months ago

This actor extracts URLs from a sitemap or any webpage containing links. It provides intuitive URL path matching and flexible filtering options to get exactly the URLs you need.

Features

  • Extract URLs from XML sitemaps or webpages
  • Smart URL path matching:
    • Use '/tags/' to match the exact path
    • Use '/product' to match paths starting with /product
    • Use 'product' to match URLs containing this text anywhere
  • Exclude specific file extensions (e.g., images)
  • Exclude URLs using the same smart path matching
  • Limit the number of processed URLs
  • Simple comma-separated syntax for filters

Input Configuration

Field | Type | Description
link | String | URL to process (required)
urlPattern | String | List of URL parts to include (comma separated). Use '*' to include all URLs. When using slashes: '/tags/' matches exact path, '/tags' matches path starting with /tags, 'tags/' matches path ending with tags/. Without slashes (e.g., 'product') matches anywhere in URL
maxUrls | Integer | Maximum number of URLs to process (0 for no limit). Good for testing purposes
excludeExtensions | String | List of file extensions to exclude (comma separated). Example: jpg,jpeg,png,gif
customExcludePattern | String | List of URL parts to exclude (comma separated). Uses same pattern matching as urlPattern. Examples: '/tags/,category' or '/blog/,author'
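The matching rules for urlPattern and customExcludePattern can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the actor's own code. The function names are hypothetical, and two points are assumptions: "exact path" is read as the whole '/tags/' segment appearing in the URL path, and a URL passes a comma-separated list as soon as any one pattern matches.

```python
from urllib.parse import urlparse


def matches_pattern(url: str, pattern: str) -> bool:
    """Check one URL against one pattern, following the documented rules."""
    pattern = pattern.strip()
    if not pattern:
        return False
    if pattern == "*":
        return True  # wildcard: include every URL
    path = urlparse(url).path
    starts, ends = pattern.startswith("/"), pattern.endswith("/")
    if starts and ends:
        # ASSUMPTION: "exact path" means the full '/tags/' segment occurs in the path.
        return pattern in path
    if starts:
        return path.startswith(pattern)  # '/tags' -> path starts with /tags
    if ends:
        return path.endswith(pattern)  # 'tags/' -> path ends with tags/
    return pattern in url  # no slashes: substring anywhere in the URL


def matches_any(url: str, patterns: str) -> bool:
    """ASSUMPTION: a comma-separated list matches if any single pattern matches."""
    return any(matches_pattern(url, p) for p in patterns.split(","))


print(matches_any("https://example.com/products/widget", "/products/,productId,deals/"))
print(matches_pattern("https://example.com/hashtags/x", "/tags/"))
```

Under these assumptions, '/tags/' does not match '/hashtags/x', while the plain text 'tags' would.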

Output

The actor outputs a dataset containing URLs that match your specified criteria. Each record has the following field:

{
  "url": "https://example.com/page"
}
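Since each record carries a single url field, turning a dataset export into a plain list is short. The sketch below assumes the records have already been loaded as Python dictionaries (for example from a JSON export); the helper name urls_from_records is hypothetical, and the duplicate removal is a precaution rather than something the actor is documented to need.

```python
import json


def urls_from_records(records):
    """Collect the 'url' field of each record, keeping first-seen order and skipping repeats."""
    seen = set()
    urls = []
    for record in records:
        url = record.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical export: a JSON array of records shaped like the one above.
exported = """[
  {"url": "https://example.com/page"},
  {"url": "https://example.com/page"},
  {"url": "https://example.com/other"}
]"""
print(urls_from_records(json.loads(exported)))
```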

Usage Examples

Basic Usage

Extract all URLs from a sitemap:

{
  "link": "https://example.com/sitemap.xml"
}
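The same input can be sent from Python. The sketch below is one possible way to do it, assuming the third-party apify-client package is installed and that you have an Apify API token. The actor ID is taken from this page's Store URL; build_run_input and fetch_urls are hypothetical helper names, not part of the actor.

```python
ACTOR_ID = "boring_code/get-urls-from-link"  # from the Store URL of this actor
DOCUMENTED_FIELDS = {"urlPattern", "maxUrls", "excludeExtensions", "customExcludePattern"}


def build_run_input(link, **options):
    """Assemble an input object, rejecting keys that the input table does not list."""
    if not link:
        raise ValueError("link is required")
    unknown = set(options) - DOCUMENTED_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {"link": link, **options}


def fetch_urls(token, run_input):
    """Run the actor, wait for it to finish, and return the 'url' of every dataset record."""
    # Imported here so build_run_input works without the third-party package.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    items = client.dataset(run["defaultDatasetId"]).iterate_items()
    return [item["url"] for item in items]


print(build_run_input("https://example.com/sitemap.xml"))
```

A call would then look like fetch_urls("<YOUR_API_TOKEN>", build_run_input("https://example.com/sitemap.xml")), which starts a real run and is billed as usage.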

Smart Path Matching

Get only product URLs with different matching options:

{
  "link": "https://example.com/sitemap.xml",
  "urlPattern": "/products/,productId,deals/"
}

This will match:

  • URLs containing the exact '/products/' path
  • URLs containing 'productId' anywhere
  • URLs ending with 'deals/'

Exclude File Types and Sections

Get URLs excluding images and specific sections:

{
  "link": "https://example.com/sitemap.xml",
  "excludeExtensions": "jpg,jpeg,png,gif",
  "customExcludePattern": "/tags/,/category/,author"
}
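As an illustration of the excludeExtensions part of this input, here is a minimal Python sketch of an extension check. It is not the actor's implementation: the function name is hypothetical, and it assumes the comparison is case-insensitive and looks only at the URL path, ignoring any query string.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def has_excluded_extension(url: str, exclude_extensions: str) -> bool:
    """True if the URL path ends in one of the comma-separated extensions."""
    excluded = {
        ext.strip().lower().lstrip(".")
        for ext in exclude_extensions.split(",")
        if ext.strip()
    }
    # ASSUMPTION: only the path is inspected, so '?x=1' after the file name is ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    return bool(suffix) and suffix in excluded


urls = [
    "https://example.com/img/photo.JPG",
    "https://example.com/blog/post",
]
print([u for u in urls if not has_excluded_extension(u, "jpg,jpeg,png,gif")])
```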

Limit Results

Get first 100 URLs for testing:

{
  "link": "https://example.com/sitemap.xml",
  "maxUrls": 100
}
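The documented maxUrls convention, where 0 means no limit, can be mirrored locally when post-processing a URL list. This is a small sketch with a hypothetical helper name; it says nothing about how the actor itself applies the limit.

```python
from itertools import islice


def limit_urls(urls, max_urls=0):
    """Return an iterator over at most max_urls items; 0 means no limit."""
    if max_urls < 0:
        raise ValueError("max_urls must be 0 or a positive integer")
    return islice(urls, max_urls) if max_urls else iter(urls)


sample = [f"https://example.com/page-{i}" for i in range(5)]
print(list(limit_urls(sample, 3)))
print(len(list(limit_urls(sample, 0))))
```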
