Pricing
from $0.30 / 1,000 URLs extracted
Sitemap URL Extractor
Extract every URL from a website's sitemap.xml. Recursively walks nested sitemap indexes and returns loc, lastmod, changefreq, and priority for each page.
Pull every URL from any website's sitemap.xml: the actor automatically walks nested sitemap indexes and returns a clean dataset with `loc`, `lastmod`, `changefreq`, and `priority` for each page.
What you get
- URL (`loc`) for every page listed in the site's sitemap
- Last modified date (`lastmod`): when each page was last updated
- Change frequency (`changefreq`): `always`, `hourly`, `daily`, `weekly`, `monthly`, `yearly`, or `never`
- Priority (`priority`): relative importance of each URL (0.0 to 1.0)
- Source sitemap: which sitemap file the URL came from (useful when a site splits its sitemap by section)
- Auto-discovery: point at a homepage and the actor finds the sitemap via `robots.txt` or `/sitemap.xml`
- Gzipped sitemap support: handles `.xml.gz` files transparently
- Recursive sitemap index walking: follows nested `<sitemapindex>` files up to 5 levels deep
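Each output record maps directly onto a `<url>` entry in a standard sitemaps.org `<urlset>`. A minimal sketch of that mapping, assuming plain (already decompressed) XML in the sitemaps.org 0.9 namespace; the function names are illustrative, not the actor's actual code:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

# Namespace defined by the sitemaps.org protocol (version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def _child_text(el: ET.Element, tag: str) -> Optional[str]:
    """Return the stripped text of a child tag, or None if it is absent or empty."""
    child = el.find(f"sm:{tag}", NS)
    if child is None or child.text is None or not child.text.strip():
        return None
    return child.text.strip()


def parse_urlset(xml_text: Union[str, bytes], source_sitemap: str) -> list[dict]:
    """Turn one <urlset> document into one record per <url> entry."""
    root = ET.fromstring(xml_text)
    records = []
    for url_el in root.findall("sm:url", NS):
        loc = _child_text(url_el, "loc")
        if loc is None:
            continue  # an entry without <loc> has no URL to report
        records.append({
            "loc": loc,
            "lastmod": _child_text(url_el, "lastmod"),        # None when missing
            "changefreq": _child_text(url_el, "changefreq"),  # None when missing
            "priority": _child_text(url_el, "priority"),      # kept as a string
            "sourceSitemap": source_sitemap,
        })
    return records
```

Keeping `priority` as a string mirrors the sample output record on this page; convert it with `float()` downstream if you need numeric sorting.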
Use cases
- SEO audits: pull a full URL inventory before running site-wide checks (broken links, missing meta tags, schema validation)
- Content migration: build a complete URL list when moving a site between platforms
- Crawl budget planning: see how many URLs a site exposes and how recently each was updated
- Competitor research: map out every page a competitor publishes
- Sitemap validation: verify that your published sitemap actually contains the pages you expect
- Bulk URL scraping pipelines: feed the output into another actor for screenshots, content extraction, or AI summarization
How to use
- Enter a Website or Sitemap URL: either a homepage like `https://www.example.com` (the actor auto-discovers the sitemap) or a direct sitemap URL like `https://www.example.com/sitemap.xml`
- Set Max Items: `0` returns every URL in the entire sitemap tree
- Choose whether to Follow Sitemap Index: on by default, so a single run pulls every URL from every child sitemap
- Run the actor: results land in the Dataset tab
- Export to JSON, CSV, Excel, or Google Sheets directly from the Apify Console
Extract every URL on a site
```json
{
  "websiteUrl": "https://www.apify.com",
  "maxItems": 0,
  "followSitemapIndex": true
}
```
Extract only the top-level sitemap
```json
{
  "websiteUrl": "https://www.apify.com/sitemap.xml",
  "maxItems": 0,
  "followSitemapIndex": false
}
```
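To start a run from code instead of the console, the `apify-client` Python package can pass the same input fields as the JSON examples. A minimal sketch, assuming the v1-style client API (`ApifyClient(token).actor(id).call(run_input=...)`, then reading the run's default dataset); the actor ID is a hypothetical placeholder because the developer's username is not shown on this page, so substitute the real ID from the Store listing:

```python
from typing import Any


def build_run_input(website_url: str, max_items: int = 0,
                    follow_sitemap_index: bool = True) -> dict[str, Any]:
    """Assemble the actor input; defaults mirror the Parameters table (0 = unlimited)."""
    if max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 means unlimited)")
    return {
        "websiteUrl": website_url,
        "maxItems": max_items,
        "followSitemapIndex": follow_sitemap_index,
    }


def run_extractor(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict]:
    """Start the actor, wait for it to finish, and return every dataset item."""
    # Imported lazily so build_run_input works without the package installed
    # (pip install apify-client).
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)  # blocks until the run ends
    if run is None:
        raise RuntimeError("the actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example (needs a real API token and the actor's real ID; the ID is a placeholder):
# items = run_extractor(os.environ["APIFY_TOKEN"],
#                       "YOUR_USERNAME/sitemap-url-extractor",
#                       build_run_input("https://www.apify.com"))
```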
Output format
One dataset record per URL:
```json
{
  "loc": "https://www.apify.com/store",
  "lastmod": "2024-08-12",
  "changefreq": "daily",
  "priority": "0.8",
  "sourceSitemap": "https://www.apify.com/sitemap.xml"
}
```
Fields not present in the sitemap entry come back as `null`.
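Because `lastmod` can be `null` or written either as a plain date or as a full W3C datetime, downstream code should parse it defensively. A minimal sketch for the crawl-budget use case, assuming records shaped like the sample above, that lists URLs updated within the last N days; the helper names are illustrative, and date-only values are treated as midnight UTC (an assumption, since sitemaps do not specify a time zone for them):

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def parse_lastmod(value: Optional[str]) -> Optional[datetime]:
    """Parse '2024-08-12' or '2024-08-12T10:00:00Z'; None if missing or unparseable."""
    if not value:
        return None
    try:
        # Older Python versions reject a trailing 'Z', so rewrite it as an offset.
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Date-only values parse as naive midnight; assume UTC so comparisons work.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def updated_since(records: Iterable[dict], days: int,
                  now: Optional[datetime] = None) -> list[str]:
    """Return the loc of every record whose lastmod falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = []
    for rec in records:
        dt = parse_lastmod(rec.get("lastmod"))
        if dt is not None and dt >= cutoff:  # records with null lastmod are skipped
            recent.append(rec["loc"])
    return recent
```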
Parameters
| Field | Default | Description |
|---|---|---|
| Website or Sitemap URL | https://www.apify.com | Homepage URL (the sitemap is auto-discovered) or a direct `.xml` / `.xml.gz` sitemap URL |
| Max Items | 0 | Maximum URLs to return per run. 0 = unlimited |
| Follow Sitemap Index | true | Recurse into child sitemaps when the top-level file is a sitemap index |
Notes
- Sitemap discovery first looks for `Sitemap:` directives in `/robots.txt`, then falls back to `/sitemap.xml`
- Nested sitemap indexes are walked breadth-first; the actor de-duplicates sitemap URLs, so circular references are safe
- Recursion is capped at 5 levels deep and 1,000 total sitemaps as a safety net against runaway loops
- Each fetched sitemap has a 30-second timeout: slow or unreachable child sitemaps are logged and skipped, and the run continues
- Gzip-compressed sitemaps (`*.xml.gz`) are decompressed automatically
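The discovery and walking behavior described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the actor's source: `fetch` is a hypothetical callable that returns a sitemap's raw bytes, or `None` on timeout or error (so the 30-second timeout belongs to whatever HTTP client you plug in); "5 levels deep" is read as at most 5 levels below the starting sitemap; and only `loc` and `sourceSitemap` are recorded to keep it short.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Optional

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_DEPTH = 5        # assumption: levels below the starting sitemap
MAX_SITEMAPS = 1000  # total sitemap files fetched per run


def discover_sitemaps(robots_txt: Optional[str], base_url: str) -> list[str]:
    """Collect 'Sitemap:' directives from robots.txt, else fall back to /sitemap.xml."""
    found = []
    for line in (robots_txt or "").splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found or [base_url.rstrip("/") + "/sitemap.xml"]


def maybe_gunzip(body: bytes) -> bytes:
    """Decompress gzip payloads, detected by their magic bytes rather than the file name."""
    return gzip.decompress(body) if body[:2] == b"\x1f\x8b" else body


def walk_sitemaps(start_urls: list[str],
                  fetch: Callable[[str], Optional[bytes]]) -> list[dict]:
    """Breadth-first walk over sitemap indexes; one record per <url> entry found."""
    queue = deque((url, 0) for url in start_urls)
    seen = set()
    records = []
    while queue and len(seen) < MAX_SITEMAPS:
        url, depth = queue.popleft()
        if url in seen:
            continue  # de-duplication makes circular references harmless
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue  # unreachable or timed out: skip it and keep going
        try:
            root = ET.fromstring(maybe_gunzip(body))
        except (ET.ParseError, OSError, EOFError):
            continue  # malformed XML or corrupt gzip: skip this file
        if root.tag.endswith("sitemapindex"):
            if depth < MAX_DEPTH:
                for loc in root.findall("sm:sitemap/sm:loc", NS):
                    if loc.text and loc.text.strip():
                        queue.append((loc.text.strip(), depth + 1))
        else:
            for url_el in root.findall("sm:url", NS):
                loc = url_el.find("sm:loc", NS)
                if loc is not None and loc.text and loc.text.strip():
                    records.append({"loc": loc.text.strip(), "sourceSitemap": url})
    return records
```

A FIFO queue gives the breadth-first order; tracking visited sitemap URLs (rather than page URLs) is what keeps an index that lists itself from looping forever.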
Related website & SEO actors
Part of a complete website & SEO toolkit; explore the rest of the suite:
- Website Contact Scraper: emails, phones, and socials from any website
- Website Email Scraper: crawl a site deeply and extract all emails
- Website Tech Stack Detector: detect CMS, frameworks, analytics, and DNS/MX records
- SEO Meta Tag Auditor: audit titles, Open Graph and Twitter cards, and schema
- Domain WHOIS & SSL Inspector: WHOIS, domain age, and live SSL details
