URL: https://apify.com/gratifying_graph/sitemap-diff

Sitemap Inventory & Diff - URL Extractor with Change Detection · Apify


πŸ‘ Sitemap Inventory & Diff - URL Extractor with Change Detection avatar

Sitemap Inventory & Diff - URL Extractor with Change Detection

Pricing

from $10.00 / 1,000 events, where one event is 1,000 URLs processed


Extract every URL from a site's sitemaps, then diff against the previous run: pages added, removed, or updated since last check. Built for SEO monitoring, RAG freshness, and competitor watching.

Rating

0.0

(0)

Developer

πŸ‘ Jimmy A

Jimmy A

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 1
Monthly active users: 0
Last modified: 8 days ago

Extract every URL from a website's sitemaps and find out what changed since your last check: pages added, pages removed, pages updated. One run gives you the full URL inventory. Scheduled runs give you a change feed for any site on the internet.

No browser, no proxies, no login. It reads the same sitemap.xml files sites publish for Google.

What it does

  1. Discovers sitemaps from robots.txt (falls back to /sitemap.xml and common index paths)
  2. Follows sitemap index files recursively, including gzipped (.xml.gz) sitemaps
  3. Extracts every URL with its lastmod date
  4. Saves a snapshot per domain, then on the next run reports added / removed / changed URLs
  5. Outputs a clean summary to the dataset; optionally the full URL inventory
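
The discovery and extraction steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actor's actual code: the function names are hypothetical, fetching is left out, and it assumes sitemaps declare the standard sitemaps.org namespace.

```python
from __future__ import annotations

import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs listed on `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def parse_sitemap(payload: bytes) -> tuple[str, list[tuple[str, str | None]]]:
    """Parse one sitemap file (plain or gzipped).

    Returns ("index", entries) for a sitemap index, whose entries are child
    sitemap URLs to follow, or ("urlset", entries) for a regular sitemap.
    Each entry is a (loc, lastmod) pair; lastmod is None when absent.
    """
    if payload[:2] == b"\x1f\x8b":  # gzip magic bytes
        payload = gzip.decompress(payload)
    root = ET.fromstring(payload)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    entries = []
    for node in root:
        loc = node.findtext(SITEMAP_NS + "loc")
        if loc is None:
            continue
        lastmod = node.findtext(SITEMAP_NS + "lastmod")
        entries.append((loc.strip(), lastmod.strip() if lastmod else None))
    return kind, entries
```

A caller would fetch robots.txt, pass each listed URL's bytes to parse_sitemap, and recurse whenever the result is an "index".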

Use cases

  • SEO monitoring: catch when a competitor publishes new landing pages, kills old ones, or refreshes content
  • RAG and AI pipelines: keep a vector index fresh by re-crawling only the URLs that changed instead of the whole site
  • Content watch: see when a publisher, government site, or documentation portal adds pages on a topic
  • Site audits: instant URL inventory for any domain, exportable as JSON or CSV
  • Index bloat checks: compare what a site publishes in sitemaps over time

Input

{
  "domains": ["competitor.com", "docs.example.com"],
  "computeDiff": true,
  "outputInventory": false
}

You can also pass exact sitemap URLs via sitemapUrls if a site keeps them in a non-standard place.
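
For programmatic runs, the input can be assembled in code. The sketch below is an assumption-laden example: build_run_input and run_actor are hypothetical helpers, the actor ID is taken from the store URL, sitemapUrls is assumed to be a list of strings and snapshotGroup a string, and the call relies on the third-party apify-client package with its 1.x-style dictionary results.

```python
from __future__ import annotations


def build_run_input(domains=(), sitemap_urls=None, compute_diff=True,
                    output_inventory=False, snapshot_group=None) -> dict:
    """Assemble an input object using the field names shown in this README."""
    if not domains and not sitemap_urls:
        raise ValueError("pass at least one domain or sitemap URL")
    run_input = {
        "domains": list(domains),
        "computeDiff": bool(compute_diff),
        "outputInventory": bool(output_inventory),
    }
    if sitemap_urls:
        run_input["sitemapUrls"] = list(sitemap_urls)  # assumed: list of strings
    if snapshot_group:
        run_input["snapshotGroup"] = snapshot_group  # assumed: plain string
    return run_input


def run_actor(token: str, run_input: dict) -> list[dict]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor("gratifying_graph/sitemap-diff").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, run_actor(token, build_run_input(["competitor.com"])) would return the summary items described in the Output section.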

Output

One summary item per domain:

{
  "type": "summary",
  "domain": "competitor.com",
  "sitemapFiles": 7,
  "urlCount": 3741,
  "diff": {
    "previousRunFound": true,
    "added": 12,
    "removed": 3,
    "changed": 41,
    "addedUrls": ["https://competitor.com/new-feature", "..."],
    "removedUrls": ["..."],
    "changedUrls": ["..."]
  }
}

Set outputInventory: true to also get one item per URL (url, lastmod, domain).
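
For the RAG use case, a summary item can be turned into a small work list: fetch what was added or changed, delete what was removed. This is a sketch with a hypothetical plan_reindex helper; it assumes the addedUrls, changedUrls, and removedUrls lists are complete rather than truncated.

```python
def plan_reindex(summary: dict) -> dict:
    """Turn one summary item into fetch/delete lists for a downstream index."""
    diff = summary.get("diff") or {}
    if not diff.get("previousRunFound"):
        # No earlier snapshot: nothing to compare, treat this run as the baseline.
        return {"baseline": True, "fetch": [], "delete": []}
    # Added and changed pages both need a fresh crawl; deduplicate and sort.
    fetch = sorted(set(diff.get("addedUrls", [])) | set(diff.get("changedUrls", [])))
    delete = sorted(diff.get("removedUrls", []))
    return {"baseline": False, "fetch": fetch, "delete": delete}
```

On a baseline run the caller would typically crawl the full inventory instead of relying on the empty lists.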

The first run for a domain saves the baseline snapshot; diffs start with the second run. Snapshots persist between runs, so a weekly schedule gives you a weekly change report.
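
The snapshot comparison can be pictured as set arithmetic over two URL-to-lastmod maps. The sketch below is illustrative only: diff_snapshots is a hypothetical name, "changed" is assumed to mean that a URL's lastmod value differs between runs, and the shape returned for a first run is a guess.

```python
from __future__ import annotations


def diff_snapshots(previous: dict[str, str | None] | None,
                   current: dict[str, str | None]) -> dict:
    """Compare two {url: lastmod} snapshots and report what changed."""
    if previous is None:
        # First run for this domain: the current snapshot becomes the baseline.
        return {"previousRunFound": False}
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # A URL present in both runs counts as changed when its lastmod differs.
    changed = sorted(
        url for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return {
        "previousRunFound": True,
        "added": len(added),
        "removed": len(removed),
        "changed": len(changed),
        "addedUrls": added,
        "removedUrls": removed,
        "changedUrls": changed,
    }
```

Under this assumption, pages that change without updating lastmod would not be reported as changed.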

Scheduling

Pair this actor with an Apify Schedule (for example weekly per domain). Each scheduled run compares against the previous snapshot automatically. Use the snapshotGroup input to track the same domain on two independent schedules without the snapshots interfering.

API / Standby mode for AI agents

The actor also runs as an HTTP endpoint (Standby). Agents and integrations can call:

GET /?domain=example.com&diff=true

and receive the summary JSON synchronously. Works as an MCP-style tool for agent frameworks that support Apify actors.
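
A client for the Standby endpoint might look like the sketch below. The base URL is a placeholder (the real Standby URL is shown in the Apify Console), the helper names are hypothetical, and passing the API token as a Bearer header is an assumption about how the endpoint authenticates.

```python
import json
import urllib.request
from urllib.parse import urlencode


def standby_url(base_url: str, domain: str, diff: bool = True) -> str:
    """Build the GET URL with the domain and diff query parameters."""
    query = urlencode({"domain": domain, "diff": "true" if diff else "false"})
    return f"{base_url.rstrip('/')}/?{query}"


def fetch_summary(base_url: str, domain: str, token: str,
                  diff: bool = True, timeout: float = 60.0) -> dict:
    """Call the endpoint and decode the summary JSON it returns."""
    request = urllib.request.Request(
        standby_url(base_url, domain, diff),
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For instance, standby_url("https://standby.example.invalid", "example.com") yields a URL ending in /?domain=example.com&diff=true.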

Pricing

Pay per event - you only pay for what the run actually does:

Event                          Price
Actor start                    $0.0001
Per 1,000 URLs extracted       $0.01
Diff computed (per domain)     $0.02
API call (standby mode)        $0.01

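A single check of a 10,000-URL site with diff costs about $0.12 ($0.0001 for the start, 10 × $0.01 for the URLs, $0.02 for the diff), so a weekly schedule comes to roughly $0.50 per month.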
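
The event prices listed above can be combined into a quick estimator. This is a rough sketch: run_cost and monthly_cost are hypothetical helpers, and rounding partial blocks of 1,000 URLs up to a full event is an assumption about how billing works.

```python
import math

# Event prices in USD, copied from the pricing table.
PRICES = {
    "actor_start": 0.0001,
    "per_1000_urls": 0.01,
    "diff_per_domain": 0.02,
}


def run_cost(url_count: int, domains: int = 1, compute_diff: bool = True) -> float:
    """Estimate the cost of one regular (non-standby) run."""
    url_events = math.ceil(url_count / 1000)  # assumed: partial blocks round up
    cost = PRICES["actor_start"] + url_events * PRICES["per_1000_urls"]
    if compute_diff:
        cost += domains * PRICES["diff_per_domain"]
    return round(cost, 4)


def monthly_cost(url_count: int, runs_per_year: int = 52, **kwargs) -> float:
    """Average monthly cost for a schedule with the given yearly run count."""
    return round(run_cost(url_count, **kwargs) * runs_per_year / 12, 4)
```

With these prices, one run over 10,000 URLs with diff comes to about $0.12, and 52 weekly runs average about $0.52 per month.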

FAQ

How is this different from a sitemap URL extractor? Extractors give you the URL list. This actor also remembers the last run and tells you what changed: that is the part you actually want on a schedule.

Does it work on sites without robots.txt? Yes. It falls back to /sitemap.xml, /sitemap_index.xml, and /sitemap-index.xml, and you can pass exact sitemap URLs.

Does it handle huge sites? Yes. Sitemap indexes are followed recursively up to 500 sitemap files per domain, with a configurable URL cap (default 100,000).

Does it crawl pages? No. It only reads sitemap files, which makes it fast, cheap, and gentle on the target site. If a URL is not in the sitemaps, it will not appear.

Can I get the result as CSV? Yes, every Apify dataset exports as CSV, JSON, Excel, or via API.
