URL: https://apify.com/tri_angle/sitemap-change-orchestrator

Sitemap Change Orchestrator · Apify


Pricing

Pay per usage

Sitemap Change Orchestrator

Monitor website sitemaps for new, updated, or removed URLs. Integration with the Website Content Crawler (WCC) passes only the changed URLs to the crawler, so your crawls stay efficient and targeted and your datasets stay fresh for any application.

Rating

0.0 (0)

Developer

Tri⟁angle

Maintained by Apify

Actor stats

  • Bookmarked: 6
  • Total users: 42
  • Monthly active users: 2
  • Last modified: 6 months ago

Monitor sitemaps, detect changes, orchestrate content crawls, and merge results into a single dataset.

What is Sitemap Change Orchestrator?

This actor runs the Sitemap Change Detector to identify changed URLs in your sitemaps, then triggers parallel Website Content Crawler runs to fetch the content of those pages. Finally, it merges all crawler outputs and deduplicates them by URL into one unified dataset.

Key Features

  • Detect sitemap changes (NEW, UPDATED, REMOVED, SAME)
  • Orchestrate parallel crawl runs with configurable memory and timeout
  • Merge and dedupe Website Content Crawler results into a single output
  • Store and retrieve sitemap snapshots in a named key-value store
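
As a rough illustration of the change types listed above, the following Python sketch classifies URLs by comparing two sitemap snapshots. It assumes each snapshot is a plain mapping from URL to its `lastmod` value; the `classify_changes` helper and the snapshot shape are hypothetical and are not taken from the actor's source.

```python
from typing import Dict, Optional

# Assumed snapshot shape: URL -> lastmod string (or None when the sitemap omits it).
Snapshot = Dict[str, Optional[str]]


def classify_changes(previous: Snapshot, current: Snapshot) -> Dict[str, str]:
    """Label every URL as NEW, UPDATED, REMOVED, or SAME (hypothetical helper)."""
    changes: Dict[str, str] = {}
    for url, lastmod in current.items():
        if url not in previous:
            changes[url] = "NEW"
        elif previous[url] != lastmod:
            changes[url] = "UPDATED"
        else:
            changes[url] = "SAME"
    # URLs present in the old snapshot but missing from the new one were removed.
    for url in previous:
        if url not in current:
            changes[url] = "REMOVED"
    return changes
```

A filter such as `changeTypes: ["NEW", "UPDATED"]` would then amount to keeping only the URLs whose label is in that list.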

How it Works

  1. Run the Sitemap Change Detector with your settings
  2. Collect changed URLs and batch them into Website Content Crawler runs
  3. Trigger Website Content Crawler runs in parallel
  4. Merge and dedupe all crawler run datasets by URL
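
Steps 2 and 4 can be sketched in Python as follows. This is a minimal sketch, not the actor's implementation: the helper names are hypothetical, each crawler item is assumed to carry a `url` field, and the "last run wins" rule for duplicates is an assumption, since the page does not say which copy is kept.

```python
from typing import Any, Dict, Iterable, List


def batch_urls(urls: List[str], batch_size: int) -> List[List[str]]:
    """Split changed URLs into fixed-size batches, one per crawler run."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


def merge_by_url(datasets: Iterable[Iterable[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Merge items from several crawler runs, keeping one item per URL."""
    merged: Dict[str, Dict[str, Any]] = {}
    for items in datasets:
        for item in items:
            # Assumed policy: an item from a later run replaces an earlier one.
            merged[item["url"]] = item
    return list(merged.values())
```

For example, `batch_urls(["a", "b", "c"], 2)` yields two batches, and merging their result datasets returns each URL once.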

How to Use

  1. Open the Sitemap Change Orchestrator actor on the Apify Store.
  2. Configure memory, timeouts, and whether to skip crawling.
  3. Paste your Website Content Crawler JSON input.
  4. Set WCC batching options.
  5. Save and click Run.
  6. Review merged and deduplicated output in the default dataset.

Example Input

{
  "addRemovedUrlsToKvs": false,
  "addWccUrlsToScd": true,
  "changeTypes": ["NEW", "UPDATED"],
  "discoverSitemaps": true,
  "skipWcc": false,
  "snapshotKeyPrefix": "APIFY",
  "wccInput": {
    "startUrls": [
      {
        "url": "https://www.apify.com",
        "method": "GET"
      }
    ]
    // ...
  }
}
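
One way to start a run with this input from Python is the `apify-client` package. The sketch below is illustrative only: the `build_input` and `run_orchestrator` helpers are hypothetical, the input contains just the fields shown in the example (the elided `wccInput` fields are left out), and the shape of the returned run object may differ between client versions.

```python
from typing import Any, Dict, List

ACTOR_ID = "tri_angle/sitemap-change-orchestrator"  # taken from the actor's Store URL


def build_input(start_url: str) -> Dict[str, Any]:
    """Build a run input mirroring the example for a single start URL."""
    return {
        "addRemovedUrlsToKvs": False,
        "addWccUrlsToScd": True,
        "changeTypes": ["NEW", "UPDATED"],
        "discoverSitemaps": True,
        "skipWcc": False,
        "snapshotKeyPrefix": "APIFY",
        "wccInput": {"startUrls": [{"url": start_url, "method": "GET"}]},
    }


def run_orchestrator(token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start the actor, wait for it to finish, and return the default dataset items."""
    # Imported lazily so build_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_orchestrator(token, build_input("https://www.apify.com"))` with a valid API token would return the merged items as a list of dictionaries.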

Output

  • Merged and deduplicated output from all Website Content Crawler runs in the default dataset
  • Additionally, sitemap snapshots and removed-URL lists are stored in a named key-value store under your prefix

FAQ

Can I export data using the API?

Yes, you can access this actor using your own applications through the Apify API. Click on the API tab for code examples or check out the Apify API reference docs at https://docs.apify.com/api/v2 for full details.
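
As a hedged sketch of what such an export could look like without any client library, the Python below reads items from a dataset through the Apify API's dataset items endpoint. The helper names, the default `limit`, and the dataset ID you pass in are assumptions for illustration; consult the API reference for the full parameter list.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, limit: int = 100) -> str:
    """Build the URL for reading dataset items as JSON (hypothetical helper)."""
    query = urllib.parse.urlencode({"format": "json", "limit": limit})
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return f"{API_BASE}/datasets/{safe_id}/items?{query}"


def fetch_items(dataset_id: str, token: str, limit: int = 100) -> List[Dict[str, Any]]:
    """Download up to `limit` items, sending the API token as a bearer header."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```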

Can I use Sitemap Change Orchestrator through an MCP Server?

This actor, like all Apify actors, works on the Apify MCP server. For more information and instructions, read the Apify MCP server integration guide at https://docs.apify.com/platform/integrations/mcp.

Can I integrate data from Sitemap Change Orchestrator with other apps?

Yes. Sitemap Change Orchestrator can be connected with almost any cloud service or web app. Read more about the possibilities on our integrations page at https://apify.com/integrations.

Is it legal to scrape data using Sitemap Change Orchestrator?

This actor only extracts publicly available data. It does not collect private user data. However, you should ensure your reason for scraping is legitimate. Consult legal counsel if unsure. For more on scraping legality and ethics, see:

Your feedback

We welcome feedback to improve this actor. If you encounter issues or have suggestions, please create an issue on the actor's Issues tab.

You might also like

Sitemap Change Detector

tri_angle/sitemap-change-detector

Identify and monitor sitemaps for specified websites. Retrieve only the new, updated, or removed URLs since the last crawl.

Tri⟁angle

68

Updated Content Checker

tomas.gabik/updated-content-checker

Monitors sitemaps for new/updated content. Returns only URLs modified since a specified date for efficient incremental scraping.

Tomáš Gabík

4

Website Content Crawler

rupom888/website-content-crawler

Sitemap Scraper

scrapers-hub/sitemap-scraper

Sitemap scraper to crawl and extract URLs, pages, and structure from website sitemaps. Perfect for SEO analysis, website auditing, and data extraction. Fast, reliable, and scalable.

Incremental Web Crawler

flamboyant_leaf/IncrementalCrawler-v2

The Incremental Crawler efficiently fetches URLs of recently added or updated web pages on a target site, optimizing resources by focusing only on new content. Ideal for keeping up with the latest updates, it integrates seamlessly into workflows for content monitoring and analysis.

AI Website Content Crawler

ilborso/ai-website-content-crawler

A super fast website crawler for Agentic AI integration

Fabio Borsotti

6

5.0

Sitemap Scraper

scrapevanta/sitemap-scraper

Sitemap Scraper extracts URLs, page metadata, update dates, images, and structured sitemap data from XML sitemaps. Ideal for SEO audits, website analysis, content discovery, indexing validation, competitor research, and large-scale web data collection.

Website Content Crawler

ayeeyee/website-content-crawler

Full website crawling

Virtual Footprint LLC

2

Sitemap Scraper

scrapedrift/sitemap-scraper

Sitemap Scraper extracts URLs, pages, images, and structured data from XML sitemaps. Quickly discover website content, audit site structure, monitor updates, support SEO analysis, conduct competitor research, and gather valuable website data at scale.

Related articles

  • AI data collection (how to feed your LLM)
  • Web crawling vs. web scraping
  • How to train an AI chatbot using automated scraping