URL: https://apify.com/zerobreak/xml-sitemap-validator

Xml Sitemap Validator · Apify

Pricing

$4.99/month + usage

XML sitemap validator that crawls every URL in your sitemap and flags broken links, redirect chains, and structural errors, so SEO teams can audit sitemap health in seconds.

Rating

0.0

(0)

Developer

๐Ÿ‘ ZeroBreak

ZeroBreak

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 4 months ago

XML Sitemap Validator: Find Broken Links, Redirects & Errors in Any Sitemap

XML Sitemap Validator is an Apify actor that fetches any XML sitemap, requests every listed URL to record its HTTP status code, and produces a detailed per-URL report. It works like the validation tools at xml-sitemaps.com or seoptimer.com's sitemap checker, but is fully automated and exportable. Point it at a sitemap URL and get back a structured dataset showing which pages are accessible, which are broken (404), which redirect, and how fast each one responds.

Whether you're running an SEO audit on a large e-commerce site, validating a sitemap before a site migration, or monitoring URL health on a weekly schedule, this actor handles it, including sitemap index files with nested child sitemaps.

Use Cases

  • SEO auditing: Automatically detect broken links and redirect chains that harm your search rankings before Google finds them
  • Pre-launch validation: Crawl your sitemap after a redesign or CMS migration to confirm every URL returns 200 OK
  • Sitemap index support: Validate large sites like BBC or Shopify that split their sitemaps across dozens of child sitemap files
  • Response time monitoring: Flag slow-loading pages (high responseTimeMs) that may affect Core Web Vitals
  • Redirect chain detection: Identify URLs in your sitemap that still point to old addresses that have since been permanently moved
  • Scheduled health checks: Run on a cron trigger and pipe results to Google Sheets or Slack to monitor sitemap health over time

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sitemapUrl | string | none | Required. URL of the XML sitemap to validate. Supports standard sitemaps and sitemap index files. |
| sitemapUrls | array | [] | Additional sitemap URLs to validate in the same run. |
| checkUrls | boolean | true | Fetch each listed URL to verify HTTP status. Disable to validate XML structure only. |
| followRedirects | boolean | true | Follow HTTP redirects and record the final destination URL. |
| concurrency | integer | 10 | Number of URLs to check in parallel. Higher values are faster but may trigger rate limiting. |
| maxUrls | integer | 100 | Maximum number of URLs to process per run. Set to 0 for no limit. |
| timeoutSecs | integer | 300 | Total actor runtime limit in seconds. |
| requestTimeoutSecs | integer | 30 | Per-URL request timeout in seconds. URLs exceeding this are flagged as timeouts. |

Example Input: Validate a Single Sitemap

{
  "sitemapUrl": "https://www.shopify.com/sitemap.xml",
  "checkUrls": true,
  "concurrency": 15,
  "maxUrls": 200,
  "requestTimeoutSecs": 20
}

Example Input: Validate Multiple Sitemaps at Once

{
  "sitemapUrl": "https://www.bbc.com/sitemap.xml",
  "sitemapUrls": [
    "https://techcrunch.com/news-sitemap.xml",
    "https://www.smashingmagazine.com/sitemap_index.xml"
  ],
  "checkUrls": true,
  "followRedirects": true,
  "concurrency": 10,
  "maxUrls": 500
}

Example Input: Structure-Only Validation (No URL Requests)

{
  "sitemapUrl": "https://www.theverge.com/sitemap.xml",
  "checkUrls": false
}
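
The inputs above can also be assembled and submitted from a script. The following is a minimal sketch, not an official client for this actor. It assumes the third-party `apify-client` Python package is installed and that the actor ID is `zerobreak/xml-sitemap-validator` (taken from the store URL). `build_run_input` and `run_validator` are hypothetical helper names, and the defaults are copied from the input table.

```python
import copy

# Defaults copied from the input table; sitemapUrl has no default and is required.
DEFAULTS = {
    "sitemapUrls": [],
    "checkUrls": True,
    "followRedirects": True,
    "concurrency": 10,
    "maxUrls": 100,
    "timeoutSecs": 300,
    "requestTimeoutSecs": 30,
}


def build_run_input(sitemap_url, **overrides):
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    if not sitemap_url.startswith(("http://", "https://")):
        raise ValueError("sitemapUrl must be an absolute http(s) URL")
    # deepcopy so callers never share the mutable default list
    return {"sitemapUrl": sitemap_url, **copy.deepcopy(DEFAULTS), **overrides}


def run_validator(token, run_input):
    """Start the actor, wait for it to finish, and return its dataset rows."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor("zerobreak/xml-sitemap-validator").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For instance, `build_run_input("https://www.shopify.com/sitemap.xml", concurrency=15, maxUrls=200, requestTimeoutSecs=20)` reproduces the first example input with the remaining fields filled from the defaults.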

What Data Does This Actor Extract?

The actor stores one record per URL found in the sitemap. Each entry contains:

{
  "sitemapUrl": "https://www.shopify.com/sitemap.xml",
  "url": "https://www.shopify.com/blog/what-is-shopify",
  "lastmod": "2024-11-15",
  "changefreq": "weekly",
  "priority": 0.8,
  "httpStatus": 200,
  "isAccessible": true,
  "finalUrl": "https://www.shopify.com/blog/what-is-shopify",
  "isRedirected": false,
  "responseTimeMs": 312,
  "isValidUrl": true,
  "issue": "",
  "checkedAt": "2025-03-01T10:22:05.412Z"
}

Example: Broken Link Detected

{
  "sitemapUrl": "https://www.shopify.com/sitemap.xml",
  "url": "https://www.shopify.com/blog/old-post-removed",
  "lastmod": "2022-06-01",
  "changefreq": "monthly",
  "priority": 0.5,
  "httpStatus": 404,
  "isAccessible": false,
  "finalUrl": "https://www.shopify.com/blog/old-post-removed",
  "isRedirected": false,
  "responseTimeMs": 198,
  "isValidUrl": true,
  "issue": "Broken link \u2014 page returned 404 Not Found",
  "checkedAt": "2025-03-01T10:22:11.093Z"
}

Example: Redirect Detected

{
  "sitemapUrl": "https://www.bbc.com/sitemap.xml",
  "url": "http://www.bbc.com/news/technology",
  "lastmod": "2024-12-01",
  "changefreq": "hourly",
  "priority": 0.9,
  "httpStatus": 301,
  "isAccessible": false,
  "finalUrl": "https://www.bbc.com/news/technology",
  "isRedirected": true,
  "responseTimeMs": 145,
  "isValidUrl": true,
  "issue": "Redirect \u2014 HTTP 301 to a different URL",
  "checkedAt": "2025-03-01T10:22:08.774Z"
}

| Field | Type | Description |
| --- | --- | --- |
| sitemapUrl | string | Source sitemap the URL was found in |
| url | string | Page URL as declared in the sitemap |
| lastmod | string | Last modified date from the sitemap |
| changefreq | string | Crawl frequency hint (daily, weekly, monthly, etc.) |
| priority | number | Sitemap priority value between 0.0 and 1.0 |
| httpStatus | integer | HTTP status code returned (200, 301, 404, 500, etc.) |
| isAccessible | boolean | true if the URL returned a 2xx response |
| finalUrl | string | Destination URL after following redirects |
| isRedirected | boolean | true if the request was redirected |
| responseTimeMs | integer | Server response time in milliseconds |
| isValidUrl | boolean | true if the URL is a well-formed absolute URL |
| issue | string | Human-readable description of any detected problem |
| checkedAt | string | ISO 8601 timestamp of when the URL was checked |
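
Because every row carries the same fields, an exported dataset can be reduced to a short health summary. This sketch is one possible way to do that and is not part of the actor. `summarize` is a hypothetical helper, the bucket names are illustrative, and the 1000 ms threshold for "slow" is an arbitrary assumption.

```python
def summarize(records, slow_ms=1000):
    """Count dataset rows per outcome using only the documented fields."""
    summary = {"total": 0, "ok": 0, "redirected": 0, "broken": 0,
               "server_error": 0, "other": 0, "slow": 0}
    for rec in records:
        summary["total"] += 1
        status = rec.get("httpStatus")
        if rec.get("isRedirected"):
            summary["redirected"] += 1
        elif rec.get("isAccessible"):
            summary["ok"] += 1
        elif status == 404:
            summary["broken"] += 1
        elif isinstance(status, int) and 500 <= status <= 599:
            summary["server_error"] += 1
        else:
            # Timeouts, malformed URLs, unchecked rows, other 4xx codes
            summary["other"] += 1
        if (rec.get("responseTimeMs") or 0) > slow_ms:
            summary["slow"] += 1
    return summary


# Abbreviated versions of the three example records shown above.
rows = [
    {"httpStatus": 200, "isAccessible": True, "isRedirected": False, "responseTimeMs": 312},
    {"httpStatus": 404, "isAccessible": False, "isRedirected": False, "responseTimeMs": 198},
    {"httpStatus": 301, "isAccessible": False, "isRedirected": True, "responseTimeMs": 145},
]
print(summarize(rows, slow_ms=300))
```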

How It Works

  1. Fetch the sitemap: The actor downloads the XML sitemap from the provided URL, handling both <urlset> (standard sitemap) and <sitemapindex> (index file with child sitemaps) formats
  2. Parse all URLs: Every <loc> entry is extracted along with optional metadata: <lastmod>, <changefreq>, and <priority>
  3. Recursively expand sitemap indexes: If the root sitemap is an index file, child sitemaps are fetched and parsed up to 3 levels deep, as seen on large sites like BBC and Shopify
  4. Validate URLs: Each URL is checked for correct format (absolute http/https URL)
  5. Check HTTP status: When checkUrls is enabled, the actor sends a HEAD request (falling back to GET for servers that reject HEAD) to each URL and records the status code, final URL, and response time
  6. Report issues: Broken links (404), server errors (5xx), timeouts, redirects, and malformed URLs are flagged with a plain-English issue description
  7. Push results: Each URL is stored as a separate dataset row for easy filtering, sorting, and export
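
Steps 1, 2, and 4 can be illustrated with the Python standard library. This is a sketch of the general technique, not the actor's source code. `parse_sitemap` and `is_valid_url` are hypothetical names, and the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def is_valid_url(url):
    """Step 4: accept only absolute http/https URLs."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_sitemap(xml_text):
    """Steps 1-2: return ("sitemapindex", child URLs) or ("urlset", entries)."""
    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
    if kind == "sitemapindex":
        locs = root.findall("sm:sitemap/sm:loc", NS)
        return kind, [loc.text.strip() for loc in locs if loc.text]
    if kind != "urlset":
        raise ValueError(f"Unexpected root element: {kind}")
    entries = []
    for node in root.findall("sm:url", NS):
        loc = (node.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        priority = node.findtext("sm:priority", namespaces=NS)
        entries.append({
            "url": loc,
            "lastmod": node.findtext("sm:lastmod", namespaces=NS),
            "changefreq": node.findtext("sm:changefreq", namespaces=NS),
            "priority": float(priority) if priority else None,
            "isValidUrl": is_valid_url(loc),
        })
    return kind, entries
```

Optional elements that are absent come back as `None`, which mirrors the fact that `<lastmod>`, `<changefreq>`, and `<priority>` are optional in the protocol.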

Integrations

Connect XML Sitemap Validator with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.

For example, run the actor on a weekly schedule, pipe broken links directly into a Google Sheet, and send a Slack notification whenever new 404 errors are found. The result is fully automated sitemap monitoring without writing a single line of glue code.

FAQ

Can this actor validate sitemap index files (nested sitemaps)?

Yes. If the provided sitemap URL points to a <sitemapindex> document, the actor automatically fetches and validates all child sitemaps listed inside it, up to 3 levels deep. This covers large sites like BBC, Shopify, and TechCrunch that split their sitemaps across many files.
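
A depth-limited expansion of nested indexes can be sketched as follows. This illustrates the idea only and is not the actor's code. `collect_urls` and the injected `fetch_xml` callable are hypothetical, and counting the root sitemap as depth 0 with children followed down to depth 3 is an assumption about what "3 levels deep" means.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_DEPTH = 3  # assumption: root is depth 0, children are followed to depth 3


def collect_urls(sitemap_url, fetch_xml, depth=0, seen=None):
    """Return page URLs reachable from sitemap_url, expanding nested indexes.

    fetch_xml is any callable that maps a sitemap URL to its XML text, so the
    network layer can be swapped out or faked.
    """
    seen = set() if seen is None else seen
    if depth > MAX_DEPTH or sitemap_url in seen:
        return []  # too deep, or already visited (guards against cycles)
    seen.add(sitemap_url)
    root = ET.fromstring(fetch_xml(sitemap_url))
    if root.tag.endswith("sitemapindex"):
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            if loc.text:
                urls.extend(collect_urls(loc.text.strip(), fetch_xml, depth + 1, seen))
        return urls
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
```

Passing the fetcher in as a parameter keeps the recursion independent of any HTTP library, and the `seen` set prevents an index that lists itself from looping forever.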

What is the difference between isAccessible and httpStatus?

isAccessible is a boolean convenience field: it is true only when httpStatus is in the 200–299 range. httpStatus gives you the exact HTTP code so you can distinguish between a 301 permanent redirect and a 302 temporary redirect, or a 404 Not Found and a 410 Gone.
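
The relationship between the two fields can be written down in a few lines. This is a sketch with hypothetical function names and illustrative labels. The status code meanings follow the HTTP specification, while the label strings are assumptions and not the actor's `issue` text.

```python
def is_accessible(http_status):
    """Mirror of the documented rule: true only for 2xx responses."""
    return 200 <= http_status <= 299


def describe_status(http_status):
    """Finer-grained label that the boolean alone cannot express."""
    if is_accessible(http_status):
        return "ok"
    if http_status in (301, 308):
        return "permanent redirect"
    if http_status in (302, 303, 307):
        return "temporary redirect"
    if http_status == 404:
        return "not found"
    if http_status == 410:
        return "gone"
    if 500 <= http_status <= 599:
        return "server error"
    return "other"


for code in (200, 301, 302, 404, 410, 503):
    print(code, is_accessible(code), describe_status(code))
```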

How many URLs can the actor check in a single run?

The maxUrls input caps the number of URLs processed per run (default 100, maximum 10,000). For very large sitemaps, increase maxUrls and consider raising the timeoutSecs to give the actor enough time to complete.
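
A rough way to size timeoutSecs is to estimate the worst case from the other inputs. The sketch below assumes URLs are processed in waves of `concurrency` requests and that every request in a wave takes the full per-URL timeout. That is a simplifying assumption, not a description of the actor's scheduler, and it ignores sitemap download time and any HEAD-to-GET retries. `worst_case_seconds` is a hypothetical helper.

```python
import math


def worst_case_seconds(url_count, concurrency=10, request_timeout_secs=30):
    """Upper-bound estimate: number of waves times the per-URL timeout."""
    waves = math.ceil(url_count / concurrency)
    return waves * request_timeout_secs


# Documented defaults: 100 URLs, concurrency 10, 30 s per-URL timeout
print(worst_case_seconds(100))          # 10 waves * 30 s = 300
# Second example input: 500 URLs at concurrency 10
print(worst_case_seconds(500, 10, 30))  # 50 waves * 30 s = 1500
```

Under these assumptions the defaults come out at 300 seconds, the same as the default timeoutSecs, while 500 URLs at the same concurrency could need up to 1500 seconds if every request timed out. Real runs are normally far quicker because most requests return well before the timeout.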

Why does the actor use HEAD requests instead of GET requests?

HEAD requests are faster and cheaper: they retrieve HTTP headers (including status code and redirect location) without downloading the full page body. The actor automatically falls back to GET if a server returns 405 Method Not Allowed for HEAD, which some servers do.
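
The HEAD-then-GET pattern can be sketched with the standard library's `urllib`. This is an illustration of the technique under stated assumptions, not the actor's implementation. `fetch_status` and `check_url` are hypothetical names, the returned keys simply reuse the documented field names, and `urlopen` follows redirects on its own, so the status reported here is that of the final response.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, method, timeout):
    """Return (status, final_url); status is None on network errors or timeouts."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return err.code, err.geturl()
    except (urllib.error.URLError, TimeoutError):
        return None, url


def check_url(url, timeout=30.0, fetch=fetch_status):
    """Try HEAD first and retry with GET only when the server answers 405."""
    started = time.monotonic()
    status, final_url = fetch(url, "HEAD", timeout)
    if status == 405:
        status, final_url = fetch(url, "GET", timeout)
    elapsed_ms = round((time.monotonic() - started) * 1000)
    return {
        "url": url,
        "httpStatus": status,
        "isAccessible": status is not None and 200 <= status <= 299,
        "finalUrl": final_url,
        "isRedirected": final_url != url,
        "responseTimeMs": elapsed_ms,
    }
```

The `fetch` parameter exists so the fallback logic can be exercised with a fake transport instead of live requests.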

Can I use this for sitemap validation before a website migration?

Absolutely. Run the actor against your current sitemap before migration, export the results to CSV or Google Sheets, then run it again after migration and compare to ensure all URLs still return 200 OK and no new broken links were introduced.
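
The before-and-after comparison can be automated once both result sets are loaded as lists of records. This is a minimal sketch that relies only on the documented `url`, `isAccessible`, and `httpStatus` fields. `compare_runs` is a hypothetical helper, and matching rows by exact URL string is an assumption that breaks down if the migration changes the URLs themselves.

```python
def compare_runs(before, after):
    """Report URLs that were accessible before but are broken or absent after."""
    after_by_url = {rec["url"]: rec for rec in after}
    regressions, missing = [], []
    for rec in before:
        if not rec.get("isAccessible"):
            continue  # already broken before the migration
        new = after_by_url.get(rec["url"])
        if new is None:
            missing.append(rec["url"])
        elif not new.get("isAccessible"):
            regressions.append((rec["url"], new.get("httpStatus")))
    return {"regressions": regressions, "missing": missing}


before = [
    {"url": "https://example.com/a", "isAccessible": True, "httpStatus": 200},
    {"url": "https://example.com/b", "isAccessible": True, "httpStatus": 200},
    {"url": "https://example.com/c", "isAccessible": True, "httpStatus": 200},
]
after = [
    {"url": "https://example.com/a", "isAccessible": True, "httpStatus": 200},
    {"url": "https://example.com/b", "isAccessible": False, "httpStatus": 404},
]
print(compare_runs(before, after))
```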
