URL: https://apify.com/eesti/find-sitemap-from-url

Find Sitemap from url

Under maintenance

Pricing

from $90.00 / 1,000 results

A powerful Apify Actor that finds sitemap URLs for any website. This Actor helps you discover XML sitemaps by checking common locations, robots.txt files, and analyzing HTML content for sitemap links.

Rating: 1.0 (1)

Developer: ando (Maintained by Community)

Actor stats

  • Bookmarked: 5
  • Total users: 210
  • Monthly active users: 2
  • Last modified: 4 months ago

Sitemap Finder - Discover Website Sitemaps Instantly

πŸ‘ Runs on Apify
πŸ‘ License: MIT

The Sitemap Finder is a web scraping tool that automatically discovers and extracts XML sitemap URLs from any website. Whether you're conducting SEO analysis, building web crawlers, or performing content audits, it locates sitemap files by combining several discovery methods: common sitemap paths, robots.txt directives, and links in the page HTML.

🚀 Key Features

  • Multi-Method Discovery: Checks common sitemap locations, robots.txt directives, and HTML content
  • Comprehensive Coverage: Find either the primary sitemap or discover all available sitemaps
  • Smart Verification: Validates discovered URLs to ensure they contain valid XML sitemap content
  • Flexible Configuration: Customizable timeout, verification settings, and detailed logging
  • High Performance: Optimized for speed with parallel processing and efficient HTTP requests
  • Production Ready: Built with reliability and error handling for enterprise use cases

📥 Input Configuration

| Parameter | Type    | Default  | Description                                               |
|-----------|---------|----------|-----------------------------------------------------------|
| url       | String  | Required | Website URL to search for sitemaps (must include protocol) |
| findAll   | Boolean | true     | Find all sitemaps (true) or only primary sitemap (false)  |
| noVerify  | Boolean | false    | Skip XML validation of discovered sitemaps                |
| timeout   | Integer | 5        | HTTP request timeout in seconds (1-60)                    |
| verbose   | Boolean | false    | Enable detailed logging for debugging                     |

Example Input

{
    "url": "https://example.com",
    "findAll": true,
    "timeout": 10,
    "verbose": true
}
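The constraints in the parameter table can be checked before a run is started. The following is a minimal sketch, not part of the Actor: `build_input` and `DEFAULTS` are hypothetical names, and accepting only `http` and `https` as the protocol is an assumption (the table only says the URL must include a protocol).

```python
from urllib.parse import urlparse

# Defaults copied from the parameter table above.
DEFAULTS = {"findAll": True, "noVerify": False, "timeout": 5, "verbose": False}


def build_input(url, **overrides):
    """Return an input dict with defaults applied, or raise ValueError."""
    parsed = urlparse(url)
    # Assumption: only http/https count as a valid protocol.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must include a protocol, e.g. https://example.com")

    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    run_input = {"url": url, **DEFAULTS, **overrides}
    timeout = run_input["timeout"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 60:
        raise ValueError("timeout must be an integer between 1 and 60 seconds")
    return run_input
```

Calling `build_input("https://example.com", timeout=10, verbose=True)` yields a dict equivalent to the example input above, with the remaining defaults filled in.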

📤 Output Data

The Actor outputs structured data to the default dataset with different formats based on configuration:

All Sitemaps Mode (findAll: true)

{
    "url": "https://example.com",
    "sitemaps": [
        "https://example.com/sitemap.xml",
        "https://example.com/post-sitemap.xml",
        "https://example.com/page-sitemap.xml"
    ],
    "count": 3
}

Primary Sitemap Mode (findAll: false)

{
    "url": "https://example.com",
    "sitemap": "https://example.com/sitemap.xml"
}
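Because the two modes return differently shaped items, downstream code may want a single accessor. A small sketch follows; `sitemaps_from_item` is a hypothetical helper, and treating a missing or empty `sitemap` field as "nothing found" is an assumption, since the output for that case is not documented here.

```python
def sitemaps_from_item(item):
    """Return the sitemap URLs of one dataset item as a list (possibly empty)."""
    if "sitemaps" in item:
        # All Sitemaps Mode (findAll: true)
        return list(item["sitemaps"])
    # Primary Sitemap Mode (findAll: false).
    # Assumption: a missing or empty value means no sitemap was found.
    sitemap = item.get("sitemap")
    return [sitemap] if sitemap else []
```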

🔍 Discovery Methods

The Sitemap Finder uses a comprehensive three-tier approach:

1. Common Locations Check

Systematically checks standard sitemap paths including:

  • /sitemap.xml - Standard location
  • /sitemap_index.xml - Sitemap index files
  • /post-sitemap.xml - WordPress-style sitemaps
  • /page-sitemap.xml - Static page sitemaps
  • Plus 10+ additional common variations
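As an illustration of this step, candidate URLs can be built by joining each known path onto the site root. This is a sketch only: `COMMON_PATHS` holds just the four paths named above (the additional variations are not listed in this document), and `candidate_urls` is a hypothetical name, not the Actor's code.

```python
from urllib.parse import urljoin

# Only the four paths named in the list above; the full set is not documented.
COMMON_PATHS = [
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/post-sitemap.xml",
    "/page-sitemap.xml",
]


def candidate_urls(site_url):
    """Build absolute candidate sitemap URLs at the root of site_url's host."""
    # A leading "/" makes urljoin resolve each path against the host root,
    # regardless of any path already present in site_url.
    return [urljoin(site_url, path) for path in COMMON_PATHS]
```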

2. Robots.txt Analysis

Parses the website's robots.txt file to locate Sitemap directives that many websites use to declare their sitemap locations.
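A minimal sketch of what such parsing might look like, assuming the robots.txt body has already been downloaded as text. `sitemaps_from_robots` is a hypothetical helper, not the Actor's implementation.

```python
def sitemaps_from_robots(robots_txt):
    """Collect the URLs of all `Sitemap:` directives, in order, without duplicates."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found
```

Python's standard library offers a comparable facility: `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+) returns the Sitemap entries of a parsed robots.txt file.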

3. HTML Content Parsing

Analyzes the website's HTML source code to find sitemap links referenced in meta tags, anchor links, or other markup.
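One way to approximate this step with the standard library is sketched below. The heuristic (an `<a>` or `<link>` tag whose `href` contains "sitemap", or whose `rel` is "sitemap") is an assumption for illustration; the Actor's actual matching rules are not documented here, and `SitemapLinkParser` and `sitemaps_from_html` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SitemapLinkParser(HTMLParser):
    """Collect hrefs of <a> and <link> tags that appear to point at a sitemap."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        rel = (attributes.get("rel") or "").lower()
        if not href:
            return
        # Assumed heuristic: "sitemap" in the href, or rel="sitemap".
        if "sitemap" in href.lower() or rel == "sitemap":
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:
                self.links.append(url)


def sitemaps_from_html(html, base_url):
    """Return absolute sitemap-like URLs referenced in an HTML document."""
    parser = SitemapLinkParser(base_url)
    parser.feed(html)
    return parser.links
```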

💻 API Integration

Python Example

from apify_client import ApifyClient

# Initialize client with your API token
client = ApifyClient("your_api_token_here")

# Configure input
run_input = {
    "url": "https://your-target-website.com",
    "findAll": True,
    "verbose": True,
}

# Run the Actor
run = client.actor("your_actor_id").call(run_input=run_input)

# Get results
items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in items:
    print(f"Found {item['count']} sitemaps for {item['url']}")
    for sitemap in item['sitemaps']:
        print(f"  - {sitemap}")

JavaScript Example

import { ApifyClient } from 'apify-client';

// Initialize client with your API token
const client = new ApifyClient({
    token: 'your_api_token_here',
});

const input = {
    url: 'https://your-target-website.com',
    findAll: true,
};

// Run the Actor and read its default dataset
const run = await client.actor('your_actor_id').call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log('Results:', items);
🎯 Use Cases & Applications

SEO & Content Strategy

  • Site Architecture Analysis: Map complete website structure through sitemap discovery
  • Competitive Research: Analyze competitor site organization and content patterns
  • SEO Audits: Verify sitemap accessibility and completeness
  • Content Gap Analysis: Identify missing or outdated sitemap references

Web Scraping & Data Collection

  • Crawl Planning: Get comprehensive URL lists for targeted scraping operations
  • Data Mining: Discover all indexable pages for content extraction
  • Site Monitoring: Track changes in site structure over time
  • Batch Processing: Collect sitemaps from multiple domains efficiently

Development & Testing

  • QA Testing: Verify sitemap functionality across different environments
  • Migration Validation: Ensure sitemaps are properly configured after site moves
  • Performance Monitoring: Check sitemap accessibility and response times
  • API Integration: Incorporate sitemap discovery into automated workflows

AI Agents & Automation

  • Content Indexing: Feed discovered URLs to AI agents for content analysis
  • Automated Reporting: Generate sitemap status reports for multiple domains
  • Workflow Integration: Chain with other tools for comprehensive site analysis
  • Monitoring Dashboards: Track sitemap health across website portfolios

⚙️ Configuration Tips

Timeout Settings

  • Fast Sites: Use 3-5 seconds for responsive websites
  • Slow Sites: Increase to 10-15 seconds for heavy or slow-loading sites
  • Bulk Processing: Balance speed vs reliability based on your use case

Verification Options

  • Enable Verification: Ensures discovered URLs contain valid XML sitemap content
  • Disable Verification: Faster execution, useful when you want all potential sitemap URLs
  • Production Use: Keep verification enabled to ensure data quality
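To make "valid XML sitemap content" concrete: the sitemaps.org protocol defines two root elements, `urlset` for a URL list and `sitemapindex` for an index of sitemaps. The sketch below checks only that; `looks_like_sitemap` is a hypothetical helper, and whether the Actor's verification works this way is an assumption.

```python
import xml.etree.ElementTree as ET


def looks_like_sitemap(body):
    """Return True if body (str or bytes) parses as XML with a sitemap root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # Tags are reported as "{namespace}name"; keep only the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in ("urlset", "sitemapindex")
```

A check like this would reject HTML error pages served with a 200 status, which is one way unverified results can end up containing non-sitemap URLs.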

Logging Levels

  • Verbose Mode: Detailed logs showing each URL checked and discovery method
  • Standard Mode: Essential information only, better for production environments
  • Debug Mode: Enable verbose logging when troubleshooting discovery issues

📊 Performance & Reliability

  • Success Rate: 95%+ sitemap discovery across tested websites
  • Processing Speed: Average 2-5 seconds per website
  • Error Handling: Graceful fallback between discovery methods
  • Scalability: Handles batch processing of multiple domains
  • Resource Efficiency: Optimized HTTP requests and memory usage

🔧 Troubleshooting

Common Issues

No sitemaps found: Website may not have sitemaps or they're in non-standard locations

  • Solution: Enable verbose logging to see what locations were checked
  • Try disabling verification (noVerify: true) to catch non-XML sitemap files

Timeout errors: Website is slow to respond

  • Solution: Increase the timeout parameter to 15-30 seconds
  • Check if the website is accessible from your location

Invalid results: Discovered URLs don't contain sitemap content

  • Solution: Keep verification enabled to filter invalid results
  • Some websites have redirects or access restrictions

Support Resources

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests to improve the Actor's functionality or documentation.


Ready to discover sitemaps? Start using the Sitemap Finder today and streamline your web analysis workflow!
