Find Sitemap from url

Under maintenance

Pricing

from $90.00 / 1,000 results

A powerful Apify Actor that finds sitemap URLs for any website. This Actor helps you discover XML sitemaps by checking common locations, reading robots.txt files, and analyzing HTML content for sitemap links.

Rating

1.0

(1)

Developer

ando

Maintained by Community

Actor stats

Total users: 210

Last modified: 4 months ago

Sitemap Finder - Discover Website Sitemaps Instantly

The Sitemap Finder is a powerful web scraping tool that automatically discovers and extracts XML sitemap URLs from any website. Whether you're conducting SEO analysis, building web crawlers, or performing content audits, this tool quickly locates all sitemap files by intelligently checking multiple discovery methods.

🚀 Key Features

Multi-Method Discovery: Checks common sitemap locations, robots.txt directives, and HTML content
Comprehensive Coverage: Find either the primary sitemap or discover all available sitemaps
Smart Verification: Validates discovered URLs to ensure they contain valid XML sitemap content
Flexible Configuration: Customizable timeout, verification settings, and detailed logging
High Performance: Optimized for speed with parallel processing and efficient HTTP requests
Production Ready: Built with reliability and error handling for enterprise use cases

📥 Input Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | String | Required | Website URL to search for sitemaps (must include the protocol) |
| `findAll` | Boolean | `true` | Find all sitemaps (`true`) or only the primary sitemap (`false`) |
| `noVerify` | Boolean | `false` | Skip XML validation of discovered sitemaps |
| `timeout` | Integer | `5` | HTTP request timeout in seconds (1-60) |
| `verbose` | Boolean | `false` | Enable detailed logging for debugging |

Example Input

{
    "url": "https://example.com",
    "findAll": true,
    "timeout": 10,
    "verbose": true
}
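
The constraints in the parameter table can be checked before a run is started. The following is a minimal sketch, not part of the Actor: `build_run_input` is a hypothetical helper, and it assumes that "must include protocol" means an `http` or `https` scheme.

```python
from urllib.parse import urlparse

# Defaults copied from the parameter table above.
DEFAULTS = {"findAll": True, "noVerify": False, "timeout": 5, "verbose": False}


def build_run_input(url, **overrides):
    """Return a run-input dict, rejecting values the table rules out."""
    parsed = urlparse(url)
    # Assumption: only http and https count as a valid protocol.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must include a protocol, e.g. https://example.com")

    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    run_input = {"url": url, **DEFAULTS, **overrides}
    timeout = run_input["timeout"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 60:
        raise ValueError("timeout must be an integer between 1 and 60 seconds")
    return run_input
```

Calling `build_run_input("https://example.com", timeout=10)` returns a dict with all five parameters filled in, which can be passed as the Actor's run input.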

📤 Output Data

The Actor writes structured data to the default dataset. Each item has one of two shapes, depending on the `findAll` setting:

All Sitemaps Mode (`findAll: true`)

{
    "url": "https://example.com",
    "sitemaps": [
        "https://example.com/sitemap.xml",
        "https://example.com/post-sitemap.xml",
        "https://example.com/page-sitemap.xml"
    ],
    "count": 3
}

Primary Sitemap Mode (`findAll: false`)

{
    "url": "https://example.com",
    "sitemap": "https://example.com/sitemap.xml"
}
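
Because the two modes produce differently shaped items, downstream code may want to normalize them. The sketch below uses a hypothetical helper, `sitemaps_from_item`, and assumes dataset items look exactly like the two examples above.

```python
def sitemaps_from_item(item):
    """Return the sitemap URLs in a dataset item as a list, for either mode."""
    if "sitemaps" in item:
        # All Sitemaps Mode: the list is already present.
        return list(item["sitemaps"])
    # Primary Sitemap Mode: wrap the single URL, if there is one.
    sitemap = item.get("sitemap")
    return [sitemap] if sitemap else []
```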

🔍 Discovery Methods

The Sitemap Finder uses a comprehensive three-tier approach:

1. Common Locations Check

Systematically checks standard sitemap paths including:

/sitemap.xml - Standard location
/sitemap_index.xml - Sitemap index files
/post-sitemap.xml - WordPress-style sitemaps
/page-sitemap.xml - Static page sitemaps
Plus 10+ additional common variations
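
As an illustration of this step, candidate URLs can be built by joining each path onto the site's origin. This is a sketch only: `COMMON_PATHS` contains just the four paths listed above (the Actor's full list is not documented here), and `candidate_sitemap_urls` is a hypothetical name.

```python
from urllib.parse import urljoin

# Only the paths named in this README; the Actor checks more variations.
COMMON_PATHS = [
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/post-sitemap.xml",
    "/page-sitemap.xml",
]


def candidate_sitemap_urls(base_url, paths=COMMON_PATHS):
    """Build absolute candidate URLs at the root of the site behind base_url."""
    # A leading slash makes urljoin resolve against the site root,
    # even when base_url points at a deeper page.
    return [urljoin(base_url, path) for path in paths]
```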

2. Robots.txt Analysis

Parses the website's robots.txt file to locate Sitemap directives that many websites use to declare their sitemap locations.
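
A minimal sketch of this kind of parsing is shown below. It is not the Actor's code: `sitemaps_from_robots` is a hypothetical helper that works on robots.txt text that has already been downloaded.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs declared by Sitemap directives in robots.txt text."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # so the colon inside "https://" stays in the value.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found
```

The standard library's `urllib.robotparser.RobotFileParser` offers a `site_maps()` method (Python 3.8+) that covers the same need when fetching the file directly.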

3. HTML Content Parsing

Analyzes the website's HTML source code to find sitemap links referenced in meta tags, anchor links, or other markup.
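
One way to approximate this step is to scan `<a>` and `<link>` tags for an `href` that mentions "sitemap". The sketch below rests on that assumption; the Actor's actual matching rules are not documented here, and `SitemapLinkParser` and `sitemap_links_from_html` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SitemapLinkParser(HTMLParser):
    """Collect hrefs from <a> and <link> tags that mention 'sitemap'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href")
        if href and "sitemap" in href.lower():
            # Resolve relative links against the page URL and skip duplicates.
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def sitemap_links_from_html(html, base_url):
    """Return absolute sitemap-like URLs referenced in an HTML document."""
    parser = SitemapLinkParser(base_url)
    parser.feed(html)
    return parser.links
```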

💻 API Integration

Python Example

from apify_client import ApifyClient

# Initialize client with your API token
client = ApifyClient("your_api_token_here")

# Configure input
run_input = {
    "url": "https://your-target-website.com",
    "findAll": True,
    "verbose": True,
}

# Run the Actor and wait for it to finish
run = client.actor("your_actor_id").call(run_input=run_input)

# Get results from the run's default dataset
items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in items:
    print(f"Found {item['count']} sitemaps for {item['url']}")
    for sitemap in item["sitemaps"]:
        print(f" - {sitemap}")

JavaScript Example

import { ApifyClient } from 'apify-client';

// Initialize client with your API token
const client = new ApifyClient({
    token: 'your_api_token_here',
});

const input = {
    url: 'https://your-target-website.com',
    findAll: true,
};

// Run the Actor and fetch results from its default dataset
const run = await client.actor('your_actor_id').call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log('Results:', items);

🎯 Use Cases & Applications

SEO & Content Strategy

Site Architecture Analysis: Map complete website structure through sitemap discovery
Competitive Research: Analyze competitor site organization and content patterns
SEO Audits: Verify sitemap accessibility and completeness
Content Gap Analysis: Identify missing or outdated sitemap references

Web Scraping & Data Collection

Crawl Planning: Get comprehensive URL lists for targeted scraping operations
Data Mining: Discover all indexable pages for content extraction
Site Monitoring: Track changes in site structure over time
Batch Processing: Collect sitemaps from multiple domains efficiently

Development & Testing

QA Testing: Verify sitemap functionality across different environments
Migration Validation: Ensure sitemaps are properly configured after site moves
Performance Monitoring: Check sitemap accessibility and response times
API Integration: Incorporate sitemap discovery into automated workflows

AI Agents & Automation

Content Indexing: Feed discovered URLs to AI agents for content analysis
Automated Reporting: Generate sitemap status reports for multiple domains
Workflow Integration: Chain with other tools for comprehensive site analysis
Monitoring Dashboards: Track sitemap health across website portfolios

⚙️ Configuration Tips

Timeout Settings

Fast Sites: Use 3-5 seconds for responsive websites
Slow Sites: Increase to 10-15 seconds for heavy or slow-loading sites
Bulk Processing: Balance speed vs reliability based on your use case

Verification Options

Enable Verification (`noVerify: false`, the default): Ensures discovered URLs contain valid XML sitemap content
Disable Verification (`noVerify: true`): Faster execution, useful when you want all potential sitemap URLs
Production Use: Keep verification enabled to ensure data quality
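
To make the idea of verification concrete, the sketch below checks whether a response body parses as XML with a `urlset` or `sitemapindex` root element, the two root elements defined by the sitemaps.org protocol. Whether the Actor applies exactly this test is an assumption, and `looks_like_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def looks_like_sitemap(body):
    """Return True if body is XML whose root is <urlset> or <sitemapindex>."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML (for example an HTML error page or empty body).
        return False
    # Strip a namespace prefix such as "{http://...}urlset".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in ("urlset", "sitemapindex")
```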

Logging Levels

Verbose Mode: Detailed logs showing each URL checked and discovery method
Standard Mode: Essential information only, better for production environments
Debug Mode: Enable verbose logging when troubleshooting discovery issues

📊 Performance & Reliability

Success Rate: 95%+ sitemap discovery across tested websites
Processing Speed: Average 2-5 seconds per website
Error Handling: Graceful fallback between discovery methods
Scalability: Handles batch processing of multiple domains
Resource Efficiency: Optimized HTTP requests and memory usage
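
The fallback behaviour can be pictured as trying each discovery method in turn and moving on when one fails or finds nothing. The following sketch shows that pattern for the primary-sitemap case; `first_sitemap` and the shape of the `methods` argument are hypothetical and do not describe the Actor's internals.

```python
def first_sitemap(url, methods):
    """Try each discovery method in order and return the first sitemap found.

    `methods` is a list of callables that take a URL and return a list of
    sitemap URLs. A method that raises is skipped so the next one can run.
    """
    for method in methods:
        try:
            found = method(url)
        except Exception:
            # Assumption: any failure in one method should not stop the others.
            continue
        if found:
            return found[0]
    return None
```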

🔧 Troubleshooting

Common Issues

No sitemaps found: The website may not have sitemaps, or they are in non-standard locations

Solution: Set `verbose` to `true` to see which locations were checked
Set `noVerify` to `true` to catch non-XML sitemap files

Timeout errors: The website is slow to respond

Solution: Increase the `timeout` parameter to 15-30 seconds
Check that the website is accessible from your location

Invalid results: Discovered URLs don't contain sitemap content

Solution: Keep verification enabled (`noVerify` set to `false`) to filter out invalid results
Some websites have redirects or access restrictions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests to improve the Actor's functionality or documentation.

Ready to discover sitemaps? Start using the Sitemap Finder today and streamline your web analysis workflow!

URL: https://apify.com/eesti/find-sitemap-from-url