Pricing
$29.00/month + usage
WordPress Content Extractor
Easily scrape and export posts, pages, metadata, images, and comments from any WordPress site. WordPress content to JSON, CSV, or TXT, instantly.
A powerful Apify Actor designed to extract comprehensive content from WordPress websites. This actor automatically discovers and extracts posts, pages, metadata, media, and other WordPress-specific content using intelligent parsing and WordPress REST API integration.
Features
Comprehensive Content Extraction
- Blog Posts - Extract all blog posts with full content, titles, and metadata
- Static Pages - Extract WordPress pages and custom post types
- Media Assets - Extract images, videos, and other media with alt text
- SEO Metadata - Extract meta descriptions, Open Graph tags, and Twitter cards
- Comments - Optional extraction of user comments and discussions
- Taxonomies - Extract categories, tags, and custom taxonomies
- Author Information - Extract post/page author details
- Publication Dates - Extract publication and modification timestamps
Smart Discovery
- Automatic URL Discovery - Finds posts and pages through navigation menus
- WordPress REST API Integration - Leverages /wp-json/wp/v2/ endpoints when available
- Pagination Support - Automatically follows pagination links
- Category & Tag Pages - Discovers content through WordPress taxonomies
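The REST API part of discovery can be sketched roughly as below. This is not the actor's own code. It assumes the standard WordPress REST routes (`/wp-json/wp/v2/posts`, `/wp-json/wp/v2/pages`), the `per_page` and `page` query parameters, and the `X-WP-TotalPages` response header. The names `buildRestUrl` and `fetchAllItems` are hypothetical, and the global `fetch` requires Node.js 18 or newer.

```javascript
// Hypothetical helper: build a paginated WordPress REST API URL.
// Assumes the site exposes the standard /wp-json/wp/v2/ routes.
function buildRestUrl(siteUrl, type, page, perPage = 100) {
  const base = siteUrl.replace(/\/+$/, ''); // drop trailing slashes
  const url = new URL(`${base}/wp-json/wp/v2/${type}`);
  url.searchParams.set('per_page', String(perPage)); // WordPress caps per_page at 100
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Hypothetical helper: walk every page of a collection ("posts" or "pages").
// maxItems = 0 means no limit, mirroring the actor's maxPages convention.
async function fetchAllItems(siteUrl, type, maxItems = 0) {
  const items = [];
  let page = 1;
  let totalPages = 1;
  while (page <= totalPages) {
    const response = await fetch(buildRestUrl(siteUrl, type, page));
    if (!response.ok) break; // REST API disabled or blocked: stop and rely on HTML discovery
    totalPages = Number(response.headers.get('X-WP-TotalPages')) || 1;
    items.push(...(await response.json()));
    if (maxItems > 0 && items.length >= maxItems) return items.slice(0, maxItems);
    page += 1;
  }
  return items;
}
```

When the REST API is disabled, a crawler would have to fall back to the menu, pagination, and taxonomy links listed above.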
Advanced Configuration
- Selective Extraction - Choose what content types to extract
- Page Limits - Set maximum number of pages to process
- SSL Support - Handles sites with certificate issues
- Custom Headers - Uses realistic browser headers for better compatibility
Extracted Data Structure
Each extracted page/post includes:
    {
      "url": "https://example.com/post-title",
      "title": "Post Title",
      "content": "Full HTML content or text",
      "excerpt": "Post excerpt/summary",
      "metadata": {
        "description": "Meta description",
        "keywords": "Meta keywords",
        "ogTitle": "Open Graph title",
        "ogDescription": "Open Graph description",
        "ogImage": "Open Graph image URL",
        "canonical": "Canonical URL"
      },
      "media": [
        { "src": "image-url.jpg", "alt": "Image alt text", "type": "image" }
      ],
      "comments": [
        { "author": "Commenter Name", "content": "Comment text", "date": "Comment date" }
      ],
      "publishedDate": "2024-01-01T00:00:00Z",
      "author": "Post Author",
      "categories": ["Category 1", "Category 2"],
      "tags": ["tag1", "tag2"],
      "type": "post"
    }
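As an illustration of working with this structure, the sketch below reduces one record to a flat row that suits CSV export. The helper name `flattenRecord` is hypothetical. It also assumes that `metadata`, `media`, `comments`, `categories`, and `tags` may be absent when the matching extraction option is switched off, which the source does not state.

```javascript
// Hypothetical helper: reduce one extracted record to a flat, CSV-friendly row.
// Missing optional fields fall back to empty values (an assumption, see above).
function flattenRecord(record) {
  const metadata = record.metadata ?? {};
  return {
    url: record.url,
    title: record.title,
    type: record.type,
    author: record.author ?? '',
    publishedDate: record.publishedDate ?? '',
    description: metadata.description ?? '',
    categories: (record.categories ?? []).join('; '),
    tags: (record.tags ?? []).join('; '),
    mediaCount: (record.media ?? []).length,
    commentCount: (record.comments ?? []).length,
  };
}
```

Nested arrays are joined or counted rather than expanded, so each post or page stays on a single row.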
Input Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | String | Required | WordPress website URL to extract from |
| extractPosts | Boolean | true | Whether to extract blog posts |
| extractPages | Boolean | true | Whether to extract static pages |
| extractMedia | Boolean | true | Whether to extract media URLs |
| extractMetadata | Boolean | true | Whether to extract SEO metadata |
| maxPages | Integer | 0 | Maximum pages to extract (0 = no limit) |
| includeComments | Boolean | false | Whether to extract comments |
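If you build the input programmatically, a small validation step can catch mistakes before a run starts. The sketch below applies the defaults from the table above. The helper `normalizeInput` and its error messages are hypothetical and are not part of the actor.

```javascript
// Defaults copied from the parameter table above.
const INPUT_DEFAULTS = {
  extractPosts: true,
  extractPages: true,
  extractMedia: true,
  extractMetadata: true,
  maxPages: 0, // 0 = no limit
  includeComments: false,
};

// Hypothetical helper: validate raw input and fill in defaults.
function normalizeInput(raw) {
  if (typeof raw?.url !== 'string' || raw.url.trim() === '') {
    throw new Error('"url" is required');
  }
  const url = new URL(raw.url.trim()); // throws on a malformed URL
  if (!['http:', 'https:'].includes(url.protocol)) {
    throw new Error('"url" must use http or https');
  }
  const input = { ...INPUT_DEFAULTS, ...raw, url: url.toString() };
  if (!Number.isInteger(input.maxPages) || input.maxPages < 0) {
    throw new Error('"maxPages" must be a non-negative integer');
  }
  return input;
}
```

Calling `normalizeInput({ url: 'https://your-wordpress-site.com', maxPages: 50 })` returns a complete input object with the remaining options set to their defaults.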
Technical Details
Built With
- Apify SDK - Core actor framework
- Axios - HTTP client with SSL support
- Cheerio - Fast HTML parsing and manipulation
- Node.js - Runtime environment
WordPress Compatibility
- All WordPress versions - Works with any WordPress site
- Custom themes - Adapts to different theme structures
- Gutenberg blocks - Supports modern WordPress block editor
- Custom post types - Extracts custom content types
- Multisite networks - Works with WordPress multisite installations
Performance Features
- Concurrent processing - Efficient parallel content extraction
- Respectful crawling - Built-in delays to avoid overwhelming servers
- Error handling - Robust error recovery and logging
- Memory efficient - Optimized for large-scale extraction
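The source does not describe how the actor combines parallelism, delays, and error recovery. One common pattern is a small worker pool that pauses after each request and records failures instead of aborting. The sketch below shows that pattern only. The name `mapWithConcurrency` and the default values for `concurrency` and `delayMs` are assumptions.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: process items with at most `concurrency` workers,
// pausing `delayMs` after each item so the target server is not flooded.
async function mapWithConcurrency(items, worker, { concurrency = 3, delayMs = 500 } = {}) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function runWorker() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        // Record the failure and keep going instead of aborting the whole run.
        results[index] = { ok: false, error: String(error) };
      }
      await sleep(delayMs);
    }
  }

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;
}
```

Results keep the order of the input items, so a failed URL can be matched back to its position and retried later.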
Getting Started
Quick Start
- Deploy the Actor - Build and deploy on the Apify Platform
- Configure Input - Set your WordPress website URL
- Run Extraction - Start the actor and monitor progress
- Download Results - Get extracted data in JSON, CSV, or other formats
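For the download step, results can also be fetched over HTTP. The sketch below builds a download URL, assuming the Apify API's dataset items endpoint and its `format` query parameter. The helper `buildDatasetExportUrl` is hypothetical, and the dataset ID is a placeholder that you would take from your own run. Private datasets also need an API token, which is omitted here.

```javascript
// Formats assumed to be accepted by the dataset items endpoint.
const EXPORT_FORMATS = ['json', 'jsonl', 'csv', 'xlsx', 'xml', 'html', 'rss'];

// Hypothetical helper: build a download URL for a run's dataset.
function buildDatasetExportUrl(datasetId, format = 'json') {
  if (!EXPORT_FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  const url = new URL(
    `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`,
  );
  url.searchParams.set('format', format);
  return url.toString();
}
```

For example, `buildDatasetExportUrl('YOUR_DATASET_ID', 'csv')` yields a link that can be opened in a browser or passed to any HTTP client.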
Example Usage
    // Input configuration
    {
      "url": "https://your-wordpress-site.com",
      "extractPosts": true,
      "extractPages": true,
      "extractMedia": true,
      "extractMetadata": true,
      "maxPages": 50,
      "includeComments": false
    }
Use Cases
Content Migration
- Site Migration - Extract content for moving to new platforms
- Backup Creation - Create comprehensive content backups
- Platform Migration - Move from WordPress to other CMS platforms
Content Analysis
- SEO Audit - Analyze meta tags and content structure
- Content Inventory - Catalog all posts, pages, and media
- Performance Analysis - Analyze content patterns and structure
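As a rough illustration of the SEO audit use case, the sketch below flags common gaps in a single extracted record. It relies only on the fields shown in the data structure section. The helper `auditRecord` and the choice of checks are hypothetical.

```javascript
// Hypothetical helper: list basic SEO gaps for one extracted record.
function auditRecord(record) {
  const issues = [];
  const metadata = record.metadata ?? {};
  if (!metadata.description) issues.push('missing meta description');
  if (!metadata.ogImage) issues.push('missing Open Graph image');
  if (!metadata.canonical) issues.push('missing canonical URL');

  // Count images that have no alt text.
  const missingAlt = (record.media ?? [])
    .filter((item) => item.type === 'image' && !item.alt)
    .length;
  if (missingAlt > 0) issues.push(`${missingAlt} image(s) without alt text`);

  return { url: record.url, issues };
}
```

Mapping this function over the full dataset and keeping records with a non-empty `issues` array gives a simple audit report.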
Data Integration
- API Development - Create APIs from WordPress content
- Analytics Integration - Feed content data to analytics platforms
- Content Syndication - Distribute content to multiple platforms
