🧪 High-Volume Website Content & Media Scraper
Pricing
$4.50 / 1,000 results
🧪 Crawling done right! Let me know what you think and where or how I can improve my actor; I am all for constructive criticism. Please message me if you have any questions. Enjoy and have a good day.
Rating: 5.0 (2)
Actor stats
- Bookmarked: 6
- Total users: 148
- Monthly active users: 6
- Last modified: 18 days ago
ALL Social Media/WebScraper
Extract structured content from public social profile pages, article pages, landing pages, and other JavaScript-heavy websites. This actor focuses on turning a page into a clean record of text blocks, metadata, images, video references, and outgoing links.
What it does
- Opens each public URL in a browser session
- Extracts the page title and basic metadata
- Captures article-like text blocks from the page
- Collects image URLs, embedded video URLs, direct video source URLs, and outbound links
- Optionally filters Facebook links out of the outbound link list
- Stores diagnostic screenshots for failed pages
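The extraction steps above can be illustrated with a minimal sketch. It is not the actor's own code: it parses static HTML with the Python standard library rather than a browser session, so it would miss JavaScript-rendered content. The class `PageExtractor`, the function `extract`, and the hostname rule used to recognise Facebook links are assumptions made for illustration; only the `include_facebook_links` switch mirrors the `includeFacebookLinks` input.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageExtractor(HTMLParser):
    """Collects the title, image URLs, and outbound links of one HTML page."""

    def __init__(self, base_url, include_facebook_links=True):
        super().__init__()
        self.base_url = base_url
        self.include_facebook_links = include_facebook_links
        self.title = ""
        self.images = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attrs.get("src"):
            # Resolve relative image paths against the page URL.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            if self._is_outbound(url) and self._passes_filter(url):
                self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def _is_outbound(self, url):
        # A link counts as outbound when it points at a different host.
        host = urlparse(url).netloc
        return bool(host) and host != urlparse(self.base_url).netloc

    def _passes_filter(self, url):
        # Assumed rule: a Facebook link is facebook.com or any subdomain of it.
        if self.include_facebook_links:
            return True
        host = urlparse(url).netloc.lower()
        return not (host == "facebook.com" or host.endswith(".facebook.com"))


def extract(html, base_url, include_facebook_links=True):
    """Return a small record using a subset of the actor's output field names."""
    parser = PageExtractor(base_url, include_facebook_links)
    parser.feed(html)
    parser.close()
    return {
        "url": base_url,
        "title": parser.title.strip(),
        "images": parser.images,
        "links": parser.links,
    }
```

Calling `extract(html, "https://example.com/blog/", include_facebook_links=False)` would return a record whose `links` list omits any `facebook.com` URLs while keeping other outbound links.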
Good fit
- Public Instagram profile pages
- Blog articles and news pages
- Marketing sites and landing pages
- Content research and competitor monitoring
- Collecting media/link inventories from public pages
Not a good fit
- Logged-in or private content
- Full API-style social scraping for each platform
- Comments, followers, or hidden profile data
- Sites that require persistent authenticated sessions
Input example
{
  "startUrls": [
    {"url": "https://instagram.com/muddlemix_"},
    {"url": "https://example.com/blog/example-article"}
  ],
  "includeFacebookLinks": true,
  "headless": true,
  "maxConcurrency": 3,
  "requestHandlerTimeoutSecs": 90,
  "navigationTimeoutSecs": 90,
  "waitAfterLoadSecs": 0.5,
  "saveErrorScreenshots": true
}
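Before starting a run, it can help to check an input document like the one above locally. The following is a minimal sketch, not the actor's actual input schema: the `ScraperInput` dataclass and `parse_input` function are hypothetical, and the default values are assumptions copied from the example rather than documented defaults.

```python
import json
from dataclasses import dataclass


@dataclass
class ScraperInput:
    """Typed view of the input JSON; defaults are assumed, not documented."""
    start_urls: list
    include_facebook_links: bool = True
    headless: bool = True
    max_concurrency: int = 3
    request_handler_timeout_secs: float = 90
    navigation_timeout_secs: float = 90
    wait_after_load_secs: float = 0.5
    save_error_screenshots: bool = True


def parse_input(raw):
    """Parse and sanity-check an input JSON string."""
    data = json.loads(raw)
    urls = [entry["url"] for entry in data.get("startUrls", [])]
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    max_concurrency = data.get("maxConcurrency", 3)
    if max_concurrency < 1:
        raise ValueError("maxConcurrency must be at least 1")
    return ScraperInput(
        start_urls=urls,
        include_facebook_links=data.get("includeFacebookLinks", True),
        headless=data.get("headless", True),
        max_concurrency=max_concurrency,
        request_handler_timeout_secs=data.get("requestHandlerTimeoutSecs", 90),
        navigation_timeout_secs=data.get("navigationTimeoutSecs", 90),
        wait_after_load_secs=data.get("waitAfterLoadSecs", 0.5),
        save_error_screenshots=data.get("saveErrorScreenshots", True),
    )
```

Failing early on an empty `startUrls` list or a non-positive `maxConcurrency` avoids paying for a run that cannot produce results.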
Output fields
Each dataset item can include:
- url
- title
- meta
- articles
- images
- videos
- links
- scraped
- scrapeTime
- processingTimeMs
- contentType
- error
- diag
- status
Notes
- The default dataset is the main output.
- Failed pages are still pushed into the dataset with status, error, and an optional diagnostic screenshot URL so runs stay debuggable.
- This actor is best positioned as a public-page media and content extractor, not a full per-platform private-data scraper.
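Because failed pages land in the same dataset as successful ones, downstream code usually needs to separate them. A minimal sketch follows, assuming that a failed item carries a non-empty `error` field and a successful one does not; the exact values of `status` are not documented here, so the sketch does not rely on them. The helper names are hypothetical.

```python
def split_results(items):
    """Partition dataset items into (succeeded, failed) by the error field."""
    succeeded, failed = [], []
    for item in items:
        # Assumption: a truthy "error" value marks a failed page.
        (failed if item.get("error") else succeeded).append(item)
    return succeeded, failed


def summarize_failures(failed):
    """Build one short line per failed page for a log or report."""
    return [f"{item.get('url')}: {item.get('error')}" for item in failed]
```

Items exported from the default dataset as JSON can be passed in directly as a list of dictionaries.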
