URL: https://apify.com/lightkong/universal-blog-scraper

Smart Article & Blog Extractor · Apify


Pricing

from $0.50 / 1,000 results

Smart Article & Blog Extractor

Extract clean text, author, title, and reading time from any news, blog, or article webpage. Perfect for AI/LLM training and RAG systems.

Rating

0.0

(0)

Developer

๐Ÿ‘ Lightkong

Lightkong

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 5
  • Monthly active users: 2
  • Last modified: 2 months ago

🧠 Smart Article & Blog Extractor

The ultimate tool for LLMs, RAG pipelines, and Content Analyzers. Extract clean, ad-free text from any news site, blog, or article in seconds.

Why this Actor?

When you train AI models or build RAG (Retrieval-Augmented Generation) systems, you don't want menus, sidebars, cookie popups, or footer links ruining your dataset. You only want the Title, Author, and the actual Content.

This actor uses Mozilla's powerful Readability algorithm (the same engine that powers Firefox's Reader View) to automatically strip away all the junk and give you a beautifully clean text output.
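To make the idea of "stripping away the junk" concrete, here is a minimal sketch in Python. It is not Mozilla's Readability and not this actor's code: it is a much cruder stand-in that keeps the text of `<p>` elements and ignores anything nested inside tags commonly used for page chrome. The tag list and the names `SKIP_TAGS`, `ParagraphExtractor`, and `extract_text` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents this toy extractor treats as page chrome.
# This list is an assumption; Readability scores candidate nodes instead.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "header", "form"}


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements that are not inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside any SKIP_TAGS element
        self.in_paragraph = False  # True while inside a kept <p>
        self.current = []          # text fragments of the current paragraph
        self.paragraphs = []       # finished, whitespace-normalized paragraphs

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_paragraph = True
            self.current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self.current).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.current.append(data)


def extract_text(html: str) -> str:
    """Return the kept paragraphs joined by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

A tag blocklist like this breaks easily on real sites, which is why a scoring approach such as Readability is the more robust choice for arbitrary article pages.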

Advantages:

  • Universal: Works on Medium, TechCrunch, WordPress blogs, Substack, CNN, NYTimes, and most other article pages that serve their content as server-rendered HTML.
  • Ultra-Fast: Fetches pages with plain HTTP requests (CheerioCrawler) instead of a browser, extracting each article in under a second.
  • Cost-Effective: Because it never launches a heavy browser, its Apify Compute Unit (CU) usage stays very low.

💰 Pricing: Pay-Per-Result

We charge only $0.50 per 1,000 articles extracted.
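As a quick way to budget a run, the per-result rate can be turned into a cost estimate. The sketch below assumes the flat $0.50 per 1,000 articles quoted above and nothing else (no platform fees or plan discounts); the function name `estimate_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Rate quoted on this page: $0.50 per 1,000 extracted articles.
PRICE_PER_1000_USD = Decimal("0.50")


def estimate_cost_usd(article_count: int) -> Decimal:
    """Estimate the pay-per-result charge for a number of extracted articles."""
    if article_count < 0:
        raise ValueError("article_count must be non-negative")
    cost = PRICE_PER_1000_USD * article_count / 1000
    # Keep four decimal places so small runs do not round down to zero.
    return cost.quantize(Decimal("0.0001"))
```

For example, `estimate_cost_usd(25000)` works out to 12.50 USD under this assumption.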

📥 Input Schema

Field               Type    Description
startUrls           Array   A list of article or blog URLs you want to extract.
proxyConfiguration  Object  Standard Apify proxy settings to bypass IP blocks.
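A run input matching the two fields above might be assembled like this. This is a sketch under assumptions: the page does not say whether `startUrls` entries are plain strings or objects, so the `{"url": ...}` shape used here (common among Apify actors) is a guess, as is the `useApifyProxy` flag inside `proxyConfiguration`. The helper name `build_input` is hypothetical.

```python
import json


def build_input(urls, use_apify_proxy=True):
    """Build a run-input dict with the two documented fields.

    Assumptions (not confirmed by this page): each start URL is wrapped
    as {"url": ...}, and the proxy object uses the "useApifyProxy" flag.
    """
    return {
        "startUrls": [{"url": u} for u in urls],
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    run_input = build_input(["https://techcrunch.com/2023/12/20/example-article/"])
    print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the actor's input editor or sent as the run input through an API client; check the actor's own input form for the exact shape it accepts.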

📤 Output Schema

For each URL, the actor will produce a clean JSON object.

{
  "url": "https://techcrunch.com/2023/12/20/example-article/",
  "title": "The Future of Artificial Intelligence",
  "author": "Jane Doe",
  "publishedTime": "2023-12-20T10:00:00Z",
  "siteName": "TechCrunch",
  "textContent": "Artificial intelligence has been evolving rapidly... (clean text continues)",
  "readingTimeMins": 4,
  "scrapedAt": "2026-04-30T17:30:00.000Z"
}
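Downstream code for a RAG or analysis pipeline will usually want these records as typed objects. The sketch below parses one output item into a dataclass and converts the two timestamps to timezone-aware datetimes. The names `ArticleRecord`, `parse_record`, and `estimate_reading_minutes` are hypothetical, and the 200 words-per-minute figure is an assumption: the page does not document how `readingTimeMins` is computed.

```python
import json
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArticleRecord:
    url: str
    title: str
    author: str
    published_time: datetime
    site_name: str
    text_content: str
    reading_time_mins: int
    scraped_at: datetime


def _parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so map it to "+00:00".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_record(raw: str) -> ArticleRecord:
    """Parse one JSON output object using the field names shown above."""
    item = json.loads(raw)
    return ArticleRecord(
        url=item["url"],
        title=item["title"],
        author=item["author"],
        published_time=_parse_timestamp(item["publishedTime"]),
        site_name=item["siteName"],
        text_content=item["textContent"],
        reading_time_mins=int(item["readingTimeMins"]),
        scraped_at=_parse_timestamp(item["scrapedAt"]),
    )


def estimate_reading_minutes(text: str, words_per_minute: int = 200) -> int:
    """Rough reading time; 200 wpm is an assumed rate, not the actor's."""
    words = len(text.split())
    return max(1, math.ceil(words / words_per_minute))
```

Real results may omit fields such as `author` or `publishedTime` when a page does not expose them, so production code would likely switch to `item.get(...)` with defaults.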

Start extracting clean knowledge today!
