URL: https://apify.com/easyapi/article-content-extractor

Article Content Extractor 📄 · Apify


Pricing

from $2.99 / 1,000 results

Extract clean article content, metadata and structured information from any web page. Supports multiple URLs and returns well-formatted JSON with title, description, content, author, publish date and more. 🔍📄

Rating: 0.0 (0)

Developer: EasyApi (maintained by Community)

Actor stats

  • Bookmarked: 3
  • Total users: 129
  • Monthly active users: 10
  • Last modified: 2 months ago

Extract clean article content and metadata from any web page automatically. This actor helps you get structured content from news sites, blogs, and other article-based websites.

Features ✨

  • Extract article content and metadata from any URL
  • Support batch processing of multiple URLs
  • Clean and structured JSON output
  • Built-in rate limiting to avoid overloading target sites
  • Robust error handling and validation
  • Fast and efficient processing

Output Data Structure 📊

The actor extracts the following information from each article:

  • Title
  • Description
  • Main content (both HTML and plain text)
  • Author
  • Publication date
  • Source domain
  • Featured image URL
  • Related links
  • Tags
  • Scraping timestamp
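
The fields listed above can be modelled as a small typed record. The sketch below is only an illustration: the key names (`publishedDate`, `scrapedAt`, and so on) are taken from the output sample on this page, while `ArticleRecord` and `parse_record` are hypothetical names that are not part of the actor. The sample shows a single `content` field holding HTML, so that is what is assumed here.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class ArticleRecord:
    """One dataset item (hypothetical model; keys follow the output sample)."""
    url: str
    title: str = ""
    description: str = ""
    content: str = ""        # main content; HTML in the output sample
    author: str = ""         # empty string when no author was detected
    publishedDate: str = ""  # empty string when no date was detected
    source: str = ""         # source domain, e.g. "fancode.com"
    image: str = ""          # featured image URL
    links: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    scrapedAt: str = ""      # ISO 8601 timestamp with a trailing "Z"

    def scraped_at(self) -> Optional[datetime]:
        """Parse scrapedAt into a timezone-aware datetime, or None if empty."""
        if not self.scrapedAt:
            return None
        # Replace "Z" so that older Python versions can parse the value.
        return datetime.fromisoformat(self.scrapedAt.replace("Z", "+00:00"))


def parse_record(raw: Dict[str, Any]) -> ArticleRecord:
    """Build a record from a raw item, ignoring keys this model does not know."""
    known = {f.name for f in fields(ArticleRecord)}
    return ArticleRecord(**{k: v for k, v in raw.items() if k in known})
```

Unknown keys are dropped rather than rejected, so the model keeps working if the actor adds fields later; a missing `url` raises a `TypeError`.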

Use Cases 💡

  • Content aggregation and syndication
  • News monitoring and analysis
  • Research and data collection
  • Content migration
  • SEO analysis
  • Digital archiving

Limitations ⚠️

  • Respects robots.txt and implements polite scraping
  • 2-second delay between requests to avoid overwhelming target servers
  • URLs must be valid and accessible
  • Content extraction quality depends on page structure
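
To make the first two points concrete, here is a minimal sketch of what a robots.txt check and a fixed delay between requests can look like using only the Python standard library. It is not the actor's implementation, which is not shown on this page; `allowed_urls`, `fetch_politely`, and the `fetch` callback are hypothetical names, and only the 2-second figure comes from the list above.

```python
import time
from typing import Callable, Iterable, List
from urllib import robotparser

DELAY_SECONDS = 2.0  # the delay stated in the Limitations list


def allowed_urls(urls: Iterable[str], robots_txt: str,
                 user_agent: str = "*") -> List[str]:
    """Keep only the URLs that the given robots.txt text allows."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def fetch_politely(urls: Iterable[str], fetch: Callable[[str], str],
                   delay: float = DELAY_SECONDS,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call fetch(url) for each URL, pausing `delay` seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # no pause before the very first request
        results.append(fetch(url))
    return results
```

With this scheme a batch of N URLs spends at least 2 × (N − 1) seconds waiting, which is worth keeping in mind when sizing batches. Passing `sleep` as a parameter lets the pause be replaced in tests.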

Tips for Best Results 💪

  1. Provide valid, accessible URLs
  2. Use for public content only
  3. Consider the target website's terms of service
  4. Monitor execution logs for any issues

Need help or have questions? Feel free to reach out!

Input Example

A full explanation of an input example in JSON.

{
    "urls": [
        "https://cleartax.in/s/gst-hsn-lookup",
        "https://www.fancode.com/pickleball/schedule"
    ]
}
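
Because URLs must be valid for a run to succeed, it can help to check them before building the input. The sketch below assembles the same `{"urls": [...]}` shape shown above; `build_input` is a hypothetical helper, and the check only covers URL syntax, not whether the page is actually reachable.

```python
import json
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def build_input(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Return an input dict with validated, de-duplicated http(s) URLs."""
    cleaned: List[str] = []
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        # Syntax check only: require an http(s) scheme and a host name.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a valid http(s) URL: {raw!r}")
        if url not in cleaned:  # keep first occurrence, preserve order
            cleaned.append(url)
    return {"urls": cleaned}


if __name__ == "__main__":
    payload = build_input([
        "https://cleartax.in/s/gst-hsn-lookup",
        "https://www.fancode.com/pickleball/schedule",
    ])
    print(json.dumps(payload, indent=4))
```

The resulting dict can be serialized with `json.dumps` and supplied as the actor's input.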

Output sample

The results will be wrapped into a dataset which you can always find in the Storage tab. Below is an excerpt, in JSON, of the data you'd get if you apply the input parameters above. You can choose in which format to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

[
    {
        "url": "https://www.fancode.com/pickleball/schedule",
        "title": "Pickleball Schedule - Check International and Domestic matches on FanCode",
        "description": "ABOUT FANCODEIndia's Premium Live Streaming, Live Scores & Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years....",
        "content": "<div><p><label>ABOUT FANCODE</label><label>India's Premium Live Streaming, Live Scores &amp; Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years. The FanCode app has been downloaded by more than 3+ crore users. It offers interactive live streaming of all major sporting events, premier cricket tournaments, women's cricket, live football, basketball, baseball, wrestling, badminton, and other major sports. It also offer real-time match highlights, match videos, cricket videos, India cricket highlights, highlights of today's match, highlights of yesterday's match, cricket data, statistics, cricket analysis, fantasy insights, cricket updates, breaking news from India cricket and world of sports. It also offers sports merchandise for all major sporting leagues and teams from across the world.</label></p></div>",
        "author": "",
        "publishedDate": "",
        "source": "fancode.com",
        "image": "https://www.fancode.com/skillup-uploads/fc-web/home-page-new-arc/hero-image/v1/hero-image-dweb-v4.png",
        "links": [
            "https://www.fancode.com/pickleball/schedule"
        ],
        "tags": [],
        "scrapedAt": "2025-02-05T07:19:26.119Z"
    },
    ...
]
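
In the sample above, `content` holds HTML. If plain text is needed downstream, one possible approach is to strip the tags with the standard library's `html.parser`, as sketched below. `html_to_text` and the `BLOCK_TAGS` set are assumptions made for illustration, not something the actor provides; the set includes `label` only because the sample wraps its text in `<label>` elements.

```python
from html.parser import HTMLParser
from typing import List

# Tags treated as word boundaries (an assumption; adjust for your pages).
BLOCK_TAGS = {"p", "div", "br", "li", "label",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextCollector(HTMLParser):
    """Collect text nodes, inserting a space around block-level tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the text of an HTML fragment with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.chunks).split())
```

Inserting a space at block boundaries avoids run-together words such as the "ABOUT FANCODEIndia's" visible in the sample `description`.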

Related Actors

You might also like

Web Article Extractor - Clean Reader Mode Text & Metadata

maged120/reader-mode

Extract clean, readable article content from any web page. Strips ads, navigation, and clutter, then returns title, author, full body text, and publish date in structured JSON.

Smart Article Extractor

datapilot/smart-article-extractor

News Article Extractor Actor fetches article URLs and extracts structured content using Requests, , and Newspaper3k. It collects title, author, publish date, text, summary, keywords, images, and word count. Supports proxy use and outputs clean JSON results.

Smart Article Extractor

parseforge/article-extractor

Extract clean article content from any news, blog, or publisher site! Pull full body text, author, publish date, word count, language, reading time, images, and metadata at scale. Ideal for content research, media monitoring, SEO audits, and AI training. Start extracting articles in minutes!

Google News Article Scraper

webscrap18/google-news-article-scraper

Scrape Google News, Extract full content with Title, Article Text, Images and Structured data.

Web Article Content Extractor

vulnv/web-article-content-extractor

Extract clean, readable content from news articles, blog posts, and web pages. Batch process multiple URLs, download images, bypass bot protection with proxy support. Perfect for content curation, research, and data analysis.

AI Blog Dataset Creator

datapilot/ai-blog-dataset-creator

Smart Article Scraper Actor extracts structured article data from URLs using, and Newspaper3k. It collects title, author, publish date, tags, full content, language, and word count. Supports proxy usage, JavaScript-rendered pages, and outputs clean JSON datasets.

Web Page Metadata Extractor - Title, OG Tags, Author & More

maged120/get-metadata

Extract all metadata from any web page in one request: title, meta description, Open Graph tags, Twitter Card data, canonical URL, author, publish date, and more.

Article Extraction API

tugelbay/article-extractor

Extract clean article text and metadata from URLs as Markdown, text, or HTML for RAG, AI agents, monitoring, and research. Guide: https://konabayev.com/tools/article-extractor/?utm_source=apify_info&utm_medium=referral&utm_campaign=article-extractor

Tugelbay Konabayev

41

Fast News Content Scraper

datapilot/fast-news-content-scraper

Fast News Content Scraper Actor collects news articles using Fast News RSS and . It extracts title, URL, publish date, author, description, and full article text. Supports multiple queries, anti-bot delays, and outputs structured JSON with source site and scrape timestamp.