Article Content Extractor 📄

Pricing

from $2.99 / 1,000 results

Extract clean article content, metadata and structured information from any web page. Supports multiple URLs and returns well-formatted JSON with title, description, content, author, publish date and more. 🔍📄

Rating

0.0

(0)

Developer

EasyApi

Maintained by Community

Actor stats

Total users: 129
Last modified: 2 months ago

Features ✨

Extract article content and metadata from any URL
Support batch processing of multiple URLs
Clean and structured JSON output
Built-in rate limiting to avoid overloading target sites
Robust error handling and validation
Fast and efficient processing

Output Data Structure 📊

The actor extracts the following information from each article:

Title
Description
Main content (both HTML and plain text)
Author
Publication date
Source domain
Featured image URL
Related links
Tags
Scraping timestamp
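
As a rough illustration of how these fields could be held in client code, here is a minimal Python sketch. The class name ArticleRecord and its from_dict helper are hypothetical and not part of the actor; the field names are taken from the JSON keys in the output sample on this page, and the defaults are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ArticleRecord:
    """One extracted article; field names mirror the output sample's JSON keys."""

    url: str
    title: str = ""
    description: str = ""
    content: str = ""        # main content as HTML
    author: str = ""         # empty when the page names no author
    publishedDate: str = ""  # empty when no publication date is found
    source: str = ""         # source domain, e.g. "fancode.com"
    image: str = ""          # featured image URL
    links: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    scrapedAt: str = ""      # scraping timestamp as an ISO 8601 string

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "ArticleRecord":
        # Ignore unknown keys so extra fields in a dataset item do not raise.
        known = {k: v for k, v in raw.items() if k in cls.__dataclass_fields__}
        return cls(**known)
```

Keeping the original key names (publishedDate, scrapedAt) avoids a mapping layer between dataset items and the record type.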

Use Cases 💡

Content aggregation and syndication
News monitoring and analysis
Research and data collection
Content migration
SEO analysis
Digital archiving

Limitations ⚠️

Respects robots.txt and implements polite scraping
2-second delay between requests to avoid overwhelming target servers
URLs must be valid and accessible
Content extraction quality depends on page structure
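
The robots.txt check and the fixed delay described above can be sketched in Python with the standard library. This is an illustration of the general technique only, not the actor's actual implementation; the names allowed_by_robots and FixedDelay are hypothetical, and the 2-second default simply mirrors the delay stated in the list.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class FixedDelay:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, delay_seconds=2.0, clock=time.monotonic, sleep=time.sleep):
        self.delay_seconds = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep until the gap has elapsed and return the seconds slept."""
        now = self._clock()
        pause = 0.0
        if self._last is not None:
            pause = max(0.0, self.delay_seconds - (now - self._last))
            if pause:
                self._sleep(pause)
        self._last = now + pause
        return pause
```

Calling wait() before each request keeps requests at least delay_seconds apart; the clock and sleep parameters exist only so the behaviour can be checked without real waiting.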

Tips for Best Results 💪

Provide valid, accessible URLs
Use for public content only
Consider the target website's terms of service
Monitor execution logs for any issues

Need help or have questions? Feel free to reach out!

Input Example

The input is a JSON object whose urls array lists the pages to extract:

{
  "urls": [
    "https://cleartax.in/s/gst-hsn-lookup",
    "https://www.fancode.com/pickleball/schedule"
  ]
}
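
Since URLs must be valid and accessible, it can help to screen a list before submitting it. The sketch below builds an input object of the shape shown above; build_input is a hypothetical helper, and the rules it applies (http or https only, duplicates dropped) are assumptions rather than the actor's documented validation.

```python
import json
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def build_input(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Return {"urls": [...]} with invalid and duplicate URLs dropped."""
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        # Keep only absolute http(s) URLs that name a host.
        if parts.scheme in ("http", "https") and parts.netloc and url not in seen:
            seen.add(url)
            cleaned.append(url)
    if not cleaned:
        raise ValueError("no valid http(s) URLs supplied")
    return {"urls": cleaned}


if __name__ == "__main__":
    payload = build_input([
        "https://cleartax.in/s/gst-hsn-lookup",
        "https://www.fancode.com/pickleball/schedule",
    ])
    print(json.dumps(payload, indent=2))
```

This only checks URL syntax; whether a page is actually reachable is still decided at run time.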

Output sample

The results are stored in a dataset, which you can always find in the Storage tab. Below is an excerpt, in JSON, of the data you would get for the input above. You can choose the format in which to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

[
  {
    "url": "https://www.fancode.com/pickleball/schedule",
    "title": "Pickleball Schedule - Check International and Domestic matches on FanCode",
    "description": "ABOUT FANCODEIndia's Premium Live Streaming, Live Scores &amp; Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years....",
    "content": "<div><p><label>ABOUT FANCODE</label><label>India's Premium Live Streaming, Live Scores &amp; Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years. The FanCode app has been downloaded by more than 3+ crore users. It offers interactive live streaming of all major sporting events, premier cricket tournaments, women's cricket, live football, basketball, baseball, wrestling, badminton, and other major sports. It also offer real-time match highlights, match videos, cricket videos, India cricket highlights, highlights of today's match, highlights of yesterday's match, cricket data, statistics, cricket analysis, fantasy insights, cricket updates, breaking news from India cricket and world of sports. It also offers sports merchandise for all major sporting leagues and teams from across the world.</label></p></div>",
    "author": "",
    "publishedDate": "",
    "source": "fancode.com",
    "image": "https://www.fancode.com/skillup-uploads/fc-web/home-page-new-arc/hero-image/v1/hero-image-dweb-v4.png",
    "links": [
      "https://www.fancode.com/pickleball/schedule"
    ],
    "tags": [],
    "scrapedAt": "2025-02-05T07:19:26.119Z"
  },
  ...
]
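
The sample shows a few things worth handling when consuming the data: HTML entities such as &amp; appear in text fields, content is HTML, and author and publishedDate can be empty strings. A minimal post-processing sketch in Python follows. The helper names are hypothetical, the tag stripping is deliberately crude, and the format of publishedDate is not shown in the sample, so unparseable values are assumed to map to None.

```python
import html
import re
from datetime import datetime
from typing import Any, Dict, Optional

_TAG = re.compile(r"<[^>]+>")


def html_to_text(fragment: str) -> str:
    """Crude conversion: drop tags, decode entities, collapse whitespace."""
    text = html.unescape(_TAG.sub(" ", fragment))
    return " ".join(text.split())


def parse_timestamp(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 string such as scrapedAt; return None if empty or invalid."""
    if not value:
        return None
    try:
        # Older Python versions do not accept a trailing "Z", so rewrite it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a cleaned view of one dataset item."""
    return {
        "url": record["url"],
        "title": html.unescape(record.get("title", "")),
        "text": html_to_text(record.get("content", "")),
        "author": record.get("author") or None,
        "published": parse_timestamp(record.get("publishedDate", "")),
        "scraped": parse_timestamp(record.get("scrapedAt", "")),
    }
```

For anything beyond simple fragments, a real HTML parser would be a safer choice than the regular expression used here.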

Related Actors

📄 URL Metadata Crawler - Extract comprehensive metadata from web pages including meta tags, favicons, and Open Graph tags.
🔍 Google News Scraper - Collect up to 5000 news articles with flexible search options and language support.
📚 arXiv Search Scraper - Extract comprehensive research paper data including titles, authors, and abstracts.
🔬 Nature Search Results Scraper - Extract research article data from Nature.com with detailed metadata.
📚 Medium Posts Search Scraper - Get detailed information about articles, authors, and engagement metrics from Medium.
📚 Substack Posts Scraper - Extract comprehensive post data including title, author, and publication details.
🔍 PubMed Search Scraper - Scrape research papers and academic articles with comprehensive metadata.
📄 WikiHow Article Scraper - Extract article titles, dates, views, and detailed step-by-step content.
🔍 Cointelegraph Search Scraper - Extract comprehensive article data including titles, authors, and publish dates.
📚 Medium User Posts Scraper - Extract detailed post data including engagement metrics and publication details.
🎯 Keyword Discovery Tool - Discover new keyword ideas and uncover valuable search insights.
🔍 Keyword Density Checker - Analyze webpage content to calculate keyword density and frequency.
🔍 AI-powered Search - Transform search queries into structured, AI-powered summaries with references.
📝 Text Summarization - Automatically generate concise summaries of documents while preserving original content.
🌐 Website Content to Markdown for LLM Training - Transform web content into clean, LLM-ready Markdown format.

URL: https://apify.com/easyapi/article-content-extractor