Article to Text Extractor (for TTS/LLMs)

Pricing

from $1.00 / 1,000 extracted articles

Extract the core readable text of any article or blog post, stripping out boilerplate. Perfect for Text-to-Speech or AI summaries.

Developer

Andok

Maintained by Community

Last modified

3 months ago

Article Text Extractor for TTS & AI

Extract clean, readable article text from any web page, stripped of navigation, ads, and boilerplate. Feed the output directly into text-to-speech engines, summarization models, or LLM pipelines without wasting tokens on HTML noise. Bulk-process hundreds of URLs in a single run, with up to 50 processed in parallel.

Features

Readability engine — uses Mozilla Readability to isolate the main article content from page clutter
Plain text output — returns clean text ready for TTS APIs like ElevenLabs or OpenAI TTS
Bulk processing — extract articles from hundreds of URLs in a single run
Metadata extraction — captures title, author byline, and excerpt alongside the article text
Redirect tracking — follows HTTP redirects and records the final URL
Configurable concurrency — process 1 to 50 URLs in parallel
Backwards compatible — accepts both urls array and single url field

Input

Field	Type	Required	Default	Description
`urls`	`array`	No	—	List of webpage URLs to extract article text from
`url`	`string`	No	—	Single URL for backwards compatibility (use `urls` for bulk)
`timeoutSeconds`	`integer`	No	`15`	Maximum seconds to wait for each URL response
`concurrency`	`integer`	No	`10`	Number of URLs to process in parallel (1-50)
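
The field rules in the table can be checked on the client side before a run is started. The following is a minimal sketch, not the actor's own validation logic: the function name `normalize_input` is hypothetical, and the requirement that at least one of `urls` or `url` is present is an assumption inferred from the table rather than something the page states.

```python
from typing import Any


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Merge `url` into `urls` and apply the documented defaults and range."""
    urls = list(raw.get("urls") or [])
    single = raw.get("url")
    if single and single not in urls:
        urls.append(single)
    if not urls:
        # Assumption: a run needs at least one URL to be useful.
        raise ValueError("provide `urls` or `url`")

    # Defaults taken from the input table: 15 seconds and 10 parallel URLs.
    timeout = int(raw.get("timeoutSeconds", 15))
    concurrency = int(raw.get("concurrency", 10))
    if not 1 <= concurrency <= 50:
        raise ValueError("`concurrency` must be between 1 and 50")
    return {"urls": urls, "timeoutSeconds": timeout, "concurrency": concurrency}
```

Calling `normalize_input({"url": "https://crawlee.dev"})` yields a payload with a one-element `urls` list and the default timeout and concurrency filled in.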

Input Example

{
  "urls": [
    "https://crawlee.dev",
    "https://blog.apify.com/what-is-web-scraping/"
  ],
  "timeoutSeconds": 15,
  "concurrency": 10
}
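
One way to submit an input like the one above is through the Apify REST API. The sketch below only prepares the HTTP request and does not send it. It assumes the actor ID `andok~tts-reader` (derived from the page URL) and the platform's synchronous `run-sync-get-dataset-items` endpoint; `build_run_request` is a hypothetical helper name, and the token is a placeholder.

```python
import json
import urllib.request

# Assumption: actor ID derived from https://apify.com/andok/tts-reader.
ACTOR_ID = "andok~tts-reader"
API_BASE = "https://api.apify.com/v2"


def build_run_request(run_input: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a synchronous run request for the actor."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(request)` requires a valid API token and network access. If the call succeeds, the response body should be the JSON list of dataset items described in the Output section.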

Output

Each URL produces one dataset item containing the extracted plain text and metadata.

Key output fields:

inputUrl (string) — the original URL provided
finalUrl (string) — the URL after following redirects
status (number) — HTTP status code
pageTitle (string) — extracted article title
byline (string) — author name if available
excerpt (string) — short summary of the article
textContent (string) — the full article text, cleaned and ready for TTS or AI processing
error (string) — error message if extraction failed, otherwise null
checkedAt (string) — ISO 8601 timestamp of when the extraction was performed

Output Example

{
  "inputUrl": "https://crawlee.dev",
  "finalUrl": "https://crawlee.dev/",
  "status": 200,
  "pageTitle": "Crawlee - Build reliable crawlers. Fast.",
  "byline": null,
  "excerpt": "Crawlee is a web scraping and browser automation library for Node.js.",
  "textContent": "Crawlee\n\nBuild reliable crawlers. Fast.\n\nCrawlee is a web scraping and browser automation library that helps you build reliable crawlers...",
  "error": null,
  "checkedAt": "2025-01-15T10:30:00.000Z"
}
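
Dataset items shaped like the example above can be filtered and prepared for a TTS engine with a few lines of code. This is an illustrative sketch: `split_results` and `chunk_for_tts` are hypothetical helper names, and the 2,000-character default is an arbitrary assumption, since request size limits differ between TTS providers.

```python
from typing import Iterable


def split_results(items: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate usable extractions from failed ones using the `error` field."""
    ok, failed = [], []
    for item in items:
        if item.get("error") is None and item.get("textContent"):
            ok.append(item)
        else:
            failed.append(item)
    return ok, failed


def chunk_for_tts(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most `max_chars` where possible.

    A single paragraph longer than `max_chars` is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting on the blank lines already present in `textContent` keeps paragraph boundaries intact, which tends to produce more natural pauses than cutting at a fixed character offset.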

Pricing

Event	Cost
Article Extracted	Pay-per-event (see actor pricing page)

The actor respects the per-run max charge limit. Processing stops automatically when the spending cap is reached.
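
To get a rough idea of how many articles fit under a given max charge limit, the listed starting price can be turned into a per-article cost. The sketch below assumes the "from $1.00 / 1,000" rate shown on the listing applies to the run; the actual per-event price may differ, so treat the result as an estimate. `max_articles_for_budget` is a hypothetical helper name.

```python
from decimal import Decimal


def max_articles_for_budget(max_charge_usd: str, price_per_1000_usd: str = "1.00") -> int:
    """Estimate how many extractions a spending cap covers.

    Amounts are passed as strings so Decimal avoids binary float rounding.
    """
    per_article = Decimal(price_per_1000_usd) / Decimal(1000)
    return int(Decimal(max_charge_usd) // per_article)
```

With the assumed rate, a cap of `"0.25"` covers about 250 extractions, so a run with 300 URLs would be expected to stop before processing all of them.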

Use Cases

Podcast generation — turn blog posts and news articles into clean text payloads for TTS APIs
LLM summarization — feed distraction-free article text into GPT, Claude, or other models
Content monitoring — track article changes over time with clean text snapshots
Accessibility tools — extract readable text for screen readers and assistive technology
Newsletter curation — pull article text from multiple sources for digest generation
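
For the content monitoring use case, one simple approach is to store a hash of each article's `textContent` and compare it on the next run. This is a sketch of that idea, not a feature of the actor: `text_fingerprint` and `changed_urls` are hypothetical names, and how the previous fingerprints are stored is left to the caller.

```python
import hashlib


def text_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so layout-only changes are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_urls(previous: dict[str, str], items: list[dict]) -> list[str]:
    """Return input URLs whose text differs from the stored fingerprint.

    `previous` maps inputUrl to the fingerprint saved from an earlier run.
    URLs not seen before are reported as changed; failed items are skipped.
    """
    changed = []
    for item in items:
        if item.get("error") is not None:
            continue
        fingerprint = text_fingerprint(item.get("textContent") or "")
        if previous.get(item["inputUrl"]) != fingerprint:
            changed.append(item["inputUrl"])
    return changed
```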

Related Actors

Actor	What it adds
Web Page to Markdown Converter for LLMs	Markdown-formatted output with heading structure preserved
PDF to Text Converter for AI & RAG	Extend text extraction to PDF documents
RSS Feed Parser & Reader	Discover article URLs automatically from RSS feeds

URL: https://apify.com/andok/tts-reader