
URL: https://apify.com/andok/tts-reader

Article to Text Extractor (for TTS/LLMs) · Apify


๐Ÿ‘ Article to Text Extractor (for TTS/LLMs) avatar

Article to Text Extractor (for TTS/LLMs)

Pricing

from $1.00 per 1,000 articles extracted


Extract the core readable text of any article or blog post, stripping out boilerplate. Perfect for Text-to-Speech or AI summaries.


Rating: 0.0 (0 ratings)

Developer: Andok (maintained by Community)

Actor stats: 0 bookmarked, 10 total users, 0 monthly active users, last modified 3 months ago


Article Text Extractor for TTS & AI

Extract clean, readable article text from any web page, stripped of navigation, ads, and boilerplate. Feed the output directly into text-to-speech engines, summarization models, or LLM pipelines without wasting tokens on HTML noise. Bulk-process hundreds of URLs in parallel.

Features

  • Readability engine: uses Mozilla Readability to isolate the main article content from page clutter
  • Plain text output: returns clean text ready for TTS APIs such as ElevenLabs or OpenAI TTS
  • Bulk processing: extracts articles from hundreds of URLs in a single run
  • Metadata extraction: captures the title, author byline, and excerpt alongside the article text
  • Redirect tracking: follows HTTP redirects and records the final URL
  • Configurable concurrency: processes 1 to 50 URLs in parallel
  • Backwards compatible: accepts either a urls array or a single url field
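The actor's source is not shown on this page, so the following is only a hypothetical Python sketch of how bounded-concurrency bulk processing like the one described above can work: an asyncio.Semaphore caps the number of in-flight requests at the configured concurrency, each URL gets its own timeout, and a failure is recorded in an error field instead of aborting the run. The fetch callable stands in for the real download-and-Readability step and is an assumption, as are the names process_all and fake_fetch.

```python
import asyncio
from typing import Awaitable, Callable


async def process_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    concurrency: int = 10,
    timeout_seconds: float = 15,
) -> list[dict]:
    """Run the hypothetical `fetch` over every URL with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url: str) -> dict:
        async with semaphore:  # waits while `concurrency` workers are already active
            try:
                text = await asyncio.wait_for(fetch(url), timeout=timeout_seconds)
                return {"inputUrl": url, "textContent": text, "error": None}
            except asyncio.TimeoutError:
                return {"inputUrl": url, "textContent": None, "error": "timeout"}
            except Exception as exc:  # one bad URL should not abort the whole batch
                return {"inputUrl": url, "textContent": None, "error": str(exc)}

    # gather() returns results in the same order as the input URLs
    return list(await asyncio.gather(*(worker(u) for u in urls)))


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:
        # hypothetical stand-in for downloading the page and running Readability
        await asyncio.sleep(0.01)
        return f"text of {url}"

    print(asyncio.run(process_all(["https://crawlee.dev"], fake_fetch)))
```

Returning an error record per URL, rather than raising, mirrors the error field in the output described below: one unreachable page leaves the rest of the batch intact.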

Input

  • urls (array, optional): list of web page URLs to extract article text from
  • url (string, optional): single URL, kept for backwards compatibility (use urls for bulk)
  • timeoutSeconds (integer, optional, default 15): maximum seconds to wait for each URL's response
  • concurrency (integer, optional, default 10): number of URLs to process in parallel (1-50)
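A hypothetical sketch of how these fields could be normalized before a run. Merging url into urls, dropping duplicates, raising an error when no URL is given, and clamping concurrency into 1-50 (rather than rejecting out-of-range values) are all assumptions for illustration, not documented behavior of this actor; normalize_input is an illustrative name.

```python
from typing import Any

MIN_CONCURRENCY, MAX_CONCURRENCY = 1, 50


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pre-run normalization of the actor input."""
    candidates = list(raw.get("urls") or [])
    if raw.get("url"):  # legacy single-URL field, merged into the list
        candidates.append(raw["url"])

    # drop blanks and duplicates while keeping first-seen order
    urls: list[str] = []
    for u in candidates:
        u = u.strip()
        if u and u not in urls:
            urls.append(u)
    if not urls:
        raise ValueError("Provide at least one URL in 'urls' or 'url'.")

    concurrency = int(raw.get("concurrency", 10))
    return {
        "urls": urls,
        "timeoutSeconds": int(raw.get("timeoutSeconds", 15)),
        # assumption: out-of-range values are clamped rather than rejected
        "concurrency": max(MIN_CONCURRENCY, min(MAX_CONCURRENCY, concurrency)),
    }
```

Under these assumptions, normalize_input({"url": "https://crawlee.dev", "concurrency": 80}) yields a single-item urls list, the default timeoutSeconds of 15, and a concurrency of 50.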

Input Example

{
  "urls": [
    "https://crawlee.dev",
    "https://blog.apify.com/what-is-web-scraping/"
  ],
  "timeoutSeconds": 15,
  "concurrency": 10
}

Output

Each URL produces one dataset item containing the extracted plain text and metadata.

Key output fields:

  • inputUrl (string): the original URL provided
  • finalUrl (string): the URL after following redirects
  • status (number): HTTP status code
  • pageTitle (string): extracted article title
  • byline (string): author name, if available
  • excerpt (string): short summary of the article
  • textContent (string): the full article text, cleaned and ready for TTS or AI processing
  • error (string): error message if extraction failed, otherwise null
  • checkedAt (string): ISO 8601 timestamp of when the extraction was performed
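For consuming the dataset downstream, a minimal Python sketch that maps each item onto a typed record using the field names listed above and separates successful extractions from failed ones. Treating an item as usable only when error is null and textContent is non-empty is an assumption, as is the premise that only inputUrl is always present; ExtractedArticle and split_results are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedArticle:
    input_url: str
    final_url: Optional[str] = None
    status: Optional[int] = None
    page_title: Optional[str] = None
    byline: Optional[str] = None
    excerpt: Optional[str] = None
    text_content: Optional[str] = None
    error: Optional[str] = None
    checked_at: Optional[str] = None

    @classmethod
    def from_item(cls, item: dict) -> "ExtractedArticle":
        # field names follow the output list above; missing keys become None
        return cls(
            input_url=item["inputUrl"],
            final_url=item.get("finalUrl"),
            status=item.get("status"),
            page_title=item.get("pageTitle"),
            byline=item.get("byline"),
            excerpt=item.get("excerpt"),
            text_content=item.get("textContent"),
            error=item.get("error"),
            checked_at=item.get("checkedAt"),
        )

    @property
    def ok(self) -> bool:
        # assumption: usable means no error and some extracted text
        return self.error is None and bool(self.text_content)


def split_results(items: list[dict]) -> tuple[list[ExtractedArticle], list[ExtractedArticle]]:
    """Return (successful, failed) articles from raw dataset items."""
    articles = [ExtractedArticle.from_item(item) for item in items]
    return [a for a in articles if a.ok], [a for a in articles if not a.ok]
```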

Output Example

{
  "inputUrl": "https://crawlee.dev",
  "finalUrl": "https://crawlee.dev/",
  "status": 200,
  "pageTitle": "Crawlee - Build reliable crawlers. Fast.",
  "byline": null,
  "excerpt": "Crawlee is a web scraping and browser automation library for Node.js.",
  "textContent": "Crawlee\n\nBuild reliable crawlers. Fast.\n\nCrawlee is a web scraping and browser automation library that helps you build reliable crawlers...",
  "error": null,
  "checkedAt": "2025-01-15T10:30:00.000Z"
}
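TTS APIs typically cap the length of a single request, so long textContent values usually need splitting before synthesis. This sketch splits on the blank-line paragraph breaks visible in the example output and hard-splits any paragraph that alone exceeds the limit, which can cut mid-word. The 4,096-character default and the name chunk_for_tts are placeholder assumptions; check your TTS provider's documented limit.

```python
def chunk_for_tts(text: str, max_chars: int = 4096) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # a single paragraph longer than the limit is hard-split at max_chars
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # append the paragraph to the current chunk if it still fits
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate TTS request and the resulting audio segments concatenated in order.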

Pricing

  • Article extracted: pay-per-event (see the actor's pricing page)

The actor respects the per-run max charge limit. Processing stops automatically when the spending cap is reached.
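A quick budgeting sketch, assuming the listed starting price of $1.00 per 1,000 extracted articles applies to your run (your actual per-event price may differ): cost scales linearly with the number of articles, and a per-run max charge bounds how many articles can be extracted before processing stops. The function names are illustrative; Decimal is used to avoid floating-point rounding on currency.

```python
from decimal import ROUND_FLOOR, Decimal

# assumption: the listing's starting price; your actual per-event price may differ
PRICE_PER_1000 = Decimal("1.00")


def estimate_cost(num_articles: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Cost in USD of extracting num_articles at a flat per-1,000 price."""
    return price_per_1000 * num_articles / 1000


def max_articles_for_cap(max_charge_usd: Decimal, price_per_1000: Decimal = PRICE_PER_1000) -> int:
    """Largest whole number of articles whose cost fits under a per-run max charge."""
    return int((max_charge_usd * 1000 / price_per_1000).to_integral_value(rounding=ROUND_FLOOR))
```

Under that assumed price, 250 articles come to $0.25, and a $0.50 cap allows at most 500 articles.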

Use Cases

  • Podcast generation: turn blog posts and news articles into clean text payloads for TTS APIs
  • LLM summarization: feed distraction-free article text into GPT, Claude, or other models
  • Content monitoring: track article changes over time with clean text snapshots
  • Accessibility tools: extract readable text for screen readers and assistive technology
  • Newsletter curation: pull article text from multiple sources for digest generation
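For the content-monitoring use case, one simple approach (an assumption, not a feature of the actor) is to store a hash of each article's textContent and compare it on the next run. Whitespace is collapsed before hashing so that reflowed but otherwise identical text is not reported as a change. fingerprint and changed_urls are illustrative names, and the previous dict stands in for whatever storage you use between runs.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of the text with all whitespace runs collapsed to single spaces."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_urls(previous: dict[str, str], items: list[dict]) -> list[str]:
    """Return inputUrls whose text changed; updates `previous` in place."""
    changed = []
    for item in items:
        text = item.get("textContent")
        if item.get("error") or not text:
            continue  # a failed extraction is not treated as a content change
        digest = fingerprint(text)
        if previous.get(item["inputUrl"]) != digest:
            changed.append(item["inputUrl"])
            previous[item["inputUrl"]] = digest
    return changed
```

A URL seen for the first time is reported as changed, since there is no stored snapshot to compare against.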

Related Actors

  • Web Page to Markdown Converter for LLMs: Markdown-formatted output with heading structure preserved
  • PDF to Text Converter for AI & RAG: extends text extraction to PDF documents
  • RSS Feed Parser & Reader: discovers article URLs automatically from RSS feeds

You might also like

Smart Article & Blog Extractor

lightkong/universal-blog-scraper

Extract clean text, author, title, and reading time from any news, blog, or article webpage. Perfect for AI/LLM training and RAG systems.

Article Extraction API

tugelbay/article-extractor

Extract clean article text and metadata from URLs as Markdown, text, or HTML for RAG, AI agents, monitoring, and research. Guide: https://konabayev.com/tools/article-extractor/?utm_source=apify_info&utm_medium=referral&utm_campaign=article-extractor

๐Ÿ‘ User avatar

Tugelbay Konabayev

Text Scraper (Free)

karamelo/text-scraper-free

Website Text Extractor. Extract Text from Webpages and Feed Your LLMs


Google Free Text to Speech

jupri/google-speech

Use free Google Text to Speech to translate text into voice

Text to speech generator

akash9078/advanced-text-to-speech

Professional-grade Text-to-Speech (TTS) actor powered by advanced AI models. Convert any text into natural, human-like speech with 50+ premium voices across 9 languages. Perfect for content creation, accessibility, voiceovers, audiobooks, podcasts, and multilingual applications.

๐Ÿ‘ User avatar

Akash Kumar Naik

Web Article Extractor - Clean Reader Mode Text & Metadata

maged120/reader-mode

Extract clean, readable article content from any web page. Strips ads, navigation, and clutter, then returns title, author, full body text, and publish date in structured JSON.

Google Search to Full Article Text ⚡ $4 per 1k

ohmydata/google-search-to-full-article

Turn Google search (SERP) queries into a dataset of deduplicated, clean full article text.

Microsoft Text to Speech

jupri/microsoft-tts

💫 Use Microsoft Edge TTS service to convert texts into speech