
URL: https://apify.com/nexgendata/news-announcements-rag-markdown?fpr=2ayu9b

News & Announcements to Markdown for RAG: LLM Datasets · Apify



News & Announcements to Markdown for RAG

Pricing

from $40.00 / 1,000 markdown chunks

Convert press releases, corporate announcements & news articles into clean, chunked Markdown for RAG and LLM pipelines. Article URLs or RSS feeds. No login.

Rating: 0.0 (0 ratings)

Developer: NexGenData · Maintained by Community

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 11 days ago

📰 News & Announcements to Markdown for RAG

Turn press releases, corporate announcements, and news articles into clean, chunked Markdown for RAG and LLM pipelines. Feed it article URLs or RSS/Atom feeds and get LLM-ready text with citations.

⚑ What you get

Field | Description
--- | ---
url | Source article URL (citation)
title | Article / release title
chunkIndex / totalChunks | Position of the chunk within the article (chunkIndex starts at 0) and the article's total chunk count
markdown | Clean Markdown chunk
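A minimal Python sketch for consuming these rows downstream. It assumes each dataset item is a dict with exactly the field names in the table, that chunkIndex starts at 0 (as in the sample output), and that joining chunks with a blank line is an acceptable reconstruction of the article. `Chunk` and `reassemble` are hypothetical helpers, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Chunk:
    """One dataset row, using the field names from the table above."""
    url: str
    title: str
    chunk_index: int   # "chunkIndex", 0 for the first chunk
    total_chunks: int  # "totalChunks"
    markdown: str


def reassemble(rows: Iterable[dict]) -> dict[str, str]:
    """Group rows by source URL and rebuild each article in chunk order.

    Raises ValueError when an article's chunks are incomplete.
    """
    by_url: dict[str, list[Chunk]] = {}
    for row in rows:
        chunk = Chunk(row["url"], row["title"], row["chunkIndex"],
                      row["totalChunks"], row["markdown"])
        by_url.setdefault(chunk.url, []).append(chunk)

    articles: dict[str, str] = {}
    for url, chunks in by_url.items():
        chunks.sort(key=lambda c: c.chunk_index)
        expected = list(range(chunks[0].total_chunks))
        if [c.chunk_index for c in chunks] != expected:
            raise ValueError(f"{url}: got {len(chunks)} of {len(expected)} chunks")
        # Blank-line join is an assumption about how the chunks were split.
        articles[url] = "\n\n".join(c.markdown for c in chunks)
    return articles
```

Checking the index sequence against totalChunks catches rows dropped by pagination or filtering before a partial article reaches your index.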

🎯 Use cases

  1. AI engineers building news/PR RAG copilots
  2. Market & competitive intel feeding event data to an LLM
  3. PR/IR teams building searchable announcement archives
  4. Fintech/research products needing announcement text with citations
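For the citation-driven use cases, retrieved chunks can be formatted as a numbered context block for an LLM prompt. This is a sketch: `build_context` and the `[n]` citation style are illustrative assumptions, and it presumes your retriever has already ranked the rows.

```python
def build_context(chunks: list[dict], max_chunks: int = 5) -> str:
    """Format ranked chunks as numbered, citable context for a prompt.

    Each chunk is prefixed with [n] so the model can cite it, and the
    source URLs are listed under 'Sources' for attribution.
    """
    blocks: list[str] = []
    sources: list[str] = []
    for n, chunk in enumerate(chunks[:max_chunks], start=1):
        blocks.append(f"[{n}] {chunk['title']}\n{chunk['markdown']}")
        sources.append(f"[{n}] {chunk['url']}")
    return "\n\n".join(blocks) + "\n\nSources:\n" + "\n".join(sources)
```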

🚀 Sample inputs

{"rssFeeds":["https://www.prnewswire.com/rss/news-releases-list.rss"],"maxPerFeed":10}
{"urls":["https://www.businesswire.com/news/home/.../en/..."],"chunkWords":600}
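The same inputs can be sent from Python. This sketch assumes the official `apify-client` package and an Actor ID of `nexgendata/news-announcements-rag-markdown`, derived from the store URL; `build_run_input` and `run_actor` are hypothetical helpers, and only the input fields shown above are used.

```python
from typing import Optional, Sequence


def build_run_input(*, urls: Optional[Sequence[str]] = None,
                    rss_feeds: Optional[Sequence[str]] = None,
                    max_per_feed: Optional[int] = None,
                    chunk_words: Optional[int] = None) -> dict:
    """Build the Actor input with the field names from the sample inputs."""
    if not urls and not rss_feeds:
        raise ValueError("provide at least one article URL or RSS/Atom feed")
    run_input: dict = {}
    if urls:
        run_input["urls"] = list(urls)
    if rss_feeds:
        run_input["rssFeeds"] = list(rss_feeds)
    if max_per_feed is not None:
        run_input["maxPerFeed"] = max_per_feed
    if chunk_words is not None:
        run_input["chunkWords"] = chunk_words
    return run_input


def run_actor(token: str, run_input: dict) -> list[dict]:
    """Start the Actor, wait for it to finish, and return its dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # Actor ID assumed from the store URL.
    run = client.actor("nexgendata/news-announcements-rag-markdown").call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not start")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Omitted fields are left out of the input rather than sent as null, so the Actor's own defaults apply.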

📦 Sample output

{"url":"https://www.prnewswire.com/news-releases/...","title":"Acme Raises $50M Series B","chunkIndex":0,"totalChunks":6,"markdown":"# Acme Corp Raises $50M...\n..."}

🛠 How it works

  1. Source: fetches article URLs directly, or pulls the latest items from RSS/Atom feeds.
  2. Extract: isolates the main article content (<article>/<main>) and strips navigation, ads, and scripts.
  3. Convert: turns the HTML into Markdown with ATX-style (#) headings.
  4. Chunk: splits the Markdown into chunks of roughly chunkWords words each, sized for embedding.
  5. Schema: emits one row per chunk, with the source URL as the citation.
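The chunk and schema steps can be approximated as below. The Actor's exact splitting rule is not documented beyond "roughly chunkWords words", so this sketch assumes a greedy strategy that packs whole paragraphs (separated by blank lines) until the word budget would be exceeded; a single paragraph longer than the budget stays in one oversized chunk. `to_rows` is a hypothetical name.

```python
def to_rows(url: str, title: str, markdown: str, chunk_words: int = 800) -> list[dict]:
    """Split Markdown into ~chunk_words-word chunks and emit one row per chunk."""
    chunks: list[str] = []
    current: list[str] = []  # paragraphs in the chunk being built
    count = 0                # words in the chunk being built
    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Close the current chunk if this paragraph would push it past the budget.
        if current and count + n > chunk_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    # Schema step: one row per chunk, keeping the source URL as the citation.
    return [
        {"url": url, "title": title, "chunkIndex": i,
         "totalChunks": len(chunks), "markdown": text}
        for i, text in enumerate(chunks)
    ]
```

Packing whole paragraphs keeps headings and lists intact inside a chunk, at the cost of chunks that only approximate the word budget.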

💰 Pricing Example

Pay-per-event: $0.005 per run + $0.04 per Markdown chunk (document-record).

Chunks (one run) | Approx. cost
--- | ---
100 | ~$4.00
500 | ~$20.00
2,000 | ~$80.00

Apify's $5 free credit covers about 124 chunks in a single run: ($5.00 - $0.005) / $0.04 ≈ 124.9.
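The pricing arithmetic as a small Python helper, using `Decimal` to avoid float rounding. It assumes one run fee per run and one chunk fee per dataset row, as stated above; `estimate_cost` and `max_chunks` are illustrative names.

```python
from decimal import Decimal

RUN_FEE = Decimal("0.005")   # per run
CHUNK_FEE = Decimal("0.04")  # per Markdown chunk (document-record)


def estimate_cost(chunks: int, runs: int = 1) -> Decimal:
    """Pay-per-event cost: a fixed fee per run plus a fee per chunk."""
    return RUN_FEE * runs + CHUNK_FEE * chunks


def max_chunks(budget: Decimal, runs: int = 1) -> int:
    """Largest whole number of chunks a budget covers after run fees."""
    remaining = budget - RUN_FEE * runs
    if remaining <= 0:
        return 0
    return int(remaining // CHUNK_FEE)
```

For example, `estimate_cost(100)` returns `Decimal('4.005')`, which the table rounds to ~$4.00, and `max_chunks(Decimal("5"))` returns 124.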

⚖️ Legal & data sources

Fetches publicly-accessible articles/feeds with an identified User-Agent. Respect each publisher's terms for your downstream use; output includes source URLs for attribution.

❓ FAQ

  • URLs or feeds? Either or both; feeds expand to their latest items.
  • Citations? Yes, every chunk keeps its source URL.
  • Chunk size? Set chunkWords (default 800).
  • Paywalled articles? Only publicly accessible content can be fetched.
  • Freshness? Content is pulled live at run time.
  • Dedup? Repeated URLs within one run are skipped.
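Feed expansion and per-run dedup might look like the sketch below, using only the standard library. It assumes feeds list their newest items first (common, but not guaranteed), treats RSS 2.0 `<item><link>` and the Atom `<entry><link>` with rel "alternate" (the default when rel is absent) as article URLs, and maps maxPerFeed onto a `max_per_feed` argument. `feed_item_urls` and `dedupe` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_item_urls(feed_xml: str, max_per_feed: int) -> list[str]:
    """Return up to max_per_feed article links from an RSS 2.0 or Atom feed."""
    root = ET.fromstring(feed_xml)
    links: list[str] = []
    if root.tag == f"{ATOM_NS}feed":
        for entry in root.findall(f"{ATOM_NS}entry"):
            # Atom: a <link> without rel defaults to rel="alternate".
            for link in entry.findall(f"{ATOM_NS}link"):
                href = link.get("href")
                if href and link.get("rel", "alternate") == "alternate":
                    links.append(href)
                    break
    else:
        # RSS 2.0: <rss><channel><item><link>...</link></item>
        for item in root.iter("item"):
            link = item.findtext("link")
            if link and link.strip():
                links.append(link.strip())
    # Assumes the feed lists its newest items first.
    return links[:max_per_feed]


def dedupe(urls: list[str]) -> list[str]:
    """Skip repeated URLs within a run, keeping first-seen order."""
    return list(dict.fromkeys(urls))
```

Direct URLs and expanded feed items can then be merged with `dedupe(direct_urls + feed_item_urls(xml, 10))`, so an article that appears both ways is fetched once.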

🆘 Troubleshooting

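  • Empty markdown: the page may be JS-rendered or paywalled.
  • Too much boilerplate: the article wrapper (<article>/<main>) wasn't detected; try a direct article URL.
  • Feed returns nothing: confirm the URL points to a valid RSS/Atom feed.
  • Huge output: lower maxPerFeed to fetch fewer articles. Lowering chunkWords only makes each chunk smaller; it produces more chunks, and each chunk is billed.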
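To spot the empty-markdown case in a finished dataset, you can list the source URLs whose chunks contain no text and then retry or drop them. A sketch assuming the row fields listed under "What you get"; `urls_to_retry` is a hypothetical helper.

```python
def urls_to_retry(rows: list[dict]) -> list[str]:
    """List source URLs whose chunks are all empty or whitespace.

    Such pages may be JS-rendered or paywalled; each URL appears once,
    in first-seen order.
    """
    has_text: dict[str, bool] = {}
    for row in rows:
        text = (row.get("markdown") or "").strip()
        has_text[row["url"]] = has_text.get(row["url"], False) or bool(text)
    return [url for url, ok in has_text.items() if not ok]
```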

🏷️ About NexGenData

Structured public-data tools for analysts, developers, and operators. thenextgennexus.com.

You might also like

Web-to-Markdown Generator for AI & RAG Pipelines

profitstack/web-to-markdown-generator-for-ai-rag-pipelines

Convert any website into clean, LLM-ready Markdown with heading-based chunking for RAG and AI agents.

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds; perfect for AI training data, RAG pipelines, and content archiving.

Website to Markdown Crawler for LLM & RAG

logiover/website-text-markdown-crawler

Crawl any website to clean Markdown and plain text for LLM training and RAG. HTML to Markdown, no API or login. Export website text to CSV or JSON.

News Article To Markdown

extremescrapes/news-article-to-markdown

Extract news articles as clean, ad-free Markdown with automatic author and publish date detection.

Extreme Scrapes

2

News & Article Extractor

automation-lab/news-article-extractor

Auto-discover news/blog articles and extract clean text plus Markdown for LLM/RAG corpora. Uses RSS, sitemaps, and Readability; outputs metadata, counts, and token estimates.

Stas Persiianenko

26

Web Page to Markdown Extractor

fetch_cat/web-page-to-markdown-extractor

Convert public URLs into clean Markdown, text, metadata, links, images, and optional HTML for AI agents, RAG, support, and automation workflows.