
URL: https://apify.com/workhard3000/news-intelligence-rag-extractor

Universal News Intelligence Agent · Apify


πŸ‘ Universal News Article Intelligence Agent avatar

Universal News Article Intelligence Agent

Pricing

from $10.00 / 1,000 successful research extractions

High-fidelity news normalization for AI & Agentic RAG. Extract clean Markdown, full text, and metadata from premium domains (Bloomberg, Wall Street Journal, Financial Times, New York Times, Washington Post, etc.). Success-only billing: you pay only when the full text is verified.

Rating: 5.0 (11)

Developer: WorkHard3000

Maintained by Community

Actor stats

  • Bookmarked: 13
  • Total users: 49
  • Monthly active users: 10
  • Issues response: 2.1 days
  • Last modified: 4 days ago

Universal News Article Intelligence Agent: High-Fidelity RAG Content Connector

Retrieve structured metadata and normalized full-text content from high-complexity global news domains. Optimized for LLMs, Agentic RAG, market research pipelines, and automated intelligence.

What does this Agent do?

This Actor is a professional-grade Content Normalization Agent designed to bridge the gap between complex web architectures and AI systems. It transforms unstructured data from premium financial and global news domains into clean, standardized Markdown, ready for immediate use in RAG (Retrieval-Augmented Generation) pipelines and LLMs.

Using a proprietary multi-step extraction engine, this Agent ensures that you receive the full research-grade text required for deep analysis, rather than the truncated snippets or "Subscription Required" notices returned by standard scrapers.

Input: a list of article URLs (one or many).
Output: structured JSON with title, author, publication date, full text, cleaned Markdown, and additional metadata (excerpt, featured image, site name).


Success-Only Pricing (Verified Research Extraction)

We operate on a Quality-First billing model. You are only billed when we successfully deliver research-ready data. Higher Apify subscription plans unlock progressively lower per-extraction rates.

| Scenario | FREE | BRONZE | SILVER | GOLD |
|---|---|---|---|---|
| Verified Research Extraction (full text, 500+ characters) | $0.025 | $0.020 | $0.015 | $0.010 |
| Per 1,000 extractions | $25.00 | $20.00 | $15.00 | $10.00 |
| Incomplete Retrieval (blocked or snippet) | $0.00 | $0.00 | $0.00 | $0.00 |
| Insufficient Content (under 500 characters) | $0.00 | $0.00 | $0.00 | $0.00 |

Discount tiers are determined by your Apify subscription plan: Free ($0/mo), Starter/BRONZE ($29/mo), Scale/SILVER ($199/mo), Business/GOLD ($999/mo).

The Math of Value: Standard "pay-per-result" tools charge their full markup for every item, even if it's a 403 error or a paywall snippet. With this Agent, if the extraction is not successful, you pay $0.00 in Actor fees, incurring only the nominal Apify platform usage cost for the compute time (typically less than a penny).
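The billing rules above reduce to simple arithmetic. The Python sketch below is an illustration, not the Actor's billing code: it assumes a result counts as billable only when it was not blocked and its text has at least 500 characters, uses the per-extraction rates from the table, and leaves out Apify platform compute cost. The helper names (`is_billable`, `actor_fee`, `savings_vs_free`) are hypothetical.

```python
from __future__ import annotations

from decimal import Decimal

# Per-extraction Actor fees in USD, copied from the pricing table.
RATES = {
    "FREE": Decimal("0.025"),
    "BRONZE": Decimal("0.020"),
    "SILVER": Decimal("0.015"),
    "GOLD": Decimal("0.010"),
}
MIN_CHARS = 500  # "full text, 500+ characters" threshold


def is_billable(text: str | None, blocked: bool) -> bool:
    """Assumed rule: bill only unblocked results with at least 500 characters of text."""
    return not blocked and text is not None and len(text) >= MIN_CHARS


def actor_fee(results: list[tuple[str | None, bool]], plan: str) -> Decimal:
    """Actor fee for a batch of (text, blocked) results; platform compute is not included."""
    billable = sum(1 for text, blocked in results if is_billable(text, blocked))
    return RATES[plan] * billable


def savings_vs_free(plan: str) -> Decimal:
    """Per-extraction saving relative to the FREE rate, as a percentage."""
    return (RATES["FREE"] - RATES[plan]) / RATES["FREE"] * 100


batch = [("x" * 1200, False), ("x" * 300, False), (None, True)]
print(actor_fee(batch, "GOLD"))   # only the first result is billable
print(savings_vs_free("GOLD"))    # GOLD rate compared with the FREE rate
```

Decimal is used instead of float so that per-cent rates multiply out exactly (1,000 FREE extractions come to exactly $25).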


Strategic Capabilities

  • High-Fidelity Content Retrieval: Optimized for high-complexity research domains (Bloomberg, WSJ, Financial Times, The Economist, NYT, and more).
  • AI-Ready Markdown: Automatically normalizes content by removing non-essential elements (ads, nav-bars, scripts), reducing LLM token consumption by up to 80%.
  • Market Intelligence Ready: Parses structured metadata (Byline, ISO Date, Featured Images) for immediate database ingestion.
  • Real-Time Stream Support: Results are pushed to the dataset as they complete, making it ideal for 24/7 monitoring pipelines.
  • Automated Resilience: Advanced internal logic handles difficult-to-render architectures to ensure consistent delivery.

Enterprise Use Cases

Financial Intelligence & Quantitative Analysis

Feed high-fidelity market news directly into sentiment models or trading algorithms. Monitor global financial publications with zero maintenance overhead.

RAG & Knowledge Base Construction

Build a high-quality "News Memory" for AI Agents. Our clean Markdown output ensures your vector database contains only the core analysis, saving costs and improving accuracy.

Competitive Intelligence

Track industry shifts across multiple premium publications with a single API key. Standardize all sources into one unified JSON schema for cross-platform comparison.


What data can you extract?

| Field | Description | Example |
|---|---|---|
| url | Original article URL | https://www.bloomberg.com/news/articles/... |
| title | Article headline | "What to Watch as China's Leaders Hash Out Plan" |
| domain | Source domain | bloomberg.com |
| byline | Author name(s) | "Jennifer Schuessler" |
| publishedDate | ISO 8601 publication date | "2026-03-07T10:03:00.000Z" |
| text | Full article as clean plain text | "The National Endowment for the Humanities..." |
| markdown | Full article as Markdown | "# Article Title\n\nFull text here..." |
| excerpt | Article summary/description | "The agency used AI to flag grants..." |
| image | Featured/OG image URL | "https://static01.nyt.com/images/..." |
| siteName | Publication name | "bloomberg.com" |
| elapsedMs | Extraction time in milliseconds | 5090 |
| extractedAt | ISO 8601 timestamp of the extraction | "2026-03-08T15:30:00.000Z" |
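When loading results into a database, it can help to map each JSON item onto a typed record first. The sketch below is a hypothetical client-side model (`Article`, `to_article`, and `parse_iso` are illustrative names, not part of the Actor); it assumes only the field names listed in the table and treats every field except `url` as optional.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Article:
    """Typed view of one dataset item; field names follow the table above."""
    url: str
    title: Optional[str]
    domain: Optional[str]
    byline: Optional[str]
    published: Optional[datetime]
    text: Optional[str]
    markdown: Optional[str]
    excerpt: Optional[str]
    image: Optional[str]
    site_name: Optional[str]
    elapsed_ms: Optional[int]


def parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse timestamps like "2026-03-07T10:03:00.000Z" into timezone-aware datetimes."""
    if not value:
        return None
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def to_article(item: dict[str, Any]) -> Article:
    """Map a raw JSON item to Article, tolerating missing optional fields."""
    return Article(
        url=item["url"],
        title=item.get("title"),
        domain=item.get("domain"),
        byline=item.get("byline"),
        published=parse_iso(item.get("publishedDate")),
        text=item.get("text"),
        markdown=item.get("markdown"),
        excerpt=item.get("excerpt"),
        image=item.get("image"),
        site_name=item.get("siteName"),
        elapsed_ms=item.get("elapsedMs"),
    )
```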

Verified High-Complexity Research Domains

This Agent features specialized extraction logic for the following global institutions (tested March 2026). It also supports hundreds of additional news domains via its universal normalization engine.

Financial & Market Intelligence: Bloomberg, Wall Street Journal (WSJ), Financial Times (FT), Australian Financial Review, Handelsblatt.

Global Policy & Analysis: The Economist, New York Times, Washington Post, Foreign Affairs, Politico, The Hill.

Innovation & Strategy: Wired, MIT Technology Review, Harvard Business Review, Fortune, Time.

International Perspectives: Le Monde, Der Spiegel, Nikkei Asia, South China Morning Post, Japan Times, The Straits Times, El Pais, Corriere della Sera, Haaretz, Irish Times.

Commonwealth & UK: The Telegraph, The Times, The Guardian, The Independent, New Statesman, The Australian, Globe and Mail.

US Regional & Culture: Los Angeles Times, Chicago Tribune, Boston Globe, SF Chronicle, Seattle Times, The Atlantic, The New Yorker, Vanity Fair, Business Insider, Salon, Slate, The Daily Beast.
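If you want to tag incoming URLs before submitting them, for example to track verified domains separately from the universal engine, a small lookup is enough. The sketch below is hypothetical: `VERIFIED` is an illustrative subset of the list above keyed by each publication's public web domain, not the Actor's internal routing table, and the `domain_of` helper mirrors the shape of the `domain` output field (host name without "www.").

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlsplit

# Illustrative subset of the verified publications, keyed by their public web domains.
# This is not the Actor's internal routing table.
VERIFIED = {
    "bloomberg.com": "Bloomberg",
    "wsj.com": "Wall Street Journal",
    "ft.com": "Financial Times",
    "economist.com": "The Economist",
    "nytimes.com": "New York Times",
    "washingtonpost.com": "Washington Post",
}


def domain_of(url: str) -> str:
    """Host name without a leading "www."; urlsplit already lower-cases the host."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def publication_for(url: str) -> Optional[str]:
    """Verified publication name for the URL's domain or any subdomain, else None."""
    domain = domain_of(url)
    for known, name in VERIFIED.items():
        if domain == known or domain.endswith("." + known):
            return name
    return None
```

URLs that return None are still accepted by the Actor; per the text above, they go through the universal normalization engine.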


How to Use

  1. Input URLs: Paste your target research links into the articleUrls field.
  2. Execute: Click Start. The Agent will begin high-fidelity extraction.
  3. Export: Download your data in JSON, CSV, or feed it via Webhook to your AI pipeline.

API Implementation

curl -X POST "https://api.apify.com/v2/acts/workhard3000~news-intelligence-rag-extractor/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"articleUrls": ["https://www.bloomberg.com/news/articles/..."]}'
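The same call can be made from Python with only the standard library. This sketch builds the exact POST request from the curl example and sends it only when `start_run` is called; the function names are hypothetical. Apify's API typically returns the new run object wrapped in a `data` field, but check the response you actually receive.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

ACTOR_ID = "workhard3000~news-intelligence-rag-extractor"


def build_run_request(token: str, article_urls: list[str]) -> urllib.request.Request:
    """Build, without sending, the same POST request as the curl example."""
    query = urllib.parse.urlencode({"token": token})
    endpoint = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    body = json.dumps({"articleUrls": article_urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(token: str, article_urls: list[str]) -> dict:
    """Send the request and return the decoded JSON response describing the new run."""
    with urllib.request.urlopen(build_run_request(token, article_urls)) as resp:
        return json.load(resp)
```

Apify's API documentation also describes sending the token in an `Authorization: Bearer` header, which keeps it out of URLs and server logs.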

Output Example

{
  "url": "https://www.bloomberg.com/news/articles/2026-03-03/what-to-watch-as-china-s-leaders-hash-out-plan-for-economic-path",
  "domain": "bloomberg.com",
  "title": "What to Watch as China's Leaders Hash Out Plan for Economic Path",
  "text": "China's annual legislative meetings are set to kick off this week...",
  "markdown": "# What to Watch as China's Leaders Hash Out Plan\n\nChina's annual legislative meetings...",
  "excerpt": "The National People's Congress opens amid uncertainty over trade tensions.",
  "byline": "Bloomberg News",
  "publishedDate": "2026-03-03T08:00:00.000Z",
  "image": "https://assets.bwbx.io/images/...",
  "siteName": "bloomberg.com",
  "extractedAt": "2026-03-08T15:30:00.000Z",
  "elapsedMs": 3973
}
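For RAG ingestion, the `markdown` field is usually split into chunks before embedding. The sketch below is one simple approach, not something the Actor does itself: it greedily packs blank-line-separated Markdown blocks into chunks of at most `max_chars` characters and attaches source metadata from the output item. The 1,000-character default and the record layout (`id`, `text`, `metadata`) are assumptions; adapt them to your vector store.

```python
from __future__ import annotations

from typing import Any


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated blocks into chunks of at most max_chars.

    A single block longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for block in (b.strip() for b in markdown.split("\n\n")):
        if not block:
            continue
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            chunks.append(current)  # flush before the limit would be exceeded
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def to_documents(item: dict[str, Any], max_chars: int = 1000) -> list[dict[str, Any]]:
    """Turn one output item into vector-store records that carry source metadata."""
    metadata = {
        "url": item["url"],
        "title": item.get("title"),
        "siteName": item.get("siteName"),
        "publishedDate": item.get("publishedDate"),
    }
    return [
        {"id": f"{item['url']}#{i}", "text": chunk, "metadata": dict(metadata)}
        for i, chunk in enumerate(chunk_markdown(item.get("markdown") or "", max_chars))
    ]
```

Splitting on blank lines keeps headings and paragraphs intact, which tends to give more coherent chunks than cutting at a fixed character offset.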

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| articleUrls | Array of strings | required | List of article URLs to extract |
| autoArchive | Boolean | true | Try web archives as a last resort if direct extraction fails |
| maxRetries | Integer | 3 | Number of retry attempts per URL (1–10) |
| proxyConfiguration | Object | Residential | Proxy settings; residential proxies are used by default |
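A small client-side builder can catch input mistakes before a run is started. This sketch uses the parameter names, defaults, and the 1–10 retry range from the table; the absolute-URL check is an extra assumption of ours, not a documented Actor rule. The proxy override uses the common Apify proxy input shape (`useApifyProxy`, `apifyProxyGroups`) as a hypothetical example; confirm it against the Actor's input schema before relying on it.

```python
from __future__ import annotations

from typing import Any, Optional


def build_input(
    article_urls: list[str],
    auto_archive: bool = True,
    max_retries: int = 3,
    proxy_configuration: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble an input object using the parameter names and defaults in the table."""
    if not article_urls:
        raise ValueError("articleUrls is required and must not be empty")
    # Client-side sanity check (an assumption, not a documented Actor rule).
    bad = [u for u in article_urls if not u.startswith(("http://", "https://"))]
    if bad:
        raise ValueError(f"not absolute http(s) URLs: {bad}")
    if not 1 <= max_retries <= 10:
        raise ValueError("maxRetries must be between 1 and 10")
    payload: dict[str, Any] = {
        "articleUrls": article_urls,
        "autoArchive": auto_archive,
        "maxRetries": max_retries,
    }
    # Omit proxyConfiguration to keep the Actor's residential default.
    if proxy_configuration is not None:
        payload["proxyConfiguration"] = proxy_configuration
    return payload


# Hypothetical override in the common Apify proxy input shape; verify against the input schema.
custom = build_input(
    ["https://www.bloomberg.com/news/articles/..."],
    max_retries=5,
    proxy_configuration={"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
)
```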

Integrations

Results are available via the Apify API and can be connected to:

  • Webhooks: trigger downstream processing when a run completes
  • Google Sheets: export results directly to a spreadsheet
  • Slack / Email: get notifications with extracted article summaries
  • Zapier / Make: connect to 5,000+ apps
  • Amazon S3 / Google Cloud Storage: store results in your cloud bucket
  • Custom API: fetch results programmatically via the Apify dataset API
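For the custom API route, results can be read page by page from Apify's dataset items endpoint (`GET /v2/datasets/{datasetId}/items` with `offset` and `limit`); the dataset ID would come from the run object's `defaultDatasetId`. The sketch below is a minimal pager with hypothetical helper names; `fetch` is injectable so the paging logic can be tested without network access. Because results are pushed as they complete, re-running it later with a larger `offset` also works for incremental polling.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator

API = "https://api.apify.com/v2"


def items_url(dataset_id: str, token: str, offset: int, limit: int) -> str:
    """URL for one page of dataset items in JSON format."""
    query = urllib.parse.urlencode(
        {"format": "json", "offset": offset, "limit": limit, "token": token}
    )
    return f"{API}/datasets/{dataset_id}/items?{query}"


def http_get_json(url: str) -> list[dict[str, Any]]:
    """Fetch and decode one page over HTTP."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_items(
    dataset_id: str,
    token: str,
    page_size: int = 100,
    fetch: Callable[[str], list[dict[str, Any]]] = http_get_json,
) -> Iterator[dict[str, Any]]:
    """Yield every item, one page at a time, stopping after the first short page."""
    offset = 0
    while True:
        page = fetch(items_url(dataset_id, token, offset, page_size))
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)
```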

Compliance & Legal Disclaimer

  • Research Intent: This tool is a technical instrument intended for authorized academic research, internal data analysis, and interoperability testing between web formats and AI systems.
  • Content Neutrality: This Actor does not host, cache, or redistribute copyrighted material. It acts as a format converter (HTML to Markdown) to facilitate data portability for research environments.
  • User Responsibility: Users are solely responsible for ensuring their data acquisition complies with the source's Terms of Service and local laws. Use of this tool constitutes agreement that the developer is not liable for any third-party misuse.

FAQ

How does the tiered pricing work?

High-complexity domains like Bloomberg and the Financial Times require significant compute resources to normalize into clean Markdown. We only charge when the full text is successfully retrieved, ensuring you never pay for an incomplete or blocked request. Your per-extraction rate is determined by your Apify subscription plan: FREE ($0.025), BRONZE ($0.020), SILVER ($0.015), or GOLD ($0.010). Higher plans unlock up to 60% savings.

What if the content cannot be retrieved?

If the Agent encounters a page it cannot normalize to our quality standards, it returns an error field and you are not charged. You only pay for successful, full-text delivery.
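Downstream code can separate delivered articles from failures before ingestion. The sketch below assumes, as described above, that failed items carry a non-empty `error` field; the function names are hypothetical. Since failed items incur no Actor fee, their URLs can simply be resubmitted in a later run.

```python
from __future__ import annotations

from typing import Any


def split_results(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate delivered articles from failed ones, assuming failures carry an "error" field."""
    ok: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        (failed if item.get("error") else ok).append(item)
    return ok, failed


def retry_candidates(failed: list[dict[str, Any]]) -> list[str]:
    """Unique, sorted URLs of failed items, suitable for a follow-up run."""
    return sorted({item["url"] for item in failed if "url" in item})
```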

Is this safe for real-time monitoring?

Yes. Since there is no base fee, you can schedule this Actor to check for new links frequently. The Actor fee is charged only for articles delivered in full text; Apify platform usage for compute time still applies to each run, as described in the pricing section.
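On a frequent schedule, the cheapest run is one that never re-submits URLs you already have. The sketch below keeps a set of seen URLs between runs in a small JSON file; the file format and the function names are assumptions for illustration. You may prefer to add a URL to the seen set only after its article arrives without an error, so that failures are retried on the next pass.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable


def filter_new(candidates: Iterable[str], seen: set[str]) -> list[str]:
    """Return URLs not seen before, in first-seen order, and add them to seen."""
    fresh: list[str] = []
    for url in candidates:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


def load_seen(path: Path) -> set[str]:
    """Load the seen-URL set from a JSON list, or start empty if the file is missing."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def save_seen(path: Path, seen: set[str]) -> None:
    """Persist the seen-URL set as a sorted JSON list."""
    path.write_text(json.dumps(sorted(seen)))
```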

Can I extract articles in languages other than English?

Yes. The Agent successfully normalizes French (Le Monde), German (Der Spiegel, Handelsblatt), Japanese (Nikkei Asia, Japan Times), Spanish (El Pais), Italian (Corriere della Sera), Hebrew (Haaretz), and Chinese (SCMP) content. The extraction engine is language-agnostic.

How fast is the extraction?

Most articles are extracted in 2–8 seconds. Some sites with aggressive protection may take 15–40 seconds due to retry logic. The elapsedMs field in the output tells you exactly how long each article took.
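To see how a run's timings compare with these ranges, the `elapsedMs` values can be summarized after each run. In this sketch the 15-second cutoff is taken from the slower range quoted above, and the report layout is a hypothetical choice; items without a numeric `elapsedMs` (for example, failures) are skipped.

```python
from __future__ import annotations

import statistics
from typing import Any

SLOW_MS = 15_000  # start of the slower range quoted above for protected sites


def timing_report(items: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize elapsedMs across a run and list URLs that took 15 s or longer."""
    timed = [i for i in items if isinstance(i.get("elapsedMs"), (int, float))]
    times = [i["elapsedMs"] for i in timed]
    return {
        "count": len(times),
        "median_ms": statistics.median(times) if times else None,
        "max_ms": max(times) if times else None,
        "slow_urls": [i.get("url") for i in timed if i["elapsedMs"] >= SLOW_MS],
    }
```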

You might also like

News Website Crawler & Article Extractor

xtech/news-source-crawler

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

Articles Extractor

web.harvester/articles-extractor

The Article Extractor is an enterprise-grade web scraping solution designed specifically for extracting structured data from news articles, blog posts, and online publications. Our advanced HTML parsing engine delivers unmatched accuracy in content extraction across thousands of websites.

BIN Lookup

greip/bin-lookup

Lookup/validate any Bank Identification Numbers (BINs) and retrieve comprehensive card information including scheme, bank details, and country information. Ideal for payment processing, fraud detection, and card validation.

Google News Scraper

crawlerbros/google-news-scraper

Scrape Google News in real-time. Supports keyword search, date filters, full-text article extraction, and image extraction.

Google News Scraper

futurizerush/google-news-scraper

Google News Search Scraper - Real-time news aggregation from Google News. Features smart article enrichment with full content extraction. Perfect for market research, trend analysis, and content monitoring.

Bin Checker Pro

burbn/bin-checker-pro

Fast BIN lookup for credit/debit cards. Get card scheme (Visa/Mastercard/Amex), type (credit/debit/prepaid), issuer bank details (name/website/phone), and country info (flag, region, currency). Perfect for payment validation, risk checks, and e‑commerce workflows.

BIN/IP Lookup Checker

easyapi/bin-ip-lookup-checker

Our API provides online merchants with detailed insights into credit/debit card transactions. By submitting the Bank Identification Number (BIN), along with the client’s IP address, users can generate risk scores and access extensive BIN information. Empower your transaction assessments today!

Financial Times News Scraper

xtracto/ft-scraper

Seamlessly retrieves full Financial Times articles by bypassing Cloudflare protection without requiring expensive residential proxies.

πŸ‘ User avatar

Farhan Febrian Nauval

29

5.0

BIN Checker

dhhoang.dn2/bin-checker

Actor to check BIN information of payment cards worldwide

πŸ‘ User avatar

Đinh Huy Hoàng

110

4.0

Bloomberg Category News Scraper

piotrv1001/bloomberg-category-news-scraper

The Bloomberg Category News Scraper extracts news articles from Bloomberg by category, capturing headlines, authors, publish dates, images, and article links. Ideal for news aggregation, market analysis, and trend monitoring.
