URL: https://apify.com/xmolodtsov/google-news-scraper

Google News Scraper · Apify


Pricing

$20.00/month + usage

Extract full Google News articles with text, images & metadata. 95%+ success rate, multi-region support, smart content extraction with automatic fallbacks. Production-ready & cost-optimized

Rating

5.0 (1)

Developer

Yevhenii Molodtsov

Maintained by Community

Actor stats

Bookmarked: 2
Total users: 18
Monthly active users: 2
Last modified: 4 months ago

Google News Bulk Scraper

Google News → publisher URLs → clean article text + images + metadata, with JS rendering, paywall, and consent-page fallbacks. HTTP-first, Playwright only when needed.

Scrape one query or thousands in a single run. Each article lands as its own dataset row with the full text, images, author, source, language, and a quality score, ready for NLP pipelines, media monitoring, or research datasets.

What You Get

Each article in the output includes:

  • title: headline as published
  • url: canonical publisher URL (not the Google News redirect)
  • source: publisher name (e.g. "Reuters", "TechCrunch")
  • publishedAt: ISO 8601 timestamp
  • author: byline when available
  • text: clean full-text content (300+ characters, validated)
  • images: OG image, featured image, and in-article images with alt text
  • language: detected content language
  • extractionSuccess: boolean flag for downstream filtering
  • contentQuality: score (0-100), level (low/medium/high), and warnings
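
As a rough illustration of the downstream filtering that extractionSuccess and contentQuality are meant for, the sketch below keeps only rows that extracted cleanly and score above a threshold. The helper name selectUsableArticles, the minScore value of 60, and the sample rows are assumptions made for this example, not part of the actor.

```javascript
// Hypothetical helper (not part of the actor): keep only articles that
// extracted cleanly and meet a minimum quality score.
// `minScore` is an illustrative threshold chosen for this example.
function selectUsableArticles(items, minScore = 60) {
  return items.filter(
    (item) =>
      item.extractionSuccess === true &&
      item.contentQuality != null &&
      item.contentQuality.score >= minScore,
  );
}

// Made-up sample rows that only mimic the fields listed above.
const sampleItems = [
  { title: 'A', extractionSuccess: true, contentQuality: { score: 85, warnings: [] } },
  { title: 'B', extractionSuccess: false, contentQuality: { score: 20, warnings: [] } },
  { title: 'C', extractionSuccess: true, contentQuality: { score: 45, warnings: [] } },
];

console.log(selectUsableArticles(sampleItems).map((item) => item.title));
```

Lowering minScore trades cleaner text for more coverage; the right cutoff depends on the pipeline consuming the data.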

Set fetchArticleDetails: false to skip crawling and get RSS metadata only (title, source, date, link) at minimal cost.

Quick Start

Using Apify Console

  1. Visit Apify Console
  2. Search for "Google News Scraper"
  3. Configure your search parameters
  4. Run the actor

Using Apify CLI

npm install -g apify-cli
# Single query
apify call google-news-scraper --input '{
  "query": "Tesla",
  "maxItemsPerUrl": 10
}'
# Multiple queries (string shorthand)
apify call google-news-scraper --input '{
  "queries": ["tesla", "apple"],
  "maxItemsPerUrl": 10
}'
# Multiple queries with passthrough fields
apify call google-news-scraper --input '{
  "queries": [
    { "query": "Kim Kardashian", "profileUrl": "https://news.google.com/search?q=kim+kardashian" },
    { "query": "MrBeast" }
  ],
  "maxItemsPerUrl": 10,
  "maxItems": 15
}'

Using Apify API

// Top-level await requires an ES module (.mjs file or "type": "module").
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('google-news-scraper').call({
  queries: [
    { query: 'Taylor Swift', profileUrl: 'https://news.google.com/search?q=taylor+swift' },
    { query: 'Elon Musk', profileUrl: 'https://news.google.com/search?q=elon+musk' },
  ],
  maxItemsPerUrl: 10,
  maxItems: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
// items is a flat array of articles, each with query + passthrough fields merged in
console.log(items);

Input Modes

Most Common: Single Query

{
  "query": "artificial intelligence",
  "maxItemsPerUrl": 10
}

That's it: one query, up to 10 articles.

Bulk: Multiple Queries

Pass an array of strings to scrape several topics in one run:

{
  "queries": ["tesla", "apple", "nvidia"],
  "maxItemsPerUrl": 10
}

Advanced: Queries with Passthrough Fields

Each query can be an object. Any field besides query is passed through to every output article for that query, which is useful for linking results back to your own IDs, profile URLs, or tags:

{
  "queries": [
    { "query": "Kim Kardashian", "profileUrl": "https://news.google.com/search?q=kim+kardashian" },
    { "query": "MrBeast", "customField": "my-tag" },
    "Taylor Swift"
  ],
  "maxItemsPerUrl": 10,
  "maxItems": 25
}

Precedence: queries > query. If both are provided, queries wins.
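
One way to picture these rules is a small normalizer that turns either input mode into a uniform list of objects. This is a sketch based only on the behavior described above; normalizeQueries is a hypothetical name, and treating an empty queries array as "not provided" is an assumption rather than documented behavior.

```javascript
// Hypothetical normalizer (not the actor's source): turns `query` / `queries`
// input into a uniform list of { query, ...passthrough } objects.
function normalizeQueries(input) {
  // Assumption: an empty `queries` array counts as "not provided".
  const hasQueries = Array.isArray(input.queries) && input.queries.length > 0;
  // Documented precedence: `queries` wins over `query` when both are present.
  const raw = hasQueries ? input.queries : input.query ? [input.query] : [];
  // Strings become { query }; objects keep their passthrough fields.
  return raw.map((entry) => (typeof entry === 'string' ? { query: entry } : { ...entry }));
}

const entries = normalizeQueries({
  query: 'ignored because queries is present',
  queries: [{ query: 'MrBeast', customField: 'my-tag' }, 'Taylor Swift'],
});
console.log(entries);
```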

Configuration

Input Parameters

Parameter | Type | Required | Default | Description
query | string | No* | - | Simple search query string
queries | array | No* | - | Array of strings or objects with query and optional passthrough fields
maxItemsPerUrl | integer | No | 50 | Max articles per individual query
maxItems | integer | No | 0 | Optional global cap on total articles (0 = unlimited)
fetchArticleDetails | boolean | No | true | If false, skip article crawling and return RSS metadata only
region | string | No | "US" | Country code (US, GB, CA, AU, DE, ES, MX, IT)
language | string | No | "en-US" | Language code (en-US, en-GB, en-CA, en-AU, de-DE, es-ES, es-MX, it-IT)
dateFrom | string | No | - | Start date (YYYY-MM-DD)
dateTo | string | No | - | End date (YYYY-MM-DD)
disableBrowserFallback | boolean | No | false | Skip Playwright fallback; cheaper but may return fewer articles
proxyConfiguration | object | No | Apify Proxy enabled | Proxy settings; defaults to Apify Proxy

*At least one of query or queries is required.
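
Before starting a paid run, it can help to sanity-check the input locally. The sketch below encodes only what the parameter table states, plus two labeled assumptions: that the numeric caps must be non-negative integers and that dateFrom should not be later than dateTo. validateInput is a hypothetical helper, and the actor may validate differently.

```javascript
const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical pre-flight check based on the parameter table above.
// Returns a list of problems; an empty list means the input looks plausible.
function validateInput(input) {
  const problems = [];
  const hasQuery = typeof input.query === 'string' && input.query.trim() !== '';
  const hasQueries = Array.isArray(input.queries) && input.queries.length > 0;
  if (!hasQuery && !hasQueries) {
    problems.push('Provide at least one of "query" or "queries".');
  }
  // Assumption: both caps must be non-negative integers.
  for (const key of ['maxItemsPerUrl', 'maxItems']) {
    const value = input[key];
    if (value !== undefined && (!Number.isInteger(value) || value < 0)) {
      problems.push(`"${key}" must be a non-negative integer.`);
    }
  }
  for (const key of ['dateFrom', 'dateTo']) {
    const value = input[key];
    if (value !== undefined && !DATE_PATTERN.test(value)) {
      problems.push(`"${key}" must use the YYYY-MM-DD format.`);
    }
  }
  // Assumption: the range must be ordered. YYYY-MM-DD strings sort chronologically.
  if (
    DATE_PATTERN.test(input.dateFrom) &&
    DATE_PATTERN.test(input.dateTo) &&
    input.dateFrom > input.dateTo
  ) {
    problems.push('"dateFrom" must not be later than "dateTo".');
  }
  return problems;
}

console.log(validateInput({ queries: ['tesla'], dateFrom: '2025/08/01' }));
```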

How Extraction Works

The pipeline resolves every Google News redirect to the real publisher URL, fetches the page, and then runs six extraction strategies in order, stopping at the first one that produces 300+ characters of text with images:

  1. HTTP fetch: fast, cheap, works for most publishers
  2. Playwright browser: automatic fallback for JS-rendered or consent-gated pages
  3. Content extraction: Readability, Extractus, JSON-LD, custom selectors, meta tags, and heuristics, tried in that order

Every article is quality-scored (text length, image presence, error-page detection). Low-quality results are filtered before they reach your dataset.
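
The "try in order, stop at the first good result" idea can be sketched as a simple loop. This is an illustration under stated assumptions, not the actor's implementation: extractWithFallbacks and the stand-in strategy functions are hypothetical, and for brevity the sketch checks only the 300-character text threshold and skips the image check.

```javascript
const MIN_TEXT_LENGTH = 300; // threshold quoted in the description above

// Hypothetical sketch: run each extraction strategy in order and return the
// first result whose text passes the length check.
function extractWithFallbacks(html, strategies) {
  for (const { name, extract } of strategies) {
    let result = null;
    try {
      result = extract(html);
    } catch {
      // A failing strategy should not abort the chain; try the next one.
      continue;
    }
    if (result && typeof result.text === 'string' && result.text.length >= MIN_TEXT_LENGTH) {
      return { ...result, strategy: name, extractionSuccess: true };
    }
  }
  return { text: '', strategy: null, extractionSuccess: false };
}

// Stand-in strategies for demonstration only; real ones would parse the HTML.
const strategies = [
  { name: 'readability', extract: () => ({ text: 'too short' }) },
  { name: 'json-ld', extract: () => { throw new Error('no JSON-LD block'); } },
  { name: 'meta-tags', extract: (html) => ({ text: html.repeat(40) }) },
];

const outcome = extractWithFallbacks('<p>lorem ipsum</p>', strategies);
console.log(outcome.strategy, outcome.text.length);
```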

Estimated Cost

All costs depend on article count, target sites, and proxy tier. The numbers below are rough guidelines based on typical runs using Apify Proxy (datacenter tier).

Scenario | Articles | Typical Cost
RSS metadata only (fetchArticleDetails: false) | 100 | ~$0.01 - $0.02
Full text, HTTP-first (most sites) | 100 | ~$0.05 - $0.10
Full text, mixed HTTP + Playwright fallback | 100 | ~$0.10 - $0.25
Heavy JS sites (frequent Playwright) | 100 | ~$0.20 - $0.50

Cost levers you control:

  • fetchArticleDetails: false skips article crawling entirely for near-zero cost
  • disableBrowserFallback: true stays HTTP-only, ~2-5x cheaper, with fewer articles from JS-heavy sites
  • maxItemsPerUrl / maxItems set hard caps on article count
  • Proxy tier: datacenter is the default and cheapest; the scraper escalates to residential only on repeated 429/403 errors
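
For back-of-the-envelope budgeting, the table above can be scaled to a planned article count. The sketch assumes usage cost grows linearly with the number of articles, which the listing does not promise, and it leaves out the $20.00/month subscription. estimateUsageCost and the scenario keys are hypothetical names.

```javascript
// Cost ranges in USD per 100 articles, copied from the table above.
const COST_PER_100 = {
  rssOnly: [0.01, 0.02],
  httpFirst: [0.05, 0.10],
  mixed: [0.10, 0.25],
  heavyJs: [0.20, 0.50],
};

// Hypothetical estimator. Assumption: cost scales linearly with article count.
// The monthly subscription fee is not included.
function estimateUsageCost(scenario, articleCount) {
  const range = COST_PER_100[scenario];
  if (!range) throw new Error(`Unknown scenario: ${scenario}`);
  const factor = articleCount / 100;
  // Round to four decimals to hide floating-point noise.
  const scale = (value) => Math.round(value * factor * 10000) / 10000;
  return { low: scale(range[0]), high: scale(range[1]) };
}

console.log(estimateUsageCost('mixed', 5000));
```

Treat the result as a range to plan around rather than a quote; target sites and proxy tier can move real costs in either direction.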

Limitations

Be aware of these before you buy:

  • Paywalled sites: articles behind hard paywalls (WSJ, FT, NYT subscriber-only) will return partial text or fail. The scraper extracts whatever is publicly visible.
  • Heavy bot protection: sites with aggressive Cloudflare challenges or CAPTCHAs may need multiple retries and residential proxies, increasing cost.
  • Region/language variance: Google News returns different articles depending on region and language. The same query may yield different results from US vs DE.
  • RSS feed limits: Google News RSS feeds return a limited window of articles (roughly 24-72 hours). For historical coverage, use dateFrom/dateTo date slicing, which the scraper handles automatically.
  • Image availability: some publishers strip images or serve them via CDN policies that block external access. Articles without valid images receive a lower quality score.
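
Date slicing happens inside the scraper, so callers only need to set dateFrom and dateTo. For intuition about why it helps with the short RSS window, slicing can be pictured as splitting a long range into small consecutive windows. The sketch below is purely illustrative: sliceDateRange and the 3-day window are assumptions, and the scraper's real slicing logic may differ.

```javascript
// Hypothetical helper: split an inclusive YYYY-MM-DD range into consecutive
// windows of `daysPerSlice` days. Dates are handled in UTC to avoid DST drift.
function sliceDateRange(dateFrom, dateTo, daysPerSlice = 3) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const toIsoDate = (ms) => new Date(ms).toISOString().slice(0, 10);
  const end = Date.parse(`${dateTo}T00:00:00Z`);
  const slices = [];
  for (
    let start = Date.parse(`${dateFrom}T00:00:00Z`);
    start <= end;
    start += daysPerSlice * DAY_MS
  ) {
    // The last window is clipped so it never runs past dateTo.
    const sliceEnd = Math.min(start + (daysPerSlice - 1) * DAY_MS, end);
    slices.push({ dateFrom: toIsoDate(start), dateTo: toIsoDate(sliceEnd) });
  }
  return slices;
}

console.log(sliceDateRange('2025-08-01', '2025-08-07'));
```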

Output Format

Output is a flat array of articles. Each article is a separate dataset entry with the query string and any passthrough fields merged at the top level:

[
  {
    "query": "Taylor Swift",
    "profileUrl": "https://news.google.com/search?q=taylor+swift",
    "title": "Taylor Swift Announces New Album - Billboard",
    "url": "https://www.billboard.com/2025/08/05/taylor-swift-new-album.html",
    "source": "Billboard",
    "publishedAt": "2025-08-05T14:08:57.000Z",
    "author": "Jane Smith",
    "text": "Full article content...",
    "description": "Brief summary of the article...",
    "images": [
      {
        "url": "https://example.com/image.jpg",
        "type": "featured-og",
        "alt": "Image description"
      }
    ],
    "tags": ["Taylor Swift"],
    "language": "en",
    "extractionSuccess": true,
    "contentQuality": {
      "score": 85,
      "level": "high",
      "isValid": true,
      "warnings": []
    }
  },
  {
    "query": "MrBeast",
    "customField": "test-passthrough",
    "title": "MrBeast Breaks YouTube Record",
    "url": "https://www.example.com/mrbeast-record.html",
    "source": "Example News",
    "publishedAt": "2025-08-05T10:00:00.000Z",
    "text": "Full article content...",
    "..."
  }
]
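
Because the output is flat, regrouping rows per query is left to the consumer. A minimal sketch, assuming every row carries the query field as shown above; groupByQuery and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: regroup flat dataset rows by their `query` field.
function groupByQuery(items) {
  const groups = new Map();
  for (const item of items) {
    if (!groups.has(item.query)) groups.set(item.query, []);
    groups.get(item.query).push(item);
  }
  return groups;
}

// Made-up rows shaped like the output above.
const rows = [
  { query: 'Taylor Swift', title: 'First', profileUrl: 'https://news.google.com/search?q=taylor+swift' },
  { query: 'MrBeast', title: 'Second', customField: 'test-passthrough' },
  { query: 'Taylor Swift', title: 'Third', profileUrl: 'https://news.google.com/search?q=taylor+swift' },
];

for (const [query, articles] of groupByQuery(rows)) {
  console.log(query, articles.length);
}
```

A Map preserves insertion order, so groups come out in the order each query first appears in the dataset.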

Development

Setup

git clone https://github.com/YevheniiM/google-news-scrapper
cd google-news-scrapper
npm install

Running

# Production
npm start
# Development mode (DEBUG=true, NODE_ENV=development)
npm run dev
# Development with file watching
npm run dev:watch

Create an INPUT.json at the project root for local input:

{
  "queries": [{ "query": "Taylor Swift" }, { "query": "Elon Musk" }],
  "maxItemsPerUrl": 5
}

Testing

# Run all tests
npm test
# Watch mode
npm run test:watch
# With coverage
npm run test:coverage

Formatting

npm run format
npm run format:check

License

MIT -- see LICENSE for details.
