Google News Scraper - Search, Topics & Headlines
Pricing
from $2.00 / 1,000 results
Scrape Google News without an API key: search queries, topic/geo headlines, and top stories. Optionally resolve real publisher URLs and extract article text. Export JSON/CSV/Excel.
Google News Scraper - Search, Topics, Headlines & Article Text (No API Key)
The Google News scraper collects news headlines from Google News without an API key or login: run search queries, pull topic and geo headline feeds, and grab the front-page top stories in any language/country edition. Optionally resolve the real publisher URL behind each Google News redirect and extract the full article text, author, and publish date. Use it as a Google News API alternative and export to JSON, CSV, or Excel.
Built for media monitoring, brand and PR tracking, SEO and content research, and news aggregation: a fast Google News data extractor that turns public RSS feeds into clean, structured records.
Pairs with the Hacker News Scraper for full news + tech-community monitoring in one pipeline.
Why this scraper
Google News exposes clean RSS feeds, but two things make them awkward to use directly: every link is a news.google.com redirect (not the publisher), and the feeds are point-in-time snapshots with no pagination. This Actor parses the feeds into structured records, optionally resolves the redirect to the real publisher URL via Google's current 2-step batchexecute flow, and lets you broaden coverage by splitting queries with Google News operators (when:, site:, date ranges). It only claims what it does: resolution and article-text extraction are best-effort and never fail the run.
What it does
- Search feeds: each query becomes a `/rss/search?q=` feed. Supports Google News operators: `when:7d`, `site:reuters.com`, `intitle:ai`, `before:`/`after:` dates, boolean `OR`, exact `"quotes"`, and `-` exclusions.
- Topic feeds: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH.
- Geo feeds: place-based headlines, e.g. `London`, `San Francisco`.
- Top stories: the front-page feed for the chosen edition.
- Any edition: `language` + `country` drive `hl`, `gl`, and the matching `ceid`.
- Real publisher URLs (optional): resolve each `news.google.com` redirect to the actual article URL (best-effort).
- Article text (optional): fetch the resolved page and extract main body text, author, and published date (best-effort; publishers vary).
Example input
{"queries":["artificial intelligence","tesla when:7d","site:reuters.com climate"],"topics":["TECHNOLOGY","BUSINESS"],"geoLocations":["London"],"includeTopStories":false,"language":"en-US","country":"US","maxItemsPerQuery":100,"maxItems":1000,"resolveArticleUrls":false,"scrapeArticleText":false,"proxyConfiguration":{"useApifyProxy":true}}
Example output (article)
{"type":"article","title":"OpenAI announces new model","link":"https://news.google.com/rss/articles/CBMi…?oc=5","googleNewsUrl":"https://news.google.com/rss/articles/CBMi…?oc=5","guid":"CBMi…","articleId":"CBMi…","source":"Reuters","sourceUrl":"https://www.reuters.com","publishedAt":"2026-06-17T22:55:21.000Z","pubDate":"Wed, 17 Jun 2026 22:55:21 GMT","snippet":"OpenAI announced … Reuters","query":"artificial intelligence","feedType":"search","language":"en-US","country":"US","scrapedAt":"2026-06-20T10:00:00.000Z"}
Example output (with resolveArticleUrls + scrapeArticleText)
When those options are enabled, link becomes the real publisher URL (when resolution succeeds) and extra fields are added:
{"type":"article","title":"OpenAI announces new model","link":"https://www.reuters.com/technology/openai-…","googleNewsUrl":"https://news.google.com/rss/articles/CBMi…?oc=5","source":"Reuters","publishedAt":"2026-06-17T22:55:21.000Z","articleResolved":true,"articleText":"OpenAI on Wednesday announced …\n\n…","articleAuthor":"Jane Doe","articlePublishedAt":"2026-06-17T22:50:00.000Z","feedType":"search","language":"en-US","country":"US"}
Output fields
Every item is pushed with type: "article". The fields below are exactly what the code emits: nothing more, nothing less.
| Field | Type | Description |
|---|---|---|
type | string | Always "article" (the only output item type). |
title | string | Headline text. |
link | string | Real publisher URL when resolveArticleUrls succeeds, otherwise the Google News redirect. |
googleNewsUrl | string | Always the original news.google.com redirect. |
guid | string | The article GUID (CBMi id, isPermaLink="false"). |
articleId | string | The CBMi article id extracted from the guid/link. |
source | string | Publisher display name (from <source>). |
sourceUrl | string | Publisher homepage (from the <source url="…"> attribute). |
publishedAt | string | Publish time as ISO 8601 (parsed from RFC-822 <pubDate>); null if unparseable. |
pubDate | string | Raw <pubDate> string. |
snippet | string | Plain-text snippet stripped from the <description> HTML. |
query | string | Originating query / topic / place (null for top stories). |
feedType | string | search, topic, geo, or top. |
language | string | Language used for the feed. |
country | string | Country used for the feed. |
scrapedAt | string | ISO 8601 timestamp of when the item was collected. |
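The mapping from an RSS `<item>` to the fields above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the Actor's actual code: the function names (`to_iso`, `strip_html`, `parse_items`) are hypothetical, the tag-stripping regex is a simplification, and falling back to the last path segment of the link for `articleId` is an assumption.

```python
import html
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_iso(pub_date):
    """Convert an RFC-822 date string to ISO 8601 in UTC, or None if unparseable."""
    if not pub_date:
        return None
    try:
        dt = parsedate_to_datetime(pub_date)
    except (TypeError, ValueError):
        return None
    if dt.tzinfo is None:  # assumption: treat zone-less dates as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def strip_html(fragment):
    """Drop tags, decode entities, and collapse whitespace (simplified)."""
    text = re.sub(r"<[^>]+>", " ", fragment or "")
    return " ".join(html.unescape(text).split())


def parse_items(rss_xml, feed_type, query, language, country):
    """Turn every <item> in an RSS document into a flat record."""
    scraped_at = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    scraped_at = scraped_at.replace("+00:00", "Z")
    records = []
    for item in ET.fromstring(rss_xml).iter("item"):
        link = item.findtext("link") or ""
        guid = item.findtext("guid") or ""
        source = item.find("source")
        pub_date = item.findtext("pubDate")
        records.append({
            "type": "article",
            "title": item.findtext("title"),
            "link": link,
            "googleNewsUrl": link,
            "guid": guid,
            # Assumption: use the guid, else the last path segment of the link.
            "articleId": guid or link.rsplit("/", 1)[-1].split("?")[0],
            "source": source.text if source is not None else None,
            "sourceUrl": source.get("url") if source is not None else None,
            "publishedAt": to_iso(pub_date),
            "pubDate": pub_date,
            "snippet": strip_html(item.findtext("description")),
            "query": query,
            "feedType": feed_type,
            "language": language,
            "country": country,
            "scrapedAt": scraped_at,
        })
    return records
```

Because `to_iso` returns `None` on any parse failure, a malformed `<pubDate>` leaves `publishedAt` null while the raw string stays available in `pubDate`.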
Extra fields when resolveArticleUrls is enabled
| Field | Type | Description |
|---|---|---|
articleResolved | boolean | Whether the redirect was resolved to a real publisher URL. false when resolution failed (the item then keeps the Google News redirect in link). |
Extra fields when scrapeArticleText is enabled (implies resolveArticleUrls)
| Field | Type | Description |
|---|---|---|
articleText | string | Extracted main body text (null if extraction failed, paywalled, or under the ~80-char threshold). |
articleAuthor | string | Author from the publisher page meta tags / byline (best-effort; null if not found). |
articlePublishedAt | string | Published date from the publisher page meta (ISO 8601 when parseable, else the raw meta string; null if not found). |
Input reference
| Field | Type | Description |
|---|---|---|
queries | array | Free-text search queries; each becomes a /rss/search?q= feed. Supports Google News operators. |
topics | array (enum) | Topic feeds: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH. |
geoLocations | array | Place names for geo headline feeds, e.g. London. |
includeTopStories | boolean | Also fetch the front-page Top stories feed. Default false. |
language | string | Interface/content language (hl), e.g. en-US. Default en-US. |
country | string | Edition country (gl), e.g. US. Default US. |
maxItemsPerQuery | integer | Cap items taken from each feed (0 = no cap). Default 100. |
maxItems | integer | Global cap on total articles (0 = no limit). Default 1000. |
resolveArticleUrls | boolean | Resolve each redirect to the real publisher URL. Default false. |
scrapeArticleText | boolean | Fetch the resolved page and extract text/author/date (requires resolve). Default false. |
proxyConfiguration | object | Apify proxy settings. Default useApifyProxy: true. |
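As a rough illustration of how the documented defaults combine with a partial input, the sketch below merges user-supplied values over the defaults listed in the table. The helper name `normalize_input` is hypothetical, empty lists for `queries`, `topics`, and `geoLocations` are an assumption, and switching `resolveArticleUrls` on whenever `scrapeArticleText` is set is one reading of "requires resolve".

```python
import copy

# Defaults as listed in the input reference; the empty lists are an assumption.
DEFAULTS = {
    "queries": [],
    "topics": [],
    "geoLocations": [],
    "includeTopStories": False,
    "language": "en-US",
    "country": "US",
    "maxItemsPerQuery": 100,
    "maxItems": 1000,
    "resolveArticleUrls": False,
    "scrapeArticleText": False,
    "proxyConfiguration": {"useApifyProxy": True},
}


def normalize_input(user_input):
    """Overlay user input on the defaults without mutating either dict."""
    merged = copy.deepcopy(DEFAULTS)
    merged.update(copy.deepcopy(user_input))
    # Article-text extraction needs the resolved publisher URL.
    if merged["scrapeArticleText"]:
        merged["resolveArticleUrls"] = True
    return merged
```

For example, `normalize_input({"queries": ["tesla when:7d"], "scrapeArticleText": True})` yields a full input object with `resolveArticleUrls` set to `True` and every other field at its default.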
Common use cases
- Media & brand monitoring: track Google News mentions of a company, product, or person with `"brand"` queries and `when:` time windows; export the dataset to CSV/Excel for reporting.
- PR & competitor tracking: follow `site:` and topic feeds to see who is covering what across publishers and countries.
- SEO & content research / news aggregation: pull topic/geo headlines for a daily digest, newsletter, or content calendar without a Google News API key.
- Dataset building & NLP: resolve real publisher URLs and extract full article text, author, and publish date for downstream sentiment analysis, summarization, or model training.
- Market & finance signals: monitor ticker, sector, or policy queries in near real time and feed the JSON output into dashboards or alerts.
Notes & limits
- No pagination. Google News feeds are point-in-time snapshots: search feeds return up to ~100 items and topic/geo feeds up to ~30. `maxItemsPerQuery` only trims a feed; to get more coverage, split a query with `when:`/`site:`/`before:`/`after:` operators rather than expecting more items per query.
- Redirect resolution is best-effort. Current article ids are the long `CBMi…` form whose publisher URL is not decodable offline; resolution uses Google's 2-step `batchexecute` flow, which Google changes periodically. On failure the item keeps the Google News redirect and `articleResolved` is `false`.
- Article-text extraction is best-effort. Publisher pages vary; some paywall, return 403, or show consent walls, so `articleText` can be empty (null) for some items. A single publisher failure never fails the run.
- Resolution/text add cost & blocking risk. They issue 1–2 extra requests per item against Google's heavier app surface and arbitrary publisher sites, which throttle sooner than the RSS feeds. The Actor caps concurrency and rotates sessions; use residential proxies for large runs.
- ceid must match the locale. It is built automatically as `{COUNTRY}:{language-base}` (e.g. `en-US` + `US` → `US:en`); a mismatch returns empty or wrong-locale results.
- Exotic locales. Friendly topic names (TECHNOLOGY, …) are verified for `en-US` and major editions; some unusual locales use opaque base64 topic ids, and you can pass such an id directly in `topics`.
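To make the `ceid` rule concrete, here is a small sketch that derives it from `language` and `country` and assembles a search feed URL. The function names are hypothetical, and lowercasing the language base and uppercasing the country are assumptions about normalization; the host, the `/rss/search?q=` path, and the `hl`/`gl`/`ceid` parameters come from the description above.

```python
from urllib.parse import urlencode


def build_ceid(language, country):
    """Build {COUNTRY}:{language-base}, e.g. en-US + US gives US:en."""
    language_base = language.split("-")[0].lower()
    return f"{country.upper()}:{language_base}"


def build_search_url(query, language="en-US", country="US"):
    """Assemble a Google News search feed URL for one query."""
    params = {
        "q": query,
        "hl": language,
        "gl": country,
        "ceid": build_ceid(language, country),
    }
    return "https://news.google.com/rss/search?" + urlencode(params)
```

Deriving `ceid` from the same two inputs that set `hl` and `gl` keeps the three parameters from drifting apart, which is the mismatch the note warns about.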
FAQ
Do I need a Google News API key or login? No. The Actor reads public Google News RSS feeds; there is no API key, OAuth, or login involved. It is a key-free Google News API alternative.
Why is link a news.google.com URL? That is the feed's redirect. Enable resolveArticleUrls to turn it into the real publisher URL (best-effort). The original redirect is always preserved in googleNewsUrl.
Why did I only get ~100 results for my query? Google News search feeds cap at roughly 100 items and topic/geo feeds at ~30, with no pagination; they are point-in-time snapshots. Split the query with operators like when:7d, before:/after: date ranges, or site: to cover more, or schedule the run to collect over time.
Can I scrape Google News for any country or language? Yes. Set language (e.g. en-US, en-GB, es-419, fr) and country (e.g. US, GB, IN, DE). They drive hl, gl, and the ceid parameter, which is built automatically as {COUNTRY}:{language-base} (so en-US + US → ceid=US:en). A mismatch returns empty or wrong-locale results.
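One way to split a query by date range is to generate one query per time window, as in the sketch below. The helper name `split_by_date` is hypothetical, and both the `YYYY-MM-DD` date format for `before:`/`after:` and the exact inclusivity of the boundaries are assumptions; adjacent windows share a boundary date, so deduplicating on `articleId` afterwards is advisable.

```python
from datetime import date, timedelta


def split_by_date(query, start, end, window_days=7):
    """Return one query string per date window between start and end."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    queries = []
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=window_days), end)
        queries.append(
            f"{query} after:{cursor.isoformat()} before:{upper.isoformat()}"
        )
        cursor = upper
    return queries


# Two weekly windows covering the first half of June 2026.
weekly = split_by_date("tesla", date(2026, 6, 1), date(2026, 6, 15))
```

Each resulting string can be passed as a separate entry in `queries`, so every window gets its own feed snapshot.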
How do I control how many articles I get? Use maxItemsPerQuery to cap items taken from each feed (trims the snapshot; 0 = no cap) and maxItems for a global cap across all feeds (0 = no limit). Defaults are 100 per feed and 1000 total.
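The interaction of the two caps can be illustrated with a short sketch. It is not the Actor's implementation: the function name `apply_caps` is hypothetical, and processing feeds in the order given is an assumption. It only shows the documented semantics, where the per-feed cap trims each snapshot first and the global cap stops collection across feeds, with 0 disabling either cap.

```python
def apply_caps(feeds, max_items_per_query=100, max_items=1000):
    """Trim each feed, then stop once the global cap is reached (0 = no cap)."""
    collected = []
    for items in feeds:
        # Per-feed cap: keep only the first N items of this snapshot.
        trimmed = items[:max_items_per_query] if max_items_per_query > 0 else items
        for item in trimmed:
            if max_items > 0 and len(collected) >= max_items:
                return collected
            collected.append(item)
    return collected
```

With two feeds of three items each, a per-feed cap of 2 and a global cap of 3 keeps the first two items of the first feed and the first item of the second.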
Which proxy should I use? Datacenter (the default useApifyProxy: true) is fine for the RSS feeds. Switch to residential if you enable resolution/article-text at volume and start hitting blocks; those features add 1–2 heavier requests per item against Google's app surface and publisher sites.
Why is articleText empty for some items? The publisher paywalled, returned 403, showed a consent wall, or rendered the body with JavaScript. Extraction is best-effort, uses meta tags + common article containers, and intentionally never fails the run.
Can I export the results to JSON, CSV, or Excel? Yes. Every run's dataset can be exported to JSON, CSV, Excel (XLSX), HTML, RSS, or XML, or pulled via the Apify API/SDK and webhooks for downstream pipelines.
Does this work with topics and geographic (place) feeds, not just search? Yes. Provide any combination of queries (search), topics (WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH), geoLocations (place names), and includeTopStories (front-page feed). Each becomes its own RSS feed.
Can I monitor news continuously? Yes. Schedule the Actor (e.g. hourly/daily) to build a rolling dataset for media monitoring or brand tracking. Pair it with the Hacker News Scraper to cover both mainstream news and the tech community in one workflow.
