URL: https://apify.com/scrapapi/yahoo-scraper

🔎 Yahoo Scraper · Apify


Pricing: from $2.99 / 1,000 results

Rating: 0.0 (0)

Developer: ScrapAPI (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user

Last modified: a month ago

🔎 Yahoo Search Scraper

Scrape Yahoo Search results at scale: titles, URLs, snippets, favicons, in-article sub-links, and a clean Markdown excerpt for every result. Bulk queries, time-window filtering, and smart proxy auto-escalation (direct → datacenter → residential) keep your runs fast and unblocked.


⭐ Why Choose Us?

  • Bulk-first: paste dozens of queries (or full Yahoo URLs) and the actor walks every result page until the maxItems cap is reached.
  • Smart proxy ladder: starts direct and escalates only if Yahoo blocks, so you don't pay for residential traffic you didn't need.
  • Rich back-fill: when Yahoo's snippet is thin, the actor visits the result page and harvests in-article sub-links plus a Markdown summary.
  • Live results: rows stream to the dataset as they're scraped, so a mid-run interruption doesn't lose the rows already collected.
  • Production-grade error handling: 3-tier proxy retries, graceful PPE (pay-per-event) limit handling, exponential cool-downs.
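
The page does not say how the "exponential cool-downs" are scheduled. Purely as an illustration of the idea, a schedule of this kind might be sketched as follows; the function name, the 1-second base delay, and the 30-second cap are assumptions, not values taken from the actor.

```python
def cooldown_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Return one delay (in seconds) per retry, doubling each time up to `cap`.

    `base` and `cap` are illustrative assumptions, not the actor's settings.
    """
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


# Delays a caller might sleep between five consecutive blocked attempts.
print(cooldown_delays(5))
```

Doubling the wait after each failure backs off quickly from a rate limit, while the cap keeps a long run from stalling on one query.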

🔑 Key Features

  • 🌐 Bulk queries: plain keywords or full Yahoo SERP URLs, mixed freely.
  • 📅 Time-window filter: Anytime / Past day / Past week / Past month.
  • 🛡️ Auto-escalating proxy: direct → Apify Datacenter → Apify Residential (3 retries), then sticky.
  • 🧩 Optional second-pass back-fill of sub-links + Markdown excerpts.
  • 📋 Per-section dataset views: Overview, Snippet, Sub-links.
  • 🔄 Custom proxy URLs supported: they go first, then the smart ladder.

🧾 Input

{
  "queries": [
    "java developer",
    "https://search.yahoo.com/search?p=python+jobs"
  ],
  "maxItems": 10,
  "timePeriod": "Anytime",
  "backfillEmptyResults": true,
  "backfillConcurrency": 8,
  "backfillMaxLinks": 10,
  "proxyConfiguration": {"useApifyProxy": false}
}

| Field | Type | Description |
|---|---|---|
| queries | string[] | One or more search terms or Yahoo SERP URLs. |
| maxItems | integer | Hard cap on unique results per query (1–500). |
| timePeriod | string | Anytime / Past day / Past week / Past month. |
| backfillEmptyResults | boolean | Visit each result page to harvest sub-links + Markdown excerpt. |
| backfillConcurrency | integer | Parallelism for back-fill (1–32). |
| backfillMaxLinks | integer | Max in-article sub-links per result page (1–50). |
| proxyConfiguration | object | Apify proxy config. Defaults to direct (no proxy). |
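
The ranges and allowed values listed above can be checked on the client before a run is started. The following is a minimal sketch of such a check; `validate_input` is a hypothetical helper, not part of the actor, and it tests only the constraints stated in the field descriptions.

```python
# Allowed values and ranges as stated in the field descriptions above.
TIME_PERIODS = {"Anytime", "Past day", "Past week", "Past month"}
INT_RANGES = {
    "maxItems": (1, 500),
    "backfillConcurrency": (1, 32),
    "backfillMaxLinks": (1, 50),
}


def validate_input(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input looks valid."""
    errors = []
    queries = payload.get("queries")
    if (not isinstance(queries, list) or not queries
            or not all(isinstance(q, str) and q.strip() for q in queries)):
        errors.append("queries must be a non-empty list of non-empty strings")
    if "timePeriod" in payload and payload["timePeriod"] not in TIME_PERIODS:
        errors.append(f"timePeriod must be one of {sorted(TIME_PERIODS)}")
    for field, (low, high) in INT_RANGES.items():
        if field not in payload:
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{field} must be an integer in {low}..{high}")
    return errors


print(validate_input({"queries": ["java developer"], "maxItems": 10, "timePeriod": "Anytime"}))
```

Catching an out-of-range value locally avoids starting a run that the platform would reject.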

📤 Output

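Each result is stored as one dataset row with the fields shown below; the per-section dataset views (Overview, Snippet, Sub-links) present these same rows.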

{
  "query": "java developer",
  "title": "How to become a Java Developer? - GeeksforGeeks",
  "url": "https://www.geeksforgeeks.org/gfg-academy/how-to-become-a-java-developer/",
  "description": "A Java developer is a software engineer who builds...",
  "text": " * Core Java\n\nCore Fundamentals: Learn concepts and practice DSA...\n",
  "logo_url": "https://s.yimg.com/pv/.../32x32_7eae5aac8b7f7402.png",
  "links": [
    "https://www.geeksforgeeks.org/java/java",
    "https://www.geeksforgeeks.org/advance-java/spring"
  ],
  "domain": "www.geeksforgeeks.org"
}

| Field | Description |
|---|---|
| query | The query (or URL) the row was scraped under. |
| title | The result's headline. |
| url | The clean target URL (Yahoo's tracker is stripped). |
| description | Yahoo's SERP snippet, rendered as Markdown. |
| text | Markdown excerpt: either Yahoo's list block or, after back-fill, an in-article summary. |
| logo_url | The result's favicon. |
| links | Up to N harvested in-article sub-links (after back-fill), where N is backfillMaxLinks. |
| domain | The host portion of url. |
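
Because `domain` is described as the host portion of `url`, downstream code can recompute it or group rows by it. The sketch below uses only the standard library; `host_of` and `group_by_domain` are hypothetical helper names, and the rows are assumed to be dicts shaped like the sample above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host portion of a URL, or an empty string if there is none."""
    return urlsplit(url).hostname or ""


def group_by_domain(rows: list[dict]) -> dict[str, list[str]]:
    """Map each domain to the result URLs found under it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # Prefer the actor's own `domain` field; fall back to parsing `url`.
        groups[row.get("domain") or host_of(row["url"])].append(row["url"])
    return dict(groups)


sample = [{
    "url": "https://www.geeksforgeeks.org/gfg-academy/how-to-become-a-java-developer/",
    "domain": "www.geeksforgeeks.org",
}]
print(group_by_domain(sample))
```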

🚀 How to Use (Apify Console)

  1. Open Apify Console → Actors.
  2. Find this actor and open it.
  3. Paste your queries (one per line) into 🌐 Search Queries / URLs.
  4. Pick a Maximum results cap and a 📅 Time window.
  5. (Optional) Leave the proxy on direct; the actor auto-escalates only when needed.
  6. Click Start.
  7. Watch the live logs; rows appear in the Output tab as they're scraped.
  8. Export the results as JSON / CSV / XLSX.
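
The Console handles the export in step 8 itself. If you download JSON and want your own CSV layout, one possible approach is sketched below. The column order and the choice to join `links` with spaces are assumptions for illustration, not the format the Console produces.

```python
import csv
import io

# Column order is an illustrative choice based on the documented output fields.
FIELDS = ["query", "title", "url", "description", "text", "logo_url", "links", "domain"]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize result rows to CSV text, flattening the `links` list into one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        flat["links"] = " ".join(row.get("links") or [])
        writer.writerow(flat)  # missing fields are written as empty cells
    return buffer.getvalue()


print(rows_to_csv([{"query": "java developer", "links": ["https://www.geeksforgeeks.org/java/java"]}]))
```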

🤖 Use via API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["java developer"],
    "maxItems": 10,
    "timePeriod": "Anytime"
  }'
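
The same call can be made from Python with the standard library alone. This is a sketch that mirrors the curl command above: `build_run_request` and `run_actor` are hypothetical helper names, the 300-second timeout is an assumption, and the actor ID and token must be supplied by the caller.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2/acts"


def build_run_request(actor_id: str, token: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    url = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, payload: dict, timeout: float = 300.0) -> list:
    """Send the request and return the parsed JSON response (the dataset items)."""
    request = build_run_request(actor_id, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a real actor ID and token, so it is left commented out):
# rows = run_actor("<ACTOR_ID>", "<APIFY_TOKEN>", {"queries": ["java developer"], "maxItems": 10})
```

Keeping request construction separate from sending makes the URL and body easy to inspect without touching the network.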

💼 Best Use Cases

  • SEO & SERP monitoring on Yahoo.
  • Competitive intelligence: track who appears for a query over time.
  • Lead generation: feed result URLs into your own enrichment pipeline.
  • Content discovery: harvest in-article sub-links for further crawling.

💳 Pricing

This actor uses Apify's pay-per-event model. The primary event is result-item: one charge per result row pushed to the dataset. You pay only for the rows you actually receive; back-fill, retries, and failed attempts are not billed.

You also pay for the underlying Apify platform usage (compute units, plus proxy traffic when used). Direct (no-proxy) requests incur no proxy traffic at all, which is why the actor stays on direct until Yahoo forces it to escalate.
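
As a rough worked example of the per-row charge, the sketch below estimates an upper bound on the result-item cost of a run. It assumes the listed "from $2.99 / 1,000 results" rate applies and deliberately leaves out platform usage; both function names are hypothetical.

```python
def max_billable_rows(query_count: int, max_items: int) -> int:
    """Upper bound on rows: maxItems is a hard cap per query."""
    return query_count * max_items


def estimate_result_cost(rows: int, price_per_1000: float = 2.99) -> float:
    """Estimated result-item charge in USD, excluding compute units and proxy traffic.

    The default rate is the listing's "from" price and may differ for your plan.
    """
    return rows * price_per_1000 / 1000


rows = max_billable_rows(query_count=2, max_items=10)
print(rows, round(estimate_result_cost(rows), 4))
```

Since only delivered rows are billed, the real charge is at or below this bound whenever a query returns fewer results than its cap.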


❓ Frequently Asked Questions

Does it work when Yahoo blocks me? Yes. The default no-proxy run is the fastest, but as soon as Yahoo returns a block (HTTP 429/503 or a captcha page), the actor auto-escalates to the Apify Datacenter pool and then to Residential, with up to 3 retries. Once a tier works, it is locked in for the rest of the run.

Can I bring my own proxies? Yes. Paste them into Custom proxy URLs in the proxy field. Your URLs are tried first (3 retries), then the datacenter → residential fallback ladder kicks in.
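
The escalation behaviour described in the two answers above can be pictured with a small sketch. It is not the actor's code: `ProxyLadder`, `is_blocked`, the captcha check by substring, and giving every tier the same retry budget are simplifying assumptions, and `fetch_fn` stands in for whatever HTTP client is used.

```python
from typing import Callable, Optional

BLOCK_STATUSES = {429, 503}
DEFAULT_TIERS = ["direct", "datacenter", "residential"]


def is_blocked(status: int, body: str) -> bool:
    """Treat HTTP 429/503 or a captcha page as a block (substring check is an assumption)."""
    return status in BLOCK_STATUSES or "captcha" in body.lower()


class ProxyLadder:
    """Try tiers in order and stay on the first tier that works (sticky)."""

    def __init__(self, tiers: list[str], retries: int = 3) -> None:
        self.tiers = list(tiers)
        self.retries = retries
        self.current = 0  # index of the tier in use; it only ever moves forward

    def fetch(self, url: str,
              fetch_fn: Callable[[str, str], tuple[int, str]]) -> Optional[tuple[str, str]]:
        """Return (tier, body) for the first unblocked response, or None if every tier fails."""
        while self.current < len(self.tiers):
            tier = self.tiers[self.current]
            for _ in range(self.retries):
                status, body = fetch_fn(url, tier)
                if not is_blocked(status, body):
                    return tier, body
            self.current += 1  # this tier is exhausted: escalate
        return None


# Custom proxy URLs go first, then the default ladder (the URL here is a placeholder).
ladder = ProxyLadder(["http://proxy.example:8000"] + DEFAULT_TIERS)
print(ladder.tiers)
```

Because `current` never moves backwards, later requests in the same run start on the tier that last succeeded instead of paying for the failed tiers again.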

Does it follow pagination? Yes. Yahoo returns ~7 results per page; the actor walks pages until your maxItems cap is hit or 3 consecutive pages return nothing.
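
The stopping rule in this answer can be sketched as a loop. This is an illustration only: `collect_results` and `fetch_page` are hypothetical names, and treating a page that contributes no new unique URL as "nothing" is an assumption made so the loop always terminates.

```python
from typing import Callable


def collect_results(fetch_page: Callable[[int], list[str]],
                    max_items: int, max_empty_pages: int = 3) -> list[str]:
    """Walk pages until `max_items` unique URLs are collected or
    `max_empty_pages` consecutive pages add nothing new."""
    results: list[str] = []
    seen: set[str] = set()
    empty_streak = 0
    page = 0
    while len(results) < max_items and empty_streak < max_empty_pages:
        added = 0
        for url in fetch_page(page):
            if url in seen or len(results) >= max_items:
                continue  # skip duplicates and anything past the cap
            seen.add(url)
            results.append(url)
            added += 1
        page += 1
        empty_streak = 0 if added else empty_streak + 1
    return results


def demo_page(page: int) -> list[str]:
    """Stand-in for a SERP fetch: three pages of seven results, then nothing."""
    return [f"https://example.com/{page}-{i}" for i in range(7)] if page < 3 else []


print(len(collect_results(demo_page, max_items=10)))
```

With about 7 results per page, a cap of maxItems needs roughly maxItems / 7 page requests per query, plus up to three more when the results run out before the cap.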

What about non-Latin queries? Yahoo handles UTF-8 queries natively, so paste them as-is.
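
Plain keywords need no preparation. If you prefer to pass full SERP URLs, as in the second sample query of the input example, the query text has to be percent-encoded. The sketch below follows the `search?p=` pattern from that sample; `yahoo_search_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def yahoo_search_url(query: str) -> str:
    """Build a Yahoo SERP URL; urlencode percent-encodes the query as UTF-8."""
    return "https://search.yahoo.com/search?" + urlencode({"p": query})


url = yahoo_search_url("разработчик java")
print(url)
# Decoding the URL gives back the original text unchanged.
print(parse_qs(urlsplit(url).query)["p"][0])
```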

Why is my back-filled text empty for some rows? Some sites block all bots (or render only with JavaScript). In that case the actor falls back to a minimal block built from Yahoo's own title + description, so the field is never blank.


📨 Support & Feedback

  • Issues / feature requests → please open a thread on the actor's detail page.
  • Custom solutions → dev.scraperengine@gmail.com.

You might also like

🔎 Yahoo Scraper

scraper-engine/yahoo-scraper

Scraper Engine

Yahoo Search Results Scraper

bhansalisoft/yahoo-search-results-scraper

Scrapes Yahoo search engine results pages (SERPs). Enter a keyword, select the country, and extract organic and paid results from Yahoo.com.

Yahoo Images Scraper

searchapi/yahoo-images-scraper

Scrapes image results from Yahoo Images Search (images.search.yahoo.com). Extracts image URL, thumbnail, source, title, dimensions, and more.

Yahoo Search Scraper

searchapi/yahoo-search-scraper

Scrapes organic web search results from Yahoo Search (search.yahoo.com). Extracts title, link, snippet, domain, displayed URL, date, and more.