AI Research Radar: compliant feed of new AI papers and news
Pricing
from $0.50 / 1,000 results
AI research feed of new ML papers and AI news from HuggingFace, Anthropic, Google, and The Decoder, delivered as structured, robots-compliant JSON.
AI Research Radar
New AI papers, lab announcements, and AI news from five permitted sources, delivered as one structured, schedule-ready feed.
Built for AI newsletter writers, research agents, and trend dashboards. Instead of hand-maintaining a scraper per site, you run one actor and get the latest items from HuggingFace papers and blog, the Anthropic and Google AI newsrooms, and The Decoder as uniform JSON records, ready to rank, summarize, alert on, or pipe into a RAG index.
What you get
| Field | Meaning |
|---|---|
| `title` | Paper, post, or article headline |
| `url` | Canonical link on the source site |
| `category` | `papers`, `blog`, `labs`, or `news`, set per source |
| `source` | Source domain, e.g. `huggingface.co` |
| `fetched_at` | UTC timestamp of the run (ISO 8601) |
| `extraction` | Extractor version tag (`selector_free_v1`) |
Quick start
```json
{
  "sources": [
    {"url": "https://huggingface.co/papers", "category": "papers"},
    {"url": "https://huggingface.co/blog", "category": "blog"},
    {"url": "https://www.anthropic.com/news", "category": "labs"}
  ],
  "maxItemsPerSource": 25
}
```
This returns up to 75 fresh items (25 per source), typically in under a minute. Omit sources entirely to use the full five-source default set, which adds the Google AI blog and The Decoder.
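If you generate the input programmatically, a small helper can assemble it and confirm the item ceiling. This is a minimal sketch: `build_input` and `item_ceiling` are hypothetical helpers, not part of the actor, and the validation only checks that each source has an http(s) `url` and a non-empty `category`. The ceiling is simply sources times `maxItemsPerSource`; a run may return fewer items.

```python
import json


def build_input(sources, max_items_per_source):
    """Assemble an actor input dict like the Quick start example (hypothetical helper)."""
    for src in sources:
        # Each source is a {url, category} pair; reject obviously malformed entries early.
        if not str(src.get("url", "")).startswith(("http://", "https://")):
            raise ValueError(f"source needs an http(s) url: {src!r}")
        if not src.get("category"):
            raise ValueError(f"source needs a category: {src!r}")
    return {"sources": list(sources), "maxItemsPerSource": max_items_per_source}


def item_ceiling(actor_input):
    """Upper bound on dataset items: number of sources times the per-source cap."""
    return len(actor_input["sources"]) * actor_input["maxItemsPerSource"]


run_input = build_input(
    [
        {"url": "https://huggingface.co/papers", "category": "papers"},
        {"url": "https://huggingface.co/blog", "category": "blog"},
        {"url": "https://www.anthropic.com/news", "category": "labs"},
    ],
    max_items_per_source=25,
)
print(json.dumps(run_input))
print(item_ceiling(run_input))  # 75
```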
Output example
```json
{
  "category": "papers",
  "title": "Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution",
  "url": "https://huggingface.co/papers/2606.10917",
  "source": "huggingface.co",
  "fetched_at": "2026-06-10T14:12:08.421337+00:00",
  "extraction": "selector_free_v1"
}
```
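Downstream code can check each record against the field table and turn `fetched_at` into a timezone-aware datetime. The sketch below uses only the standard library; `parse_record` is a hypothetical name, and it treats a missing field as an error, which is a choice you may want to relax.

```python
import json
from datetime import datetime

EXPECTED_FIELDS = {"title", "url", "category", "source", "fetched_at", "extraction"}


def parse_record(raw):
    """Parse one dataset item and turn fetched_at into a timezone-aware datetime."""
    record = json.loads(raw)
    missing = EXPECTED_FIELDS - set(record)
    if missing:
        raise ValueError(f"record missing fields: {sorted(missing)}")
    # fetched_at marks when the run happened, not when the item was published.
    record["fetched_at"] = datetime.fromisoformat(record["fetched_at"])
    return record


sample = (
    '{"category":"papers","title":"Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution",'
    '"url":"https://huggingface.co/papers/2606.10917","source":"huggingface.co",'
    '"fetched_at":"2026-06-10T14:12:08.421337+00:00","extraction":"selector_free_v1"}'
)
rec = parse_record(sample)
print(rec["source"], rec["fetched_at"].isoformat())
```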
Why this one
- Selector-free extraction. Titles are pulled by link-text shape and URL structure rather than page-specific CSS selectors, so the site redesigns that break conventional scrapers do not break this one.
- Layout drift is flagged, never hidden. A source that suddenly yields zero items is marked `zero_yield_check_layout` in the HEALTH report instead of quietly shrinking your feed.
- Papers, labs, and press in one schema. The five default sources cover research papers, official lab announcements, and AI journalism, each record tagged with its `category`.
- Bring your own sources. Pass any list of `{url, category}` pages; the same robots check, retry logic, and extraction apply to every source you add.
- Fresh by design. Each run is a live snapshot of the source pages: schedule it hourly or daily and the radar stays current.
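To make "link-text shape and URL structure" concrete, here is a hypothetical sketch of selector-free extraction, not the actor's actual `selector_free_v1` code. It collects every `<a>` element with the standard-library HTML parser, keeps same-domain links, and accepts text that meets the thresholds listed under Honest limits (at least 4 words and 24 characters). Collapsing repeated links within one page is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MIN_WORDS = 4    # thresholds stated under "Honest limits"
MIN_CHARS = 24


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def looks_like_headline(text):
    """Headline-shaped: at least MIN_WORDS words and MIN_CHARS characters."""
    return len(text.split()) >= MIN_WORDS and len(text) >= MIN_CHARS


def extract_items(page_url, html):
    """Keep same-domain links whose text is headline-shaped; no CSS selectors involved."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    items, seen = [], set()
    for href, raw_text in collector.links:
        url = urljoin(page_url, href)
        title = " ".join(raw_text.split())  # collapse whitespace and newlines
        if urlparse(url).netloc != host or not looks_like_headline(title):
            continue
        if url in seen:  # assumption: repeated links on one page are collapsed
            continue
        seen.add(url)
        items.append({"title": title, "url": url})
    return items
```

Because nothing here names a class or element path, a redesign that keeps headline-style links on the same domain still yields items; one that moves listings into JavaScript does not.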
Compliance and reliability
Topsail actors are built compliance-first, with built-in retries and per-source health reporting:
- robots.txt is always respected, and the check fails closed: if a robots check cannot complete, the source is skipped, never scraped. There is no input to turn this off.
- Sources are public listing and newsroom pages (HuggingFace papers and blog, Anthropic news, the Google AI blog, and The Decoder) that these publishers serve openly to every visitor, with no account, paywall, or personal data involved.
- Transient failures retry once with backoff; persistent failures are reported, not hidden.
- Every run writes a per-source HEALTH report to the key-value store, so you can see exactly which sources delivered and which were blocked, empty, or erroring.
- No PII, no paywalled or login-gated content, no circumvention.
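The fail-closed robots rule, the single retry, and the per-source status can be sketched with the standard library's `urllib.robotparser`. Everything here is illustrative: the user-agent string, the injected `fetch` callable, and the status labels other than `zero_yield_check_layout` are hypothetical, and the actor's real HEALTH format may differ.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleRadarBot"  # hypothetical user-agent string


def robots_allows(robots_txt, url, agent=USER_AGENT):
    """Fail-closed: None means the robots check could not complete, so do not scrape.

    (A robots.txt that returns HTTP 404 is conventionally treated as allow-all;
    pass "" for that case rather than None.)
    """
    if robots_txt is None:
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def fetch_with_retry(fetch, url, backoff_seconds=2.0):
    """Try once, wait, retry once; a second failure propagates to the caller."""
    try:
        return fetch(url)
    except Exception:
        time.sleep(backoff_seconds)
        return fetch(url)


def health_status(robots_ok, error, item_count):
    """Per-source status for a HEALTH-style report."""
    if not robots_ok:
        return "robots_blocked"  # hypothetical label
    if error is not None:
        return "error"  # hypothetical label
    if item_count == 0:
        return "zero_yield_check_layout"  # flag named in this README
    return "ok"  # hypothetical label
```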
Pricing
Pay per result: $0.50 per 1,000 dataset items, where one item is one paper, post, or article. Sources that come back robots-blocked, erroring, or empty add nothing to the dataset and cost nothing; you pay only for delivered records. A typical default run of around 100 items costs about $0.05.
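The arithmetic is easy to reproduce when budgeting a schedule. This sketch assumes you have per-source counts of delivered items (the mapping shape is hypothetical) and applies only the stated $0.50 per 1,000 rate; `Decimal` avoids floating-point rounding on small amounts.

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("0.50")  # stated rate: $0.50 per 1,000 dataset items


def run_cost(items_per_source):
    """Cost of one run; blocked, empty, or erroring sources contribute 0 items."""
    delivered = sum(items_per_source.values())
    return PRICE_PER_1000 * delivered / 1000


print(run_cost({"huggingface.co/papers": 60, "anthropic.com/news": 40, "blocked": 0}))
```

Under the same rate, an hourly schedule of 100-item runs would deliver about 2,400 items a day, or roughly $1.20.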
Honest limits
- Titles and canonical links only: no abstracts, authors, publication dates, or article text.
- `fetched_at` is the run timestamp, not the publish date.
- Extraction expects headline-shaped link text (at least 4 words and 24 characters), so very short titles can be missed and an occasional non-article link can slip through.
- Only same-domain links are collected from each source page.
- Pages that render their listings entirely with JavaScript yield zero items; the run flags them in HEALTH rather than failing.
- No cross-run deduplication or diff detection: each run is a full snapshot. Dedupe by `url` downstream if you ingest continuously.
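Downstream deduplication by `url` can be as small as a set of URLs seen in earlier runs. This is a minimal in-memory sketch; `merge_new_items` is a hypothetical name, and in practice you would persist `seen_urls` (a file, database, or key-value store) between scheduled runs.

```python
def merge_new_items(seen_urls, run_items):
    """Return items whose url was not seen in earlier runs, and record their urls."""
    fresh = []
    for item in run_items:
        url = item["url"]
        if url in seen_urls:
            continue  # already ingested from a previous snapshot (or earlier in this run)
        seen_urls.add(url)
        fresh.append(item)
    return fresh


seen = set()
first = merge_new_items(seen, [{"url": "https://huggingface.co/papers/a"},
                               {"url": "https://huggingface.co/papers/b"}])
second = merge_new_items(seen, [{"url": "https://huggingface.co/papers/b"},
                                {"url": "https://huggingface.co/papers/c"}])
print(len(first), [i["url"] for i in second])
```

Since `fetched_at` is the run timestamp, the `fetched_at` of the first run that delivered a URL is a usable "first seen" time, though not a publish date.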
FAQ
Can I use this as an ML papers API?
Yes. Trigger runs on a schedule through the Apify API and read the dataset as JSON or CSV, giving you a lightweight ML papers API without maintaining your own scraper.
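As a sketch of the read side, the Apify API v2 serves a dataset's items at `/v2/datasets/{datasetId}/items`, with a `format` query parameter for JSON, CSV, and other exports. The dataset ID comes from a finished run; the helper names and the bearer-token header usage below are assumptions to check against the Apify API reference.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, fmt="json"):
    """URL for a dataset's items in the given export format (json, csv, ...)."""
    return f"{API_BASE}/datasets/{dataset_id}/items?" + urlencode({"format": fmt})


def fetch_items(dataset_id, token):
    """Download a run's items as parsed JSON (performs a network request)."""
    req = Request(dataset_items_url(dataset_id), headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return json.load(resp)
```

A scheduled job would call something like `fetch_items(dataset_id, os.environ["APIFY_TOKEN"])` after each run and hand the records to the parsing and dedupe steps.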
How fresh is the AI research feed?
Each run is a live snapshot of the source pages at run time. Schedule the actor hourly or daily to keep an always-current AI news feed.
Can I add my own sources?
Yes. `sources` accepts any list of `{url, category}` pages. The robots check and selector-free extraction apply to every source you add; blog-style listing pages work best.
Does it return abstracts or full article text?
No, only titles and canonical links. Pair it with Topsail's Site to Markdown actor when you need full, LLM-ready page content.
What happens when a source site redesigns?
Usually nothing: extraction keys on link-text shape and URL structure, not page-specific selectors. If a source still drops to zero items, the run flags it as `zero_yield_check_layout` in the HEALTH report.
More compliant data feeds from Topsail
- Site to Markdown: any site to clean, LLM-ready markdown
- GTA 6 Countdown & Developments Tracker: countdown, confirmed facts, diffed developments, market odds
- Commodity Intel: oil, gold, and uranium headlines from permitted sources
- Crypto News: BTC/ETH/DeFi headlines from major outlets
