URL: https://apify.com/topsail/compliant-ai-research-radar

AI & ML Papers & News Scraper API: Compliant · Apify


๐Ÿ‘ AI Research Radar โ€” compliant feed of new AI papers and news avatar

AI Research Radar: compliant feed of new AI papers and news

Pricing

from $0.50 / 1,000 results

AI research feed of new ML papers and AI news from HuggingFace, Anthropic, Google, and The Decoder: structured JSON, robots-compliant.

Rating: 0.0 (0)

Developer: Connor Teskey (maintained by Community)

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 9 days ago

AI Research Radar

New AI papers, lab announcements, and AI news from five permitted sources, delivered as one structured, schedule-ready feed.

Built for AI newsletter writers, research agents, and trend dashboards. Instead of hand-maintaining a scraper per site, you run one actor and get the latest items from HuggingFace papers and blog, the Anthropic and Google AI newsrooms, and The Decoder as uniform JSON records, ready to rank, summarize, alert on, or pipe into a RAG index.

What you get

Field        Meaning
title        Paper, post, or article headline
url          Canonical link on the source site
category     papers, blog, labs, or news (set per source)
source       Source domain, e.g. huggingface.co
fetched_at   UTC timestamp of the run (ISO 8601)
extraction   Extractor version tag (selector_free_v1)

Quick start

{
  "sources": [
    {"url": "https://huggingface.co/papers", "category": "papers"},
    {"url": "https://huggingface.co/blog", "category": "blog"},
    {"url": "https://www.anthropic.com/news", "category": "labs"}
  ],
  "maxItemsPerSource": 25
}

This returns up to 75 fresh items (25 per source), typically in under a minute. Omit sources entirely to use the full five-source default set, which adds the Google AI blog and The Decoder.
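
A minimal sketch of sending the quick-start input from Python follows. It assumes Apify's synchronous `run-sync-get-dataset-items` endpoint, an actor ID derived from the store URL, and bearer-token authentication; the helper names (`build_input`, `max_items`, `build_request`, `fetch_items`) are hypothetical and not part of the actor.

```python
import json
import urllib.request

# Assumption: actor ID taken from the store URL, with "~" in place of "/".
ACTOR_ID = "topsail~compliant-ai-research-radar"
API_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_input(sources=None, max_items_per_source=25):
    """Return the actor input; leave `sources` out to use the default set."""
    run_input = {"maxItemsPerSource": max_items_per_source}
    if sources:
        run_input["sources"] = [{"url": u, "category": c} for u, c in sources]
    return run_input


def max_items(run_input):
    """Upper bound on returned items; None when the default sources are used."""
    sources = run_input.get("sources")
    if sources is None:
        return None
    return len(sources) * run_input["maxItemsPerSource"]


def build_request(run_input, token):
    """Build (but do not send) the POST request that starts a run."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_items(run_input, token, timeout=300):
    """Send the request and return the dataset items as a list of dicts."""
    with urllib.request.urlopen(build_request(run_input, token), timeout=timeout) as resp:
        return json.load(resp)


quick_start = build_input([
    ("https://huggingface.co/papers", "papers"),
    ("https://huggingface.co/blog", "blog"),
    ("https://www.anthropic.com/news", "labs"),
])
```

Calling `fetch_items(quick_start, token)` with a valid token would perform the run; `max_items(quick_start)` gives the 75-item ceiling described above.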

Output example

{
  "category": "papers",
  "title": "Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution",
  "url": "https://huggingface.co/papers/2606.10917",
  "source": "huggingface.co",
  "fetched_at": "2026-06-10T14:12:08.421337+00:00",
  "extraction": "selector_free_v1"
}
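
For downstream code, a record like the one above could be loaded into a typed structure. The sketch below assumes every record carries exactly the six documented fields; the `RadarItem` class is a hypothetical name introduced here for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RadarItem:
    """One dataset record, mirroring the documented output fields."""
    category: str
    title: str
    url: str
    source: str
    fetched_at: datetime  # run timestamp, not the publish date
    extraction: str

    @classmethod
    def from_dict(cls, raw):
        return cls(
            category=raw["category"],
            title=raw["title"],
            url=raw["url"],
            source=raw["source"],
            # ISO 8601 with a UTC offset parses into a timezone-aware datetime.
            fetched_at=datetime.fromisoformat(raw["fetched_at"]),
            extraction=raw["extraction"],
        )


sample = json.loads("""
{
  "category": "papers",
  "title": "Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution",
  "url": "https://huggingface.co/papers/2606.10917",
  "source": "huggingface.co",
  "fetched_at": "2026-06-10T14:12:08.421337+00:00",
  "extraction": "selector_free_v1"
}
""")
item = RadarItem.from_dict(sample)
```

A missing field raises `KeyError`, which makes schema drift visible early instead of propagating partial records.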

Why this one

  • Selector-free extraction. Titles are pulled by link-text shape and URL structure rather than page-specific CSS selectors, so the site redesigns that break conventional scrapers do not break this one.
  • Layout drift is flagged, never hidden. A source that suddenly yields zero items is marked zero_yield_check_layout in the HEALTH report instead of quietly shrinking your feed.
  • Papers, labs, and press in one schema. The five default sources cover research papers, official lab announcements, and AI journalism, each record tagged with its category.
  • Bring your own sources. Pass any list of {url, category} pages; the same robots check, retry logic, and extraction apply to every source you add.
  • Fresh by design. Each run is a live snapshot of the source pages, so scheduling it hourly or daily keeps the radar current.
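
To make the selector-free idea concrete, here is a rough sketch of a link-text heuristic. It is not the actor's actual extractor: it models only the same-domain rule and the headline-shape thresholds (at least 4 words and 24 characters) listed under Honest limits, and it ignores URL structure. The names `LinkCollector`, `looks_like_headline`, and `extract_items` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MIN_WORDS = 4   # threshold from the Honest limits section
MIN_CHARS = 24  # threshold from the Honest limits section


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def looks_like_headline(text):
    return len(text.split()) >= MIN_WORDS and len(text) >= MIN_CHARS


def extract_items(html, page_url):
    """Return headline-shaped, same-domain links, no CSS selectors involved."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    seen, items = set(), []
    for href, text in collector.links:
        url = urljoin(page_url, href)
        if urlparse(url).netloc != page_host or url in seen:
            continue
        if looks_like_headline(text):
            seen.add(url)
            items.append({"title": text, "url": url})
    return items
```

Because nothing here depends on class names or DOM position, a redesign that keeps headlines as ordinary links would leave the output unchanged, which is the property the first bullet describes.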

Compliance and reliability

Topsail actors are built compliance-first and ship with self-healing plumbing:

  • robots.txt is always respected, and the check fails closed. If a robots check cannot complete, the source is skipped, never scraped. There is no input to turn this off.
  • Sources are public listing and newsroom pages (HuggingFace papers and blog, Anthropic news, the Google AI blog, and The Decoder), which these publishers serve openly to every visitor, with no account, paywall, or personal data involved.
  • Transient failures retry once with backoff; persistent failures are reported, not hidden.
  • Every run writes a per-source HEALTH report to the key-value store, so you can see exactly which sources delivered and which were blocked, empty, or erroring.
  • No PII, no paywalled or login-gated content, no circumvention.
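
The fail-closed robots rule and the single retry can be sketched with the standard library. This is an illustration under assumptions, not the actor's code: the user agent string is hypothetical, any `OSError` while fetching robots.txt is treated as "check could not complete", and `fetch_robots_txt` stands in for whatever function downloads the file.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-research-bot"  # hypothetical user agent


def is_allowed(robots_txt, url, user_agent=USER_AGENT):
    """Fail closed: no robots.txt body means the source is skipped."""
    if robots_txt is None:
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_with_one_retry(fetch, backoff_seconds=2.0, sleep=time.sleep):
    """Call fetch(); on a network-style error wait once, then try again."""
    try:
        return fetch()
    except OSError:
        sleep(backoff_seconds)
        return fetch()  # a second failure propagates to the caller


def robots_decision(url, fetch_robots_txt, sleep=time.sleep):
    """Combine both rules: retry once, then skip the source if still failing."""
    try:
        robots_txt = fetch_with_one_retry(fetch_robots_txt, sleep=sleep)
    except OSError:
        robots_txt = None  # check could not complete
    return is_allowed(robots_txt, url)
```

The important design choice is the default: an unreachable robots.txt returns `False`, so an outage can only shrink the feed, never cause an unchecked fetch.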

Pricing

Pay per result: $0.50 per 1,000 dataset items, where one item is one paper, post, or article. Sources that come back robots-blocked, erroring, or empty add nothing to the dataset and cost nothing; you pay only for delivered records. A typical default run of around 100 items costs about $0.05.
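
The pricing arithmetic can be checked with a few lines of Python. The `run_cost` helper is a hypothetical name, and the monthly figure assumes an hourly schedule over 30 days with about 100 delivered items per run.

```python
from decimal import Decimal

PRICE_PER_1000_ITEMS = Decimal("0.50")  # USD, from the pricing above


def run_cost(delivered_items):
    """Cost in USD for a number of delivered dataset items."""
    return PRICE_PER_1000_ITEMS * delivered_items / 1000


typical_run = run_cost(100)              # one default run of about 100 items
hourly_month = run_cost(100 * 24 * 30)   # assumed hourly schedule for 30 days
```

Here `typical_run` works out to 0.05 USD and `hourly_month` to 36 USD under the stated assumptions.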

Honest limits

  • Titles and canonical links only: no abstracts, authors, publication dates, or article text. fetched_at is the run timestamp, not the publish date.
  • Extraction expects headline-shaped link text (at least 4 words and 24 characters), so very short titles can be missed and an occasional non-article link can slip through.
  • Only same-domain links are collected from each source page.
  • Pages that render their listings entirely with JavaScript yield zero items; the run flags them in HEALTH rather than failing.
  • No cross-run deduplication or diff detection: each run is a full snapshot. Dedupe by url downstream if you ingest continuously.
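
Downstream deduplication by url can be as simple as the sketch below. It assumes records are dicts with a `url` key, as in the output example, and that the caller persists the set of seen URLs between runs; `dedupe_by_url` is a hypothetical helper name.

```python
def dedupe_by_url(items, seen_urls):
    """Return only items whose url has not been seen; updates seen_urls in place."""
    fresh = []
    for item in items:
        url = item["url"]
        if url not in seen_urls:
            seen_urls.add(url)
            fresh.append(item)
    return fresh


seen = set()
run_1 = [{"url": "https://huggingface.co/papers/2606.10917"}]
run_2 = [
    {"url": "https://huggingface.co/papers/2606.10917"},
    {"url": "https://huggingface.co/blog/example-post"},  # hypothetical URL
]
new_in_run_1 = dedupe_by_url(run_1, seen)
new_in_run_2 = dedupe_by_url(run_2, seen)
```

After the second call, `new_in_run_2` holds only the item that did not appear in the first snapshot.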

FAQ

Can I use this as an ML papers API? Yes. Trigger runs on a schedule through the Apify API and read the dataset as JSON or CSV for a lightweight ML papers API without maintaining your own scraper.

How fresh is the AI research feed? Each run is a live snapshot of the source pages at run time. Schedule the actor hourly or daily to keep an always-current AI news feed.

Can I add my own sources? Yes. sources accepts any list of {url, category} pages. The robots check and selector-free extraction apply to every source you add; blog-style listing pages work best.

Does it return abstracts or full article text? No, titles and canonical links only. Pair it with Topsail's Site to Markdown actor when you need full LLM-ready page content.

What happens when a source site redesigns? Usually nothing: extraction keys on link-text shape and URL structure, not page-specific selectors. If a source still drops to zero items, the run flags it as zero_yield_check_layout in the HEALTH report.
