arXiv Scraper - Scientific Papers, Abstracts & PDFs
An arXiv scraper built on the official arXiv API. Search 2M+ scientific papers in CS, physics, math and biology by keyword, title, author, abstract or category, and extract title, authors, abstract, categories, DOI, dates and PDF links. Useful for AI/ML research, literature reviews and RAG datasets.
Search arXiv.org, with 2M+ open-access scientific papers in physics, CS, math, biology, economics and more, via the official arXiv API.
Built for AI/ML research, literature reviews, RAG datasets and research analytics. Keyless: it calls the API directly, so no API key, proxy or headless browser is needed.
What you get
Per paper:
- title, arxiv_id
- authors, author_count
- abstract (full text)
- primary_category, categories
- published, updated
- doi, journal_ref, comment
- pdf_url, abstract_url
- scraped_at
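For downstream typing or validation, the field list above can be written as a Python `TypedDict`. This is a sketch only: the class name `ArxivPaper`, the helper `missing_fields`, and every field type (for example `author_count` as an integer or `doi` as optional) are assumptions inferred from the field names and the sample output, not a published schema.

```python
from typing import List, Optional, TypedDict


class ArxivPaper(TypedDict):
    """Hypothetical typing of one output item; all field types are assumptions."""
    title: str
    arxiv_id: str             # e.g. "2605.30351v1" (includes the version suffix)
    authors: List[str]
    author_count: int
    abstract: str
    primary_category: str
    categories: List[str]
    published: str            # ISO 8601 timestamp, e.g. "2026-05-28T17:59:57Z"
    updated: str
    doi: Optional[str]        # assumed to be absent/None for many preprints
    journal_ref: Optional[str]
    comment: Optional[str]
    pdf_url: str
    abstract_url: str
    scraped_at: str


# Field names in declaration order, taken from the TypedDict itself.
EXPECTED_FIELDS: List[str] = list(ArxivPaper.__annotations__)


def missing_fields(item: dict) -> List[str]:
    """Return the expected field names that are not present in an output item."""
    return [name for name in EXPECTED_FIELDS if name not in item]
```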
Why this Actor
| | arXiv Scraper | Manual search | Raw arXiv API |
|---|---|---|---|
| Clean flat JSON output | Yes | No | No (Atom XML to parse) |
| Search + filters + paging | Yes | Slow | DIY |
| PDF + abstract links | Yes | Manual | Yes |
| Pay per result | Yes | n/a | n/a |
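"Atom XML to parse" refers to what the raw arXiv API returns: an Atom feed with one `<entry>` per paper. The standard-library sketch below shows roughly what flattening that feed into records like this Actor's output involves. The function name `parse_feed` and the exact field mapping are illustrative assumptions, not the Actor's implementation; the namespaces and element names follow the arXiv API's Atom format.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"


def _clean(text: str) -> str:
    """Collapse the line breaks and indentation arXiv puts inside titles/abstracts."""
    return " ".join(text.split())


def parse_feed(xml_text: str) -> List[dict]:
    """Flatten an arXiv API Atom feed into one dict per paper (hypothetical mapping)."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        abs_url = (entry.findtext(f"{ATOM}id") or "").strip()
        # The PDF link is the <link> element carrying title="pdf".
        pdf_url = next(
            (link.get("href") for link in entry.findall(f"{ATOM}link")
             if link.get("title") == "pdf"),
            None,
        )
        primary = entry.find(f"{ARXIV}primary_category")
        papers.append({
            "arxiv_id": abs_url.rsplit("/abs/", 1)[-1],
            "title": _clean(entry.findtext(f"{ATOM}title") or ""),
            "authors": [a.findtext(f"{ATOM}name") or ""
                        for a in entry.findall(f"{ATOM}author")],
            "abstract": _clean(entry.findtext(f"{ATOM}summary") or ""),
            "primary_category": primary.get("term") if primary is not None else None,
            "categories": [c.get("term") for c in entry.findall(f"{ATOM}category")],
            "published": entry.findtext(f"{ATOM}published"),
            "updated": entry.findtext(f"{ATOM}updated"),
            "doi": entry.findtext(f"{ARXIV}doi"),  # None when the entry has no DOI
            "pdf_url": pdf_url,
            "abstract_url": abs_url,
        })
    return papers
```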
Input
Use the simple fields, or pass a raw searchQuery to use the full arXiv query syntax.
| Field | Type | Description |
|---|---|---|
| allFields | string | Keyword search across title, abstract and authors |
| title | string | Text the title must contain |
| author | string | Author name |
| abstract | string | Text the abstract must contain |
| category | string | arXiv category (e.g. cs.LG, cs.CL, cs.AI) |
| searchQuery | string | Advanced raw arXiv query (overrides the fields above) |
| sortBy | string | Relevance / Newest / Recently updated |
| maxResults | integer | Maximum number of papers to return |
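The simple fields correspond to arXiv's search field prefixes (`all:`, `ti:`, `au:`, `abs:`, `cat:`). How the Actor combines them is not documented here; the sketch below assumes the fields are joined with `AND`, multi-word values are quoted as phrases, and a non-empty `searchQuery` replaces everything else. `build_search_query` and `FIELD_PREFIXES` are hypothetical names.

```python
from typing import Dict

# Hypothetical mapping from this Actor's input fields to arXiv API field prefixes.
FIELD_PREFIXES: Dict[str, str] = {
    "allFields": "all",
    "title": "ti",
    "author": "au",
    "abstract": "abs",
    "category": "cat",
}


def build_search_query(run_input: dict) -> str:
    """Combine simple input fields into one arXiv search_query string (assumed logic)."""
    raw = (run_input.get("searchQuery") or "").strip()
    if raw:
        # Assumption: an advanced raw query overrides all simple fields.
        return raw
    clauses = []
    for field, prefix in FIELD_PREFIXES.items():
        value = (run_input.get(field) or "").strip()
        if not value:
            continue
        if " " in value:
            # Quote multi-word values so arXiv treats them as a phrase.
            value = f'"{value}"'
        clauses.append(f"{prefix}:{value}")
    return " AND ".join(clauses)
```

For example, `{"title": "transformer", "category": "cs.LG"}` would become `ti:transformer AND cat:cs.LG` under these assumptions.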
Example: newest LLM papers
```json
{"allFields":"large language models","sortBy":"newest","maxResults":100}
```
Example: category filter with advanced query syntax
```json
{"searchQuery":"cat:cs.CL AND abs:retrieval augmented","sortBy":"newest","maxResults":200}
```
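Either example ultimately becomes a request to the arXiv API query endpoint. A minimal sketch of building that URL, assuming the Actor's sort options map to the API's `sortBy` values `relevance`, `submittedDate` and `lastUpdatedDate` in descending order. The keys `"relevance"`, `"newest"` and `"updated"` in `SORT_MAP` are hypothetical (only `"newest"` appears in the examples above), as is the helper `build_request_url`.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"

# Hypothetical mapping of this Actor's sortBy values to arXiv (sortBy, sortOrder).
SORT_MAP: Dict[str, Tuple[str, str]] = {
    "relevance": ("relevance", "descending"),
    "newest": ("submittedDate", "descending"),
    "updated": ("lastUpdatedDate", "descending"),
}


def build_request_url(search_query: str, sort_by: str = "relevance",
                      start: int = 0, max_results: int = 100) -> str:
    """Build one arXiv API request URL; start/max_results select a page of results."""
    sort_field, sort_order = SORT_MAP[sort_by]
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_field,
        "sortOrder": sort_order,
    }
    # urlencode escapes ':' as %3A and spaces as '+'.
    return f"{API_URL}?{urlencode(params)}"
```

For the second example, `build_request_url("cat:cs.CL AND abs:retrieval augmented", "newest", max_results=200)` yields a query string beginning `search_query=cat%3Acs.CL+AND+abs%3Aretrieval+augmented`.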
Sample output (abridged)
```json
{
  "arxiv_id": "2605.30351v1",
  "title": "VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Video",
  "authors": ["Hidir Yesiltepe", "Jiazhen Hu"],
  "primary_category": "cs.CV",
  "categories": ["cs.CV", "cs.AI"],
  "published": "2026-05-28T17:59:57Z",
  "abstract": "Long-rollout causal video diffusion...",
  "pdf_url": "https://arxiv.org/pdf/2605.30351v1",
  "abstract_url": "https://arxiv.org/abs/2605.30351v1"
}
```
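Each `arxiv_id` carries a version suffix (`v1` in the sample), so one paper can appear more than once if several versions are collected. A sketch of splitting the id and keeping only the newest version per paper. `split_arxiv_id` and `dedupe_latest` are hypothetical helpers, and the regular expression assumes the version, when present, is a trailing `v` plus digits, as in new-style ids (`2605.30351v1`) and old-style ids (`hep-th/9901001v2`).

```python
import re
from typing import List, Optional, Tuple

# Lazily match the base id, then an optional trailing "v<digits>" version suffix.
_ID_RE = re.compile(r"^(?P<base>.+?)(?:v(?P<version>\d+))?$")


def split_arxiv_id(arxiv_id: str) -> Tuple[str, Optional[int]]:
    """Split '2605.30351v1' into ('2605.30351', 1); version is None if absent."""
    match = _ID_RE.match(arxiv_id)
    if match is None:
        raise ValueError(f"unrecognised arXiv id: {arxiv_id!r}")
    version = match.group("version")
    return match.group("base"), int(version) if version else None


def dedupe_latest(items: List[dict]) -> List[dict]:
    """Keep only the highest-versioned item per base arXiv id, in first-seen order."""
    latest = {}
    for item in items:
        base, version = split_arxiv_id(item["arxiv_id"])
        rank = version or 0
        if base not in latest or rank > latest[base][0]:
            latest[base] = (rank, item)
    return [item for _, item in latest.values()]
```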
Use cases
- AI/ML research: track the latest papers in a field or category
- RAG / LLM datasets: build corpora of abstracts + PDF links by topic
- Literature reviews: gather and rank relevant papers fast
- Research analytics: analyse output by category, author and time
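For the research-analytics use case, the flat output can be aggregated directly. A short sketch using `collections.Counter`, assuming items shaped like the sample output (`primary_category`, an ISO 8601 `published` timestamp, and an `authors` list); the function names are illustrative, not part of the Actor.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def papers_per_category(items: Iterable[dict]) -> Counter:
    """Count papers by their primary arXiv category."""
    return Counter(item["primary_category"] for item in items)


def papers_per_month(items: Iterable[dict]) -> Counter:
    """Count papers by publication month; 'published' starts with 'YYYY-MM'."""
    return Counter(item["published"][:7] for item in items)


def top_authors(items: Iterable[dict], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent author names across all papers."""
    return Counter(a for item in items for a in item["authors"]).most_common(n)
```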
Pricing
Pay per result: you are charged only for the papers returned, so runs that return no papers cost nothing.
Notes & legal
- Uses the official arXiv API. Please respect arXiv's API terms and rate limits (the Actor waits between requests).
- Use data only for lawful purposes.
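On rate limits: the arXiv API documentation asks clients to pause (about three seconds) between consecutive requests, and large result sets are fetched page by page through the `start` and `max_results` parameters. A sketch of that pattern follows. The page size, the 3-second default and the `fetch_page` callback are assumptions, since the Actor's actual delay and page size are not stated here; `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, List, Tuple


def page_offsets(max_results: int, page_size: int = 100) -> List[Tuple[int, int]]:
    """Split a total result count into (start, size) pages."""
    pages = []
    start = 0
    while start < max_results:
        size = min(page_size, max_results - start)
        pages.append((start, size))
        start += size
    return pages


def fetch_all(fetch_page: Callable[[int, int], list], max_results: int,
              page_size: int = 100, delay_s: float = 3.0,
              sleep: Callable[[float], None] = time.sleep) -> list:
    """Fetch pages politely: wait delay_s between calls, stop early on a short page."""
    results: list = []
    for i, (start, size) in enumerate(page_offsets(max_results, page_size)):
        if i > 0:
            sleep(delay_s)  # pause before every request except the first
        batch = fetch_page(start, size)
        results.extend(batch)
        if len(batch) < size:
            break  # fewer results than requested: no more matches
    return results
```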
Related actors
More scrapers from the same author:
- OpenAlex Scraper: academic papers & citations
- PubMed Scraper: biomedical literature & citations
- Reddit Archive Scraper: years of historical posts & comments
