Pricing
from $1.00 / 1,000 results
arXiv Paper Scraper
Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.
Scrapes academic papers from arXiv using the public Atom API.
Features
- Search mode: free-text search across all arXiv papers
- Category mode: browse papers by arXiv category (e.g. cs.AI, math.CO, physics.optics)
- Configurable sorting (by relevance, last updated, or submission date)
- Pagination with polite rate limiting (3s between requests)
Input
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | search | search or category |
| query | string | machine learning | Search query (for search mode) |
| category | string | cs.AI | arXiv category code (for category mode) |
| max_items | integer | 10 | Maximum papers to scrape (1–1000) |
| sort_by | string | relevance | relevance, lastUpdatedDate, or submittedDate |
| sort_order | string | descending | ascending or descending |
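
The input fields map naturally onto the query parameters of the public arXiv API. The sketch below is an illustration of that mapping, not the actor's actual source: the function name `build_query_url`, the `all:` and `cat:` field prefixes, and the page size of 100 are assumptions, while the endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters come from the arXiv API manual.

```python
from urllib.parse import urlencode

# Endpoint as documented in the arXiv API manual.
ARXIV_API = "http://export.arxiv.org/api/query"

ALLOWED_SORT_BY = {"relevance", "lastUpdatedDate", "submittedDate"}
ALLOWED_SORT_ORDER = {"ascending", "descending"}


def build_query_url(actor_input: dict, start: int = 0, page_size: int = 100) -> str:
    """Build one arXiv API request URL from the actor input (hypothetical helper)."""
    mode = actor_input.get("mode", "search")
    if mode == "search":
        # Assumption: free-text search uses the "all:" field prefix.
        search_query = "all:" + actor_input.get("query", "machine learning")
    elif mode == "category":
        # Assumption: category browsing uses the "cat:" field prefix.
        search_query = "cat:" + actor_input.get("category", "cs.AI")
    else:
        raise ValueError(f"mode must be 'search' or 'category', got {mode!r}")

    max_items = int(actor_input.get("max_items", 10))
    if not 1 <= max_items <= 1000:
        raise ValueError("max_items must be between 1 and 1000")
    if not 0 <= start < max_items:
        raise ValueError("start must be within [0, max_items)")

    sort_by = actor_input.get("sort_by", "relevance")
    sort_order = actor_input.get("sort_order", "descending")
    if sort_by not in ALLOWED_SORT_BY:
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    if sort_order not in ALLOWED_SORT_ORDER:
        raise ValueError(f"unsupported sort_order: {sort_order!r}")

    params = {
        "search_query": search_query,
        "start": start,
        # Never ask for more than what is still needed to reach max_items.
        "max_results": min(page_size, max_items - start),
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

Calling `build_query_url({"mode": "category", "category": "cs.AI"})` yields a URL for the first page of a category listing; passing a larger `start` requests later pages.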
Output
Each result contains:
- arxiv_id: arXiv paper ID (e.g. 2301.07041)
- title: paper title
- summary: abstract
- authors: list of author names
- categories: list of arXiv categories
- primary_category: primary category
- published: publication date (ISO 8601)
- updated: last updated date (ISO 8601)
- pdf_url: direct PDF link
- abs_url: abstract page link
- comment: author comment (optional)
- journal_ref: journal reference (optional)
- doi: DOI (optional)
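
As a rough illustration of where these fields come from, the sketch below turns one Atom `<entry>` element from an arXiv API response into a record with the keys listed above. It is not the actor's code: `parse_entry` and `_text` are hypothetical helpers, stripping the version suffix from the ID is an assumption based on the example ID, and `SAMPLE_ENTRY` is a synthetic entry with placeholder values.

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv API Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def _text(entry, path):
    """Return whitespace-normalized text of a child element, or None if absent."""
    node = entry.find(path, NS)
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def parse_entry(entry):
    """Map one Atom <entry> element to the output record (hypothetical helper)."""
    abs_url = _text(entry, "atom:id") or ""
    # Assumption: drop the version suffix, e.g. "2301.07041v1" -> "2301.07041".
    arxiv_id = re.sub(r"v\d+$", "", abs_url.split("/abs/")[-1])

    pdf_url = None
    for link in entry.findall("atom:link", NS):
        if link.get("title") == "pdf":
            pdf_url = link.get("href")

    primary = entry.find("arxiv:primary_category", NS)
    return {
        "arxiv_id": arxiv_id,
        "title": _text(entry, "atom:title"),
        "summary": _text(entry, "atom:summary"),
        "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", NS)],
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
        "published": _text(entry, "atom:published"),
        "updated": _text(entry, "atom:updated"),
        "pdf_url": pdf_url,
        "abs_url": abs_url,
        "comment": _text(entry, "arxiv:comment"),          # optional
        "journal_ref": _text(entry, "arxiv:journal_ref"),  # optional
        "doi": _text(entry, "arxiv:doi"),                  # optional
    }


# Synthetic entry with placeholder values, for illustration only.
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <id>http://arxiv.org/abs/2301.07041v1</id>
  <updated>2023-01-01T00:00:00Z</updated>
  <published>2023-01-01T00:00:00Z</published>
  <title>Placeholder
    title</title>
  <summary>Placeholder abstract.</summary>
  <author><name>First Author</name></author>
  <author><name>Second Author</name></author>
  <link href="http://arxiv.org/abs/2301.07041v1" rel="alternate" type="text/html"/>
  <link title="pdf" href="http://arxiv.org/pdf/2301.07041v1" rel="related" type="application/pdf"/>
  <arxiv:primary_category term="cs.AI"/>
  <category term="cs.AI"/>
  <category term="cs.LG"/>
</entry>
""".strip()

record = parse_entry(ET.fromstring(SAMPLE_ENTRY))
```

Optional fields that are missing from an entry come back as `None` in this sketch; the actor itself may omit them instead.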
Example Input
{"mode":"search","query":"transformer architecture","max_items":20,"sort_by":"submittedDate","sort_order":"descending"}
Notes
- The arXiv API has a rate limit of ~1 request per 3 seconds. The scraper respects this.
- Maximum 100 results per API request; pagination is handled automatically.
- arXiv categories list: https://arxiv.org/category_taxonomy
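
The pagination and rate-limiting behaviour described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the actor's implementation: `page_plan` and `fetch_all` are hypothetical names, and `fetch_page` stands in for whatever function performs the actual API request and returns a list of parsed results.

```python
import time
from typing import Callable, Iterator, List, Tuple

PAGE_SIZE = 100       # at most 100 results per API request (from the notes above)
DELAY_SECONDS = 3.0   # about one request every 3 seconds (from the notes above)


def page_plan(max_items: int, page_size: int = PAGE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (start, max_results) pairs that together cover max_items results."""
    start = 0
    while start < max_items:
        size = min(page_size, max_items - start)
        yield start, size
        start += size


def fetch_all(
    fetch_page: Callable[[int, int], List],
    max_items: int,
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List:
    """Fetch pages one at a time, pausing between consecutive requests."""
    results: List = []
    for index, (start, size) in enumerate(page_plan(max_items)):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        batch = fetch_page(start, size)
        results.extend(batch)
        if len(batch) < size:
            break  # a short page means the API has no more results
    return results[:max_items]
```

For `max_items=250`, `page_plan` yields `(0, 100)`, `(100, 100)`, and `(200, 50)`, so three requests are made with two 3-second pauses between them. Passing `sleep` as a parameter keeps the loop easy to test without real delays.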
