Pricing
from $1.00 / 1,000 results
arXiv Search & Paper Scraper
Search arXiv and get clean structured JSON for each paper: title, authors, abstract, categories, DOI, PDF link, and dates. Built for research, datasets, and AI pipelines.
arXiv Search & Paper Scraper
Search arXiv and get clean, structured JSON for every paper: title, authors, abstract, categories, DOI, journal reference, PDF link, and dates. The arXiv API returns awkward Atom XML; this actor does the parsing for you and hands back tidy records ready for analysis, datasets, citation management, or feeding papers to an LLM.
Why use it
- Flexible search: by keywords, author, arXiv category, or title
- Authors as a clean list, not a blob of XML
- Categories split out: primary category plus all cross-listed ones
- Direct PDF and abstract links, plus DOI and journal reference when available
- Parsed dates: published and last updated
- Normalized text: abstracts cleaned of the API's messy whitespace
- Sort by relevance, last updated, or submission date
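To make the parsing step concrete, here is a minimal sketch of how one Atom `<entry>` from the arXiv API could be flattened into a record shaped like this actor's output. It is not the actor's actual implementation: `parse_entry`, `clean`, and the inline `SAMPLE` feed are hypothetical, and the sample is trimmed to a few elements for illustration.

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv API Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def clean(text):
    """Collapse whitespace runs (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text or "").strip()


def parse_entry(entry):
    """Flatten one Atom <entry> into a dict (hypothetical helper)."""
    abs_url = clean(entry.findtext("atom:id", default="", namespaces=NS))
    arxiv_id = abs_url.rsplit("/abs/", 1)[-1]
    version = re.search(r"v(\d+)$", arxiv_id)
    authors = [clean(a.findtext("atom:name", default="", namespaces=NS))
               for a in entry.findall("atom:author", NS)]
    primary = entry.find("arxiv:primary_category", NS)
    pdf_url = next((link.get("href") for link in entry.findall("atom:link", NS)
                    if link.get("title") == "pdf"), None)
    return {
        "arxivId": arxiv_id,
        "version": int(version.group(1)) if version else None,
        "title": clean(entry.findtext("atom:title", namespaces=NS)),
        "summary": clean(entry.findtext("atom:summary", namespaces=NS)),
        "authors": authors,
        "authorCount": len(authors),
        "primaryCategory": primary.get("term") if primary is not None else None,
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "published": entry.findtext("atom:published", namespaces=NS),
        "updated": entry.findtext("atom:updated", namespaces=NS),
        # Optional fields: None when the paper has no DOI or journal reference.
        "doi": clean(entry.findtext("arxiv:doi", namespaces=NS)) or None,
        "journalRef": clean(entry.findtext("arxiv:journal_ref", namespaces=NS)) or None,
        "pdfUrl": pdf_url,
        "absUrl": abs_url,
    }


# Trimmed, illustrative feed with the line breaks the raw API tends to include.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v7</id>
    <updated>2023-08-02T00:41:18Z</updated>
    <published>2017-06-12T17:57:34Z</published>
    <title>Attention Is All
  You Need</title>
    <summary>  The dominant sequence transduction models
are based on...
</summary>
    <author><name>Ashish Vaswani</name></author>
    <author><name>Noam Shazeer</name></author>
    <link title="pdf" href="http://arxiv.org/pdf/1706.03762v7" rel="related"/>
    <arxiv:primary_category term="cs.CL"/>
    <category term="cs.CL"/>
    <category term="cs.LG"/>
  </entry>
</feed>"""

records = [parse_entry(e) for e in ET.fromstring(SAMPLE).findall("atom:entry", NS)]
```

With the actor, none of this is needed: the dataset items already arrive in this flattened form.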
Use cases
- Literature reviews & research: pull recent papers in a field
- Building datasets: assemble structured corpora of papers and abstracts
- LLM / RAG pipelines: feed clean abstracts and metadata to models
- Trend monitoring: track new submissions in a category over time
- Citation & reference tooling: grab DOIs and journal refs at scale
Input
| Field | Description |
|---|---|
| Search query | Free-text keywords across all fields. |
| Author | Restrict to an author (phrase match). |
| Category | arXiv code, e.g. cs.LG, cs.CL, stat.ML. |
| Title contains | Restrict by title phrase. |
| Sort by / order | Relevance, last updated, or submitted; asc/desc. |
| Maximum papers | How many to return. |
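For readers curious how inputs like these relate to the underlying arXiv API, the sketch below builds a query URL from the same kinds of fields. The field prefixes (`all:`, `au:`, `cat:`, `ti:`) and the `sortBy`, `sortOrder`, and `max_results` parameters come from the public arXiv API. The function names, argument names, and the exact mapping are assumptions for illustration and may not match how the actor does it internally.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# Assumed mapping from the input choices above to arXiv API parameter values.
SORT_BY = {
    "relevance": "relevance",
    "last updated": "lastUpdatedDate",
    "submitted": "submittedDate",
}
SORT_ORDER = {"asc": "ascending", "desc": "descending"}


def build_search_query(query=None, author=None, category=None, title=None):
    """Combine the provided fields into one arXiv search_query string."""
    clauses = []
    if query:
        # Require every keyword, searching across all fields.
        clauses.extend(f"all:{word}" for word in query.split())
    if author:
        clauses.append(f'au:"{author}"')  # quoted for a phrase match
    if category:
        clauses.append(f"cat:{category}")
    if title:
        clauses.append(f'ti:"{title}"')
    if not clauses:
        raise ValueError("provide at least one search field")
    return " AND ".join(clauses)


def build_query_url(sort_by="relevance", order="desc", max_papers=10, **fields):
    """Return a full arXiv API URL for the given search fields."""
    params = {
        "search_query": build_search_query(**fields),
        "sortBy": SORT_BY[sort_by],
        "sortOrder": SORT_ORDER[order],
        "start": 0,
        "max_results": max_papers,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_query_url(author="Yoshua Bengio", category="cs.LG",
                      sort_by="submitted", max_papers=5)
```

Combining the fields with `AND` narrows the result set, which matches the "restrict to" wording in the table.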
Output
{"arxivId":"1706.03762v7","version":7,"title":"Attention Is All You Need","summary":"The dominant sequence transduction models are based on...","authors":["Ashish Vaswani","Noam Shazeer","Niki Parmar"],"authorCount":3,"primaryCategory":"cs.CL","categories":["cs.CL","cs.LG"],"published":"2017-06-12T17:57:34Z","updated":"2023-08-02T00:41:18Z","doi":"10.5555/3295222.3295349","journalRef":"NeurIPS 2017","pdfUrl":"http://arxiv.org/pdf/1706.03762v7","absUrl":"http://arxiv.org/abs/1706.03762v7"}
Export to JSON, CSV, or Excel, or pull via the Apify API. Connect to Sheets, Notion, Slack, Zapier, or Make.
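As one example of working with an export, the sketch below counts submissions per category and month, in the spirit of the trend-monitoring use case. It assumes a JSON export whose items have the fields shown in the sample record above. The inline `EXPORT_JSON` stands in for a downloaded file, and its second and third records use placeholder IDs and dates, not real papers.

```python
import json
import re
from collections import Counter
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a timestamp such as '2017-06-12T17:57:34Z' into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def split_version(arxiv_id):
    """Split '1706.03762v7' into ('1706.03762', 7); version is None if absent."""
    match = re.fullmatch(r"(.+?)v(\d+)", arxiv_id)
    return (match.group(1), int(match.group(2))) if match else (arxiv_id, None)


def submissions_per_month(records):
    """Count records by (primary category, 'YYYY-MM' of first submission)."""
    counts = Counter()
    for record in records:
        month = parse_timestamp(record["published"]).strftime("%Y-%m")
        counts[(record["primaryCategory"], month)] += 1
    return counts


# Stand-in for a JSON export; only the first record is a real paper.
EXPORT_JSON = """[
  {"arxivId": "1706.03762v7", "primaryCategory": "cs.CL", "published": "2017-06-12T17:57:34Z"},
  {"arxivId": "0000.00001v1", "primaryCategory": "cs.CL", "published": "2017-06-20T09:00:00Z"},
  {"arxivId": "0000.00002v1", "primaryCategory": "cs.LG", "published": "2017-07-01T12:00:00Z"}
]"""

records = json.loads(EXPORT_JSON)
counts = submissions_per_month(records)
for (category, month), n in sorted(counts.items()):
    print(f"{category} {month}: {n}")
```

Grouping on `published` rather than `updated` counts each paper once, in the month it was first submitted.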
Notes
- Uses the official public arXiv API. Independent tool, not affiliated with arXiv or Cornell University.
- Please be considerate with large jobs; the actor paces requests to respect arXiv's API guidelines.
- arXiv category reference: see arxiv.org/category_taxonomy for the full list of codes.
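For anyone calling the arXiv API directly, request pacing can be sketched as a small throttle that enforces a minimum gap between calls. The 3-second default reflects arXiv's published guidance of no more than one request every three seconds. The `Throttle` class is hypothetical, and the actor's real pacing logic and interval are not documented here.

```python
import time


class Throttle:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each HTTP request keeps a loop over result pages within the guideline, and the first call never sleeps.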
