URL: https://apify.com/jungle_synthesizer/arxiv-scraper

arXiv Scraper · Apify


Pricing: Pay per event

Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Returns titles, authors, abstracts, categories, and PDF links.

Rating: 0.0 (0)

Developer: BowTiedRaccoon

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 15 days ago

Export preprints and papers from arXiv.org, the leading open-access repository for 2.5 million+ scientific papers across physics, mathematics, computer science, biology, economics, and quantitative finance.

This actor queries the official arXiv Atom API (export.arxiv.org/api/query), the method arXiv officially supports for programmatic data access. No scraping, no JavaScript rendering, no account required.

What you get

Each result includes:

  • arxiv_id: the canonical short ID (e.g. 2301.12345)
  • abs_url: link to the abstract page
  • pdf_url: direct PDF download link
  • title: full paper title
  • abstract: complete abstract / summary
  • authors: comma-separated author names
  • primary_category: primary subject category (e.g. cs.AI)
  • categories: all subject categories, comma-separated
  • published: original submission date (ISO 8601)
  • updated: date of the latest version
  • comment: author notes (page count, conference, etc.) if available
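
As a rough illustration of consuming these records, the sketch below models one result as a Python dataclass and splits the comma-separated fields into lists. The field names come from the list above. The `Paper` class, the `split_csv` and `from_item` helpers, and every sample value are hypothetical, and the exact URL and date formats the actor emits are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


def split_csv(value: Optional[str]) -> list[str]:
    """Split a comma-separated string into trimmed, non-empty parts."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


@dataclass
class Paper:
    arxiv_id: str
    abs_url: str
    pdf_url: str
    title: str
    abstract: str
    authors: list[str]
    primary_category: str
    categories: list[str]
    published: str
    updated: str
    comment: Optional[str] = None

    @classmethod
    def from_item(cls, item: dict) -> Paper:
        """Build a Paper from one dataset item (a plain dict)."""
        return cls(
            arxiv_id=item["arxiv_id"],
            abs_url=item.get("abs_url", ""),
            pdf_url=item.get("pdf_url", ""),
            title=item.get("title", ""),
            abstract=item.get("abstract", ""),
            authors=split_csv(item.get("authors")),
            primary_category=item.get("primary_category", ""),
            categories=split_csv(item.get("categories")),
            published=item.get("published", ""),
            updated=item.get("updated", ""),
            comment=item.get("comment") or None,  # treat "" as missing
        )


# Hypothetical sample item; values are made up for illustration.
sample = {
    "arxiv_id": "2301.12345",
    "abs_url": "https://arxiv.org/abs/2301.12345",
    "pdf_url": "https://arxiv.org/pdf/2301.12345",
    "title": "An Example Title",
    "abstract": "An example abstract.",
    "authors": "Ada Lovelace, Alan Turing",
    "primary_category": "cs.AI",
    "categories": "cs.AI, cs.LG",
    "published": "2023-01-29T00:00:00Z",
    "updated": "2023-01-29T00:00:00Z",
}
paper = Paper.from_item(sample)
```

Splitting on commas will mis-handle any author name that itself contains a comma, so treat the `authors` list as a convenience rather than a guaranteed one-name-per-entry result.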

Search query syntax

The searchQuery field supports arXiv's full query language:

| Pattern | Example | Meaning |
| --- | --- | --- |
| Plain keyword | machine learning | Full-text search |
| Title | ti:attention | Papers with "attention" in the title |
| Author | au:Hinton | Papers by Hinton |
| Abstract | abs:transformer | Papers with "transformer" in abstract |
| Category | cat:cs.AI | Papers in the cs.AI category |
| Boolean | cat:cs.LG AND ti:diffusion | Category AND title filter |
| Date range | submittedDate:[202301010000 TO 202312312359] | Papers from 2023 |

See the arXiv query language reference for the full syntax.
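
Date-range clauses are easy to mistype by hand, so a small helper can assemble them. The sketch below builds a searchQuery string from the patterns in the table above; `date_range` and `all_of` are hypothetical helper names, not part of the actor or of arXiv's API.

```python
from datetime import date


def date_range(start: date, end: date) -> str:
    """Return a submittedDate clause covering start 00:00 through end 23:59."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]"


def all_of(*clauses: str) -> str:
    """Join the non-empty clauses with the AND operator."""
    return " AND ".join(clause for clause in clauses if clause)


query = all_of("ti:diffusion", date_range(date(2024, 1, 1), date(2024, 12, 31)))
print(query)
# ti:diffusion AND submittedDate:[202401010000 TO 202412312359]
```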

Common arXiv categories

| Category | Field |
| --- | --- |
| cs.AI | Artificial Intelligence |
| cs.LG | Machine Learning |
| cs.CL | Computation and Language (NLP) |
| cs.CV | Computer Vision |
| hep-th | High Energy Physics Theory |
| math.CO | Combinatorics |
| q-bio.NC | Neurons and Cognition |
| econ.GN | General Economics |

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchQuery | string | (required) | arXiv query expression |
| maxItems | integer | 50 | Maximum number of papers to return |
| sortBy | string | submittedDate | Sort field: relevance, lastUpdatedDate, or submittedDate |
| sortOrder | string | descending | Sort direction: ascending or descending |
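
To catch typos before starting a run, the input can be checked on the client side. This is a minimal sketch: the parameter names, defaults, and allowed values are taken from the table above, while `build_input` is a hypothetical helper and the lower bound of 1 for maxItems is an assumption. How the actor itself reacts to invalid input is not documented here.

```python
SORT_FIELDS = {"relevance", "lastUpdatedDate", "submittedDate"}
SORT_ORDERS = {"ascending", "descending"}


def build_input(
    search_query: str,
    max_items: int = 50,
    sort_by: str = "submittedDate",
    sort_order: str = "descending",
) -> dict:
    """Return an actor input dict, raising ValueError on obviously bad values."""
    if not search_query or not search_query.strip():
        raise ValueError("searchQuery is required")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")  # assumed bound
    if sort_by not in SORT_FIELDS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_FIELDS)}")
    if sort_order not in SORT_ORDERS:
        raise ValueError(f"sortOrder must be one of {sorted(SORT_ORDERS)}")
    return {
        "searchQuery": search_query.strip(),
        "maxItems": max_items,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
```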

Usage examples

Fetch the 100 most recent cs.AI papers:

{
  "searchQuery": "cat:cs.AI",
  "maxItems": 100,
  "sortBy": "submittedDate",
  "sortOrder": "descending"
}

Find papers by a specific author:

{
  "searchQuery": "au:LeCun",
  "maxItems": 50,
  "sortBy": "relevance"
}

Search for diffusion model papers from 2024:

{
  "searchQuery": "ti:diffusion AND submittedDate:[202401010000 TO 202412312359]",
  "maxItems": 200
}
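
Inputs like the ones above can also be submitted from code. The sketch below assumes the third-party `apify-client` Python package and an API token in an `APIFY_TOKEN` environment variable (both assumptions; neither is mentioned on this page). The actor ID is taken from the page URL. `run_actor` and `main` are hypothetical names, and the client is passed in as a parameter so the function can be exercised with a stub.

```python
import os


def run_actor(
    client,
    run_input: dict,
    actor_id: str = "jungle_synthesizer/arxiv-scraper",
) -> list[dict]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported here so the module loads even without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed variable name
    run_input = {"searchQuery": "au:LeCun", "maxItems": 50, "sortBy": "relevance"}
    for paper in run_actor(client, run_input):
        print(paper["arxiv_id"], paper["title"])
```

Call `main()` to execute a real run. Doing so consumes platform credits under the pay-per-event pricing.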

Technical notes

  • Uses the arXiv Atom API, arXiv's official programmatic interface
  • Pagination is handled automatically; set maxItems to any number
  • Rate-limited to ~1 request/second per arXiv usage guidelines
  • No authentication required
  • Results span all of arXiv's subject areas (2.5M+ papers total)
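
For readers curious what paging through the Atom API involves, here is a minimal standard-library sketch. It is not the actor's implementation: `parse_feed`, `fetch`, the page size of 25, and the 3-second default delay are all assumptions made for illustration, and only a handful of fields are extracted.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://export.arxiv.org/api/query"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def parse_feed(xml_text: str) -> list[dict]:
    """Extract a few fields from each <entry> of an Atom response."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        names = [
            author.findtext("atom:name", default="", namespaces=NS)
            for author in entry.findall("atom:author", NS)
        ]
        primary = entry.find("arxiv:primary_category", NS)
        papers.append({
            "abs_url": entry.findtext("atom:id", default="", namespaces=NS),
            "title": " ".join(title.split()),  # collapse wrapped whitespace
            "authors": ", ".join(names),
            "primary_category": primary.get("term") if primary is not None else None,
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    return papers


def fetch(search_query: str, max_items: int = 50, page_size: int = 25,
          delay_seconds: float = 3.0) -> list[dict]:
    """Page through results with start/max_results, pausing between requests."""
    papers: list[dict] = []
    for start in range(0, max_items, page_size):
        requested = min(page_size, max_items - start)
        params = urllib.parse.urlencode({
            "search_query": search_query,
            "start": start,
            "max_results": requested,
            "sortBy": "submittedDate",
            "sortOrder": "descending",
        })
        with urllib.request.urlopen(f"{API_URL}?{params}", timeout=30) as resp:
            page = parse_feed(resp.read().decode("utf-8"))
        papers.extend(page)
        if len(page) < requested:  # fewer results than asked for: no more pages
            break
        time.sleep(delay_seconds)  # conservative pause between requests
    return papers
```

`fetch("cat:cs.AI", max_items=100)` would issue four requests of 25 results each. The delay is a parameter so it can be tuned to whatever request rate is appropriate.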
