URL: https://apify.com/plantane/arxiv-scraper

arXiv Paper Scraper · Apify


Pricing

from $1.00 / 1,000 results

Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.

Rating

0.0 (0)

Developer

Daniel (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 0
  • Last modified: 3 months ago

Scrapes academic papers from arXiv using the public Atom API.

Features

  • Search mode: free-text search across all arXiv papers
  • Category mode: browse papers by arXiv category (e.g. cs.AI, math.CO, physics.optics)
  • Configurable sorting (by relevance, last updated, or submission date)
  • Pagination with polite rate limiting (3s between requests)
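
As a rough illustration of how the two modes could map onto the public arXiv API, the sketch below builds a query URL. The endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters come from the arXiv API; the function name `build_query_url` and the choice of the `all:` and `cat:` prefixes for each mode are assumptions for illustration, not the actor's confirmed implementation.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(mode, value, start=0, max_results=10,
                    sort_by="relevance", sort_order="descending"):
    """Build an arXiv API URL for search mode or category mode (hypothetical helper)."""
    if mode == "search":
        # Assumption: free-text search maps to the "all:" field prefix.
        search_query = f"all:{value}"
    elif mode == "category":
        # Assumption: category browsing maps to the "cat:" field prefix.
        search_query = f"cat:{value}"
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    params = {
        "search_query": search_query,
        "start": start,              # zero-based offset of the first result
        "max_results": max_results,  # page size for this request
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("search", "machine learning"))
    print(build_query_url("category", "cs.AI", sort_by="submittedDate"))
```

`urlencode` percent-encodes the colon and turns spaces into `+`, so the query string is safe to pass to any HTTP client.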

Input

Field       Type     Default           Description
mode        string   search            search or category
query       string   machine learning  Search query (for search mode)
category    string   cs.AI             arXiv category code (for category mode)
max_items   integer  10                Maximum papers to scrape (1–1000)
sort_by     string   relevance         relevance, lastUpdatedDate, or submittedDate
sort_order  string   descending        ascending or descending
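
The input fields and their defaults can be modelled and checked client-side before starting a run. The following is a minimal sketch: the field names, defaults, and allowed values are taken from the input description, while the `ActorInput` class itself and its error messages are hypothetical and not part of the actor.

```python
from dataclasses import dataclass, asdict

MODES = {"search", "category"}
SORT_BY = {"relevance", "lastUpdatedDate", "submittedDate"}
SORT_ORDER = {"ascending", "descending"}


@dataclass
class ActorInput:
    """Hypothetical client-side model of the actor input, with the documented defaults."""
    mode: str = "search"
    query: str = "machine learning"
    category: str = "cs.AI"
    max_items: int = 10
    sort_by: str = "relevance"
    sort_order: str = "descending"

    def __post_init__(self):
        # Reject values outside the documented options and ranges.
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}")
        if not 1 <= self.max_items <= 1000:
            raise ValueError("max_items must be between 1 and 1000")
        if self.sort_by not in SORT_BY:
            raise ValueError(f"sort_by must be one of {sorted(SORT_BY)}")
        if self.sort_order not in SORT_ORDER:
            raise ValueError(f"sort_order must be one of {sorted(SORT_ORDER)}")


if __name__ == "__main__":
    # Unspecified fields fall back to the documented defaults.
    print(asdict(ActorInput(mode="category", category="math.CO", max_items=50)))
```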

Output

Each result contains:

  • arxiv_id: arXiv paper ID (e.g. 2301.07041)
  • title: paper title
  • summary: abstract
  • authors: list of author names
  • categories: list of arXiv categories
  • primary_category: primary category
  • published: publication date (ISO 8601)
  • updated: last updated date (ISO 8601)
  • pdf_url: direct PDF link
  • abs_url: abstract page link
  • comment: author comment (optional)
  • journal_ref: journal reference (optional)
  • doi: DOI (optional)
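
To show where these fields could come from, here is a sketch that maps one Atom `<entry>` from the arXiv API onto the record layout above, using only the standard library. The XML namespaces and element names follow the arXiv Atom feed; `parse_entry`, the version-stripping of the ID, and every value in `SAMPLE_FEED` other than the example ID are placeholders and assumptions, not the actor's actual code or real paper metadata.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def _text(parent, path):
    """Return the whitespace-collapsed text of the first match, or None."""
    node = parent.find(path, NS)
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def parse_entry(entry):
    """Convert one Atom <entry> element into a flat record (hypothetical helper)."""
    links = entry.findall("atom:link", NS)
    pdf_url = next((l.get("href") for l in links if l.get("title") == "pdf"), None)
    abs_url = next((l.get("href") for l in links if l.get("rel") == "alternate"), None)
    # The entry id looks like http://arxiv.org/abs/2301.07041v1.
    # Assumption: the record keeps the ID without the version suffix.
    raw_id = _text(entry, "atom:id") or ""
    arxiv_id = re.sub(r"v\d+$", "", raw_id.split("/abs/")[-1])
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "arxiv_id": arxiv_id,
        "title": _text(entry, "atom:title"),
        "summary": _text(entry, "atom:summary"),
        "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", NS)],
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
        "published": _text(entry, "atom:published"),
        "updated": _text(entry, "atom:updated"),
        "pdf_url": pdf_url,
        "abs_url": abs_url,
        # Optional fields are None when the element is absent.
        "comment": _text(entry, "arxiv:comment"),
        "journal_ref": _text(entry, "arxiv:journal_ref"),
        "doi": _text(entry, "arxiv:doi"),
    }


# Placeholder feed: the title, authors, dates, and comment are made up for illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2301.07041v1</id>
    <updated>2023-01-01T00:00:00Z</updated>
    <published>2023-01-01T00:00:00Z</published>
    <title>Placeholder
      title</title>
    <summary>Placeholder abstract.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <arxiv:comment>placeholder comment</arxiv:comment>
    <link href="http://arxiv.org/abs/2301.07041v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/2301.07041v1" rel="related" type="application/pdf"/>
    <arxiv:primary_category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.LG" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    feed = ET.fromstring(SAMPLE_FEED)
    for entry in feed.findall("atom:entry", NS):
        print(parse_entry(entry))
```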

Example Input

{
  "mode": "search",
  "query": "transformer architecture",
  "max_items": 20,
  "sort_by": "submittedDate",
  "sort_order": "descending"
}

Notes

  • The arXiv API has a rate limit of ~1 request per 3 seconds. The scraper respects this.
  • Maximum 100 results per API request; pagination is handled automatically.
  • arXiv categories list: https://arxiv.org/category_taxonomy
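
The paging and delay behaviour described in these notes can be sketched as follows. The 100-result page size and 3-second pause are the figures given above; `plan_pages`, `fetch_all`, and the `fetch_page` callback are hypothetical names, and the actual HTTP request is left to the caller so the sketch stays network-free.

```python
import time

PAGE_SIZE = 100      # maximum results per API request
DELAY_SECONDS = 3.0  # pause between consecutive requests


def plan_pages(max_items, page_size=PAGE_SIZE):
    """Yield (start, max_results) pairs that together cover max_items results."""
    for start in range(0, max_items, page_size):
        yield start, min(page_size, max_items - start)


def fetch_all(max_items, fetch_page, delay=DELAY_SECONDS, sleep=time.sleep):
    """Collect up to max_items results, calling fetch_page(start, max_results) per page.

    fetch_page is supplied by the caller and should return a list of results.
    """
    items = []
    for i, (start, size) in enumerate(plan_pages(max_items)):
        if i > 0:
            sleep(delay)  # wait before every request except the first
        page = fetch_page(start, size)
        items.extend(page)
        if len(page) < size:
            break  # a short page means there are no more results
    return items[:max_items]


if __name__ == "__main__":
    # Fake page fetcher standing in for a real API call.
    fake = lambda start, size: list(range(start, start + size))
    print(len(fetch_all(250, fake, sleep=lambda s: None)))
```

Passing `sleep` as a parameter keeps the delay configurable and lets tests skip the real wait.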
