arXiv Papers Scraper

Pricing: from $1.00 / 1,000 results

Scrape academic preprints from arXiv.org by keyword, author, or category. Returns clean records with title, authors, abstract, categories, PDF URL, and DOI. Uses plain HTTP requests to the public arXiv API, with no login and no proxy.

Rating: 0.0 (0)

Developer: Crawler Bros (Maintained by Community)

Last modified: 8 days ago

What this actor does

Queries the arXiv API (https://export.arxiv.org/api/query) by keyword, author, and/or category
Parses the Atom XML response into one structured JSON record per paper
Filters by date range, DOI presence, minimum abstract length, and an abstract substring
Sorts by relevance, submitted-date, or last-updated-date
Walks paginated results until maxItems is reached
Respects arXiv's 1-request-per-3-seconds rate limit
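The request side of the list above can be sketched in Python with only the standard library. This is a hedged illustration, not the actor's source: the helper names, the page size of 100, and combining keyword, author, and categories into one search_query with AND/OR are assumptions (in particular, whether authorContains maps to arXiv's au: field is not documented here). The query parameters search_query, start, max_results, sortBy, and sortOrder, and the all:, au:, and cat: field prefixes, are part of the public arXiv API.

```python
import time
import urllib.parse
import urllib.request

API_URL = "https://export.arxiv.org/api/query"


def build_search_query(keyword=None, author=None, categories=()):
    """Combine keyword, author and categories into one arXiv search_query.

    Assumption (not documented by the actor): the parts are ANDed together
    and multiple categories are ORed.
    """
    parts = []
    if keyword:
        parts.append(f'all:"{keyword}"')  # phrase search across all fields
    if author:
        parts.append(f'au:"{author}"')  # hypothetical mapping of authorContains
    if categories:
        parts.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    return " AND ".join(parts)


def build_page_url(search_query, start, page_size=100,
                   sort_by="submittedDate", sort_order="descending"):
    """URL for one page of results; page_size=100 is an arbitrary choice."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": page_size,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_pages(search_query, max_items, page_size=100, delay=3.0,
                sort_by="submittedDate", sort_order="descending"):
    """Yield raw Atom XML pages, pausing `delay` seconds between requests."""
    for start in range(0, max_items, page_size):
        if start:
            time.sleep(delay)  # at most one request every 3 seconds
        size = min(page_size, max_items - start)
        url = build_page_url(search_query, start, size, sort_by, sort_order)
        with urllib.request.urlopen(url, timeout=30) as resp:
            page = resp.read().decode("utf-8")
        if "<entry>" not in page:  # crude check: no more results
            return
        yield page
```

For the first example input below, build_search_query("large language models", categories=["cs.CL", "cs.LG"]) produces all:"large language models" AND (cat:cs.CL OR cat:cs.LG); each yielded page would then go to a separate parsing step.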

Output per paper

arxivId: e.g. 2401.12345
title, abstract, abstractWordCount
authors[], authorCount, affiliations[]
categories[], primaryCategory: e.g. cs.LG
submittedAt, updatedAt: ISO-8601 UTC timestamps
doi: present when the authors have listed a DOI, usually after journal publication
journalRef: journal reference (full citation) supplied by the authors
comment: the authors' note (e.g. "15 pages, 5 figures")
pdfUrl: direct PDF download link
htmlUrl: abstract page on arXiv.org
recordType: always "paper"; scrapedAt: when the record was scraped

Empty fields are omitted rather than set to null.
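The Atom-to-JSON step can be sketched with Python's standard xml.etree.ElementTree. The Atom and arxiv namespaces and the element names used (arxiv:doi, arxiv:journal_ref, arxiv:comment, arxiv:primary_category, arxiv:affiliation, link title="pdf") follow the public arXiv API feed; the helper names, stripping the version suffix to get arxivId, and leaving out scrapedAt are assumptions of this sketch, not the actor's code.

```python
import re
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}


def _text(el, path):
    """Text of the first matching child, with internal whitespace collapsed."""
    node = el.find(path, NS)
    return " ".join(node.text.split()) if node is not None and node.text else None


def parse_entry(entry):
    """Turn one Atom <entry> into a record shaped like 'Output per paper'."""
    abs_url = _text(entry, "atom:id")  # e.g. http://arxiv.org/abs/2401.12345v1
    # Assumption: drop the trailing version (v1, v2, ...) to get the bare ID.
    arxiv_id = re.sub(r"v\d+$", "", abs_url.rsplit("/abs/", 1)[-1]) if abs_url else None
    abstract = _text(entry, "atom:summary")
    authors, affiliations = [], []
    for author in entry.findall("atom:author", NS):
        name = _text(author, "atom:name")
        if name:
            authors.append(name)
        for aff in author.findall("arxiv:affiliation", NS):
            if aff.text and aff.text.strip() not in affiliations:
                affiliations.append(aff.text.strip())  # de-duplicated, in order
    links = entry.findall("atom:link", NS)
    primary = entry.find("arxiv:primary_category", NS)
    record = {
        "arxivId": arxiv_id,
        "title": _text(entry, "atom:title"),
        "abstract": abstract,
        "abstractWordCount": len(abstract.split()) if abstract else None,
        "authors": authors,
        "authorCount": len(authors),
        "affiliations": affiliations,
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primaryCategory": primary.get("term") if primary is not None else None,
        "submittedAt": _text(entry, "atom:published"),
        "updatedAt": _text(entry, "atom:updated"),
        "doi": _text(entry, "arxiv:doi"),
        "journalRef": _text(entry, "arxiv:journal_ref"),
        "comment": _text(entry, "arxiv:comment"),
        "pdfUrl": next((l.get("href") for l in links if l.get("title") == "pdf"), None),
        "htmlUrl": next((l.get("href") for l in links if l.get("rel") == "alternate"), None),
        "recordType": "paper",
    }
    # Omit empty fields instead of emitting nulls.
    return {k: v for k, v in record.items() if v not in (None, "", [])}


def parse_feed(xml_text):
    """Parse one API response page into a list of records."""
    root = ET.fromstring(xml_text)
    return [parse_entry(e) for e in root.findall("atom:entry", NS)]
```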

Input

Field	Type	Default	Description
`searchQuery`	string	`"large language models"`	Free-text query against title + abstract + authors
`categories`	array	`[]`	arXiv subject codes (e.g. `cs.LG`, `stat.ML`). 50+ choices in the dropdown
`authorContains`	string	–	Filter by author name substring
`sortBy`	enum	`submittedDate`	`relevance` / `submittedDate` / `lastUpdatedDate`
`sortOrder`	enum	`descending`	`descending` (newest first) / `ascending`
`dateRangeFrom`	string	–	Drop papers submitted before this ISO date
`dateRangeTo`	string	–	Drop papers submitted after this ISO date
`maxItems`	int	`50`	Hard cap on emitted papers (1–5000)
`includeDoiOnly`	bool	`false`	Drop papers without a DOI (most such papers are not yet journal-published)
`minAbstractLength`	int	–	Drop papers with abstracts shorter than N characters
`abstractContains`	string	–	Only emit papers whose abstract contains this substring

Example: latest LLM papers

{
  "searchQuery": "large language models",
  "categories": ["cs.CL", "cs.LG"],
  "sortBy": "submittedDate",
  "maxItems": 100
}

Example: papers by a specific author

{
  "authorContains": "Yann LeCun",
  "sortBy": "submittedDate",
  "maxItems": 50
}

Example: published papers (DOI required)

{
  "searchQuery": "transformer",
  "categories": ["cs.LG"],
  "includeDoiOnly": true,
  "minAbstractLength": 200,
  "dateRangeFrom": "2024-01-01"
}

Example: niche query

{
  "searchQuery": "diffusion model",
  "categories": ["cs.CV"],
  "abstractContains": "image generation",
  "sortBy": "relevance",
  "maxItems": 25
}

Use cases

AI/ML research tracking: a daily run on cs.LG + cs.AI surfaces new methods
Literature review automation: feed every paper matching your query into your RAG index
Author following: watch a specific researcher's new submissions
Trend analysis: count papers per topic over time to chart research interest
Citation database: pair with a Crossref/DOI lookup for full bibliographic records
Academic content marketing: find papers that mention techniques your tool implements
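For the trend-analysis use case, a few lines of Python over the exported dataset are enough. A minimal sketch, assuming the records are already loaded as dicts with the submittedAt and categories fields described under Output per paper; the function name is hypothetical.

```python
from collections import Counter


def papers_per_month(records, category=None):
    """Count records per YYYY-MM of submittedAt, optionally for one category."""
    counts = Counter()
    for rec in records:
        if category and category not in rec.get("categories", []):
            continue  # skip papers outside the requested category
        submitted = rec.get("submittedAt")
        if submitted:
            counts[submitted[:7]] += 1  # "2024-01-22T18:00:00Z" -> "2024-01"
    return dict(sorted(counts.items()))
```

Counting on submittedAt rather than updatedAt keeps revised papers from being counted again in the month they were updated.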

FAQ

Does it require a login or cookies? No. arXiv's API is fully public.

Is a proxy needed? No. arXiv accepts API requests from any IP, and by default the actor waits 3 seconds between requests to honor arXiv's rate limit.

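How fresh is the data? Each run queries the live arXiv API, so results are as current as arXiv's own listings. arXiv announces new submissions in daily batches rather than instantly, so a new paper usually appears within about one business day of submission.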

Can I get the full PDF? The actor returns pdfUrl, a direct link to the PDF; download it with any HTTP client.
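A minimal download sketch using only the Python standard library. The helper name, the <arxivId>.pdf naming scheme, and the 3-second pause between downloads (borrowed from the API rate limit as a courtesy) are assumptions; the opener parameter is injectable only so the function can be exercised without network access.

```python
import pathlib
import time
import urllib.request


def download_pdfs(records, dest_dir, delay=3.0, opener=urllib.request.urlopen):
    """Save each record's pdfUrl as <arxivId>.pdf inside dest_dir."""
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, rec in enumerate(r for r in records if r.get("pdfUrl")):
        if i:
            time.sleep(delay)  # pause between downloads
        # Old-style IDs such as hep-th/9901001 contain "/", which is not valid in a filename.
        path = dest / (rec["arxivId"].replace("/", "_") + ".pdf")
        with opener(rec["pdfUrl"]) as resp:
            path.write_bytes(resp.read())
        saved.append(path)
    return saved
```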

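Why is doi missing on some papers? The doi field is supplied by the authors, usually once the paper has been published in a journal, so many preprints do not have one. Set includeDoiOnly=true to keep only papers that list a DOI; in practice these are mostly journal-published versions, though a DOI alone does not guarantee peer review.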

What's the difference between searchQuery and abstractContains? searchQuery is sent to arXiv's server-side search, which decides which papers are returned (and ranks them when sortBy is relevance). abstractContains is a client-side substring filter applied after the results are fetched. Use searchQuery for relevance and abstractContains for narrow keyword filtering on top of it.
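To make the split concrete, here is a sketch of the client-side half in Python. It is an illustration under assumptions, not the actor's code: the helper name is hypothetical, and both case-insensitive matching for abstractContains and comparing dates at day granularity are guesses the documentation does not confirm.

```python
def passes_filters(rec, date_from=None, date_to=None, doi_only=False,
                   min_abstract_length=None, abstract_contains=None):
    """Apply the post-fetch filters to one record (hypothetical helper).

    Dates are ISO strings such as "2024-01-01"; comparing the YYYY-MM-DD
    prefix of submittedAt as text works because ISO dates sort lexically.
    """
    day = rec.get("submittedAt", "")[:10]
    if date_from and day < date_from:
        return False
    if date_to and day > date_to:
        return False
    if doi_only and not rec.get("doi"):
        return False
    abstract = rec.get("abstract", "")
    if min_abstract_length and len(abstract) < min_abstract_length:
        return False  # length measured in characters, as in the Input table
    if abstract_contains and abstract_contains.lower() not in abstract.lower():
        return False
    return True
```

Because these checks run after fetching, a narrow abstractContains can discard most of a page; the server-side searchQuery is what keeps the number of requests down.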

Why limit to 5000 items? The arXiv API allows up to 30,000 results per query, but paginating beyond a few thousand becomes very slow because of the 3-second delay between requests. For larger crawls, run the actor several times with different dateRangeFrom/dateRangeTo windows.
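The window-splitting advice can be automated. A minimal Python sketch that builds one actor input per calendar month; the helper names are hypothetical, monthly granularity is an arbitrary choice, and it assumes each month's results for your query stay under maxItems.

```python
from datetime import date, timedelta


def monthly_windows(start, end):
    """Split [start, end] into consecutive calendar-month (from, to) ISO date pairs."""
    windows = []
    current = start
    while current <= end:
        # First day of the next month: jump past month end, then snap to day 1.
        next_month = (current.replace(day=1) + timedelta(days=32)).replace(day=1)
        window_end = min(next_month - timedelta(days=1), end)
        windows.append((current.isoformat(), window_end.isoformat()))
        current = next_month
    return windows


def run_inputs(base_input, start, end):
    """One input per month, each with its own dateRangeFrom/dateRangeTo."""
    return [{**base_input, "dateRangeFrom": f, "dateRangeTo": t}
            for f, t in monthly_windows(start, end)]
```

For example, monthly_windows(date(2024, 1, 15), date(2024, 3, 10)) yields three windows, the middle one running from 2024-02-01 to 2024-02-29.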

Can I scrape the PDF text content? Not directly — this actor returns metadata only. Pair it with a downstream PDF-extraction actor if you need full-text.

How are categories specified? Use arXiv's official codes, e.g. cs.LG (machine learning), stat.ML (machine learning in the statistics archive), cs.CL (computation and language, i.e. NLP), or q-bio.QM (quantitative methods in biology). The dropdown lists 50+ common codes; the full taxonomy is at arxiv.org/category_taxonomy.

You might also like

arXiv Research Paper Scraper

crawlerbros/arxiv-research-paper-scraper

Scrape research papers from arXiv.org - search by query, category, or author; lookup by arXiv ID. Returns title, authors, abstract, PDF URL, DOI, categories, and more. Uses the public arXiv Atom API. No login or proxy required.

Crawler Bros

Arxiv Papers Scraper

chimerical_quicklime/arxiv-papers-scraper

Search arXiv preprints via the public Atom API. Returns title, authors, abstract, categories, published date, updated date, DOI, journal reference, and PDF link. Filter by category, author, or keyword.

Khrystyna Skotte

ArXiv Paper Search

gentle_cloud/arxiv-paper-search

Search and extract academic papers from ArXiv. Find papers by keyword, author, or category with full metadata including title, authors, abstract, categories, and PDF links.

Monkey Coder

arXiv Paper Scraper

plantane/arxiv-scraper

Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.

Daniel

arXiv Paper Scraper

cloud9_ai/arxiv-paper-scraper

Scrape academic papers from arXiv.org. Search by keyword, browse categories, or get latest papers. Extract titles, abstracts, authors, PDF links, and citation data via arXiv API.

cloud9

arXiv Scraper: Papers, Authors, Categories & Search

perconey/arxiv-scraper

Scrape arxiv.org via the official Atom API. Full-text search, by author / title / category, paper detail by id, latest in any category. Returns title, abstract, authors, DOI, PDF link. No auth, no proxies. Pay only per result item.

Perconey

arXiv Scraper

dami_studio/arxiv-scraper

Search arXiv via the official API and get clean, structured paper metadata: title, abstract, authors, categories, DOI, dates, and abstract + PDF links. No key, no login, no anti-bot. Uses arXiv search syntax (all:, cat:, ti:, au:).

Dami's Studio

Rating: 5.0

arXiv Paper Scraper

lulzasaur/arxiv-scraper

Search and scrape arXiv academic papers. Get titles, authors, abstracts, categories, PDF links, DOIs. Search by keyword, browse recent papers by category, or fetch by arXiv ID.

lulz bot

ArXiv Paper Scraper

sheshinmcfly/arxiv-paper-scraper

Search and extract scientific papers from ArXiv.org across any field. Returns title, authors, full abstract, PDF link, arXiv ID, categories, and submission date. Ideal for AI research monitoring, RAG pipelines, literature reviews, and academic trend analysis. No API key needed.

Sheshinmcfly

arXiv Metadata Collector— Metadata, PDF, Authors & Abstract

scrapepilot/arxiv-metadata-collector---metadata-pdf-authors-abstract

Scrape arXiv research papers with metadata including title, authors, abstract, PDF links, DOI, and categories. Supports keyword search, proxy integration, and structured dataset output for AI, ML, and academic research use

Scrape Pilot

URL: https://apify.com/crawlerbros/arxiv-papers-scraper
