URL: https://apify.com/sheshinmcfly/arxiv-paper-scraper

ArXiv Paper Scraper - Scientific Research Data · Apify


Pricing

from $2.00 / 1,000 results

ArXiv Paper Scraper

Search and extract scientific papers from ArXiv.org across any field. Returns title, authors, full abstract, PDF link, arXiv ID, categories, and submission date. Ideal for AI research monitoring, RAG pipelines, literature reviews, and academic trend analysis. No API key needed.

Rating: 0.0 (0 ratings)

Developer: Sheshinmcfly

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 4
  • Monthly active users: 1
  • Last modified: 9 days ago

Search and extract scientific papers from ArXiv.org, the largest open-access repository of preprints in physics, mathematics, computer science, AI, and more.

Returns full metadata including title, authors, abstract, categories, submission date, and PDF link. Perfect for AI research pipelines, RAG systems, and academic trend monitoring.


What data does it extract?

Field | Description | Example
arxivId | ArXiv paper ID | "2604.18584"
title | Full paper title | "MathNet: a Global Multimodal Benchmark..."
authors | List of authors | ["Shaden Alshammari", "Kevin Wen"]
abstract | Full abstract text | "Mathematical problem solving remains..."
categories | ArXiv subject tags | ["cs.AI", "cs.LG", "cs.IR"]
primaryCategory | Primary category | "cs.AI"
submittedDate | Submission date | "20 April, 2026"
comments | Author comments | "ICLR 2026; 30 pages"
journalRef | Journal reference | "Proceedings of ICLR, 2026"
pdfUrl | Direct PDF link | "https://arxiv.org/pdf/2604.18584"
url | ArXiv abstract page | "https://arxiv.org/abs/2604.18584"
query | Search query used | "large language models"
extractedAt | Extraction timestamp | "2026-04-21T12:00:00Z"
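If you consume the results in Python, a typed record makes the two date fields easier to work with, because submittedDate is a human-readable string ("20 April, 2026") while extractedAt is an ISO 8601 timestamp. The sketch below is only an illustration: the class ArxivPaper and its methods are hypothetical helpers, the field names come from the table above, and treating comments and journalRef as optional is an assumption (not every paper has a journal reference).

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from datetime import date, datetime, timezone


@dataclass
class ArxivPaper:
    """Hypothetical typed view of one result record; field names follow the table above."""

    arxivId: str
    title: str
    authors: list[str]
    abstract: str
    categories: list[str]
    primaryCategory: str
    submittedDate: str  # human-readable, e.g. "20 April, 2026"
    pdfUrl: str
    url: str
    query: str
    extractedAt: str  # ISO 8601 UTC, e.g. "2026-04-21T12:00:00.000Z"
    # Assumption: these two may be missing for papers without author comments or a journal reference
    comments: str | None = None
    journalRef: str | None = None

    @classmethod
    def from_record(cls, record: dict) -> ArxivPaper:
        # Keep only declared fields so unexpected extra keys don't raise TypeError
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in record.items() if k in known})

    def submitted_on(self) -> date:
        # %B matches English month names in the default C locale
        return datetime.strptime(self.submittedDate, "%d %B, %Y").date()

    def extracted_at(self) -> datetime:
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so swap it for an explicit offset
        parsed = datetime.fromisoformat(self.extractedAt.replace("Z", "+00:00"))
        return parsed.astimezone(timezone.utc)
```

For example, ArxivPaper.from_record(item).submitted_on() turns "20 April, 2026" into date(2026, 4, 20), which sorts and filters correctly where the raw string would not.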

Use cases

  • RAG pipelines: Feed domain-specific papers into retrieval-augmented AI systems
  • AI research monitoring: Track the latest publications in LLMs, computer vision, and NLP
  • Academic trend analysis: Identify hot topics and emerging research areas
  • Literature review automation: Collect papers for a specific topic at scale
  • LLM fine-tuning data: High-quality scientific text for model training
  • Competitive intelligence: Monitor what research competitors are publishing
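For the trend-analysis use case, a simple starting point is to count subject tags across the scraped records and bucket one tag by submission month. This is a minimal sketch, assuming the records are plain dicts with the categories, primaryCategory, and submittedDate fields shown in the table; the function names are hypothetical, and raw counts are only a rough proxy for what counts as a "hot topic".

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable, Mapping
from datetime import datetime
from typing import Any


def category_counts(records: Iterable[Mapping[str, Any]], field: str = "categories") -> Counter[str]:
    """Count subject tags; use field="primaryCategory" to count one tag per paper."""
    counts: Counter[str] = Counter()
    for record in records:
        value = record.get(field)
        if isinstance(value, str):  # primaryCategory: a single tag
            counts[value] += 1
        elif value:  # categories: a list of tags
            counts.update(value)
    return counts


def monthly_counts(records: Iterable[Mapping[str, Any]], tag: str) -> Counter[str]:
    """Papers per "YYYY-MM" whose categories include `tag`, based on submittedDate."""
    per_month: Counter[str] = Counter()
    for record in records:
        if tag in record.get("categories", []):
            # submittedDate looks like "20 April, 2026" in the example output
            submitted = datetime.strptime(record["submittedDate"], "%d %B, %Y")
            per_month[submitted.strftime("%Y-%m")] += 1
    return per_month
```

Calling category_counts(items).most_common(10) gives the ten most frequent tags in a run; comparing monthly_counts for the same tag across runs shows whether its volume is growing.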

How to use

  1. Open the actor and configure:
    • Search queries: One or more search terms (e.g. "diffusion models", "reinforcement learning")
    • Search field: All fields, title only, abstract only, or author
    • Sort by: Newest first or by relevance
    • Max results: Number of papers per query
  2. Click Start
  3. Download results as JSON, CSV, or Excel
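The same run can also be started from code instead of the web form. The sketch below uses the official apify-client Python package (ApifyClient, actor().call(), dataset().iterate_items()) with the actor ID taken from this page's URL. The input keys (searchQueries, searchField, sortBy, maxResults) and the allowed values for search field and sort order are hypothetical guesses based on the form labels above, so check the actor's input schema before relying on them.

```python
from __future__ import annotations

from typing import Any

# Hypothetical values inferred from the form labels; verify against the actor's input schema.
ALLOWED_SEARCH_FIELDS = {"all", "title", "abstract", "author"}
ALLOWED_SORTS = {"newest", "relevance"}


def build_run_input(queries: list[str], search_field: str = "all",
                    sort_by: str = "newest", max_results: int = 50) -> dict[str, Any]:
    """Validate options locally and build a run-input dict (all key names are assumptions)."""
    if not queries or not all(q.strip() for q in queries):
        raise ValueError("at least one non-empty search query is required")
    if search_field not in ALLOWED_SEARCH_FIELDS:
        raise ValueError(f"unknown search field: {search_field}")
    if sort_by not in ALLOWED_SORTS:
        raise ValueError(f"unknown sort order: {sort_by}")
    if max_results < 1:
        raise ValueError("max_results must be positive")
    return {
        "searchQueries": [q.strip() for q in queries],
        "searchField": search_field,
        "sortBy": sort_by,
        "maxResults": max_results,  # papers per query, per the form description
    }


def run_actor(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the actor, wait for it to finish, and return every item in its default dataset."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor("sheshinmcfly/arxiv-paper-scraper").call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Validating locally first means a typo in a search field or sort order fails immediately instead of after a billed run; run_actor then needs only an Apify API token.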

Example output (JSON)

{
  "arxivId": "2604.18584",
  "title": "MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval",
  "authors": ["Shaden Alshammari", "Kevin Wen", "Antonio Torralba"],
  "abstract": "Mathematical problem solving remains a challenging test of reasoning...",
  "categories": ["cs.AI", "cs.DL", "cs.IR", "cs.LG"],
  "primaryCategory": "cs.AI",
  "submittedDate": "20 April, 2026",
  "comments": "ICLR 2026; Website: http://mathnet.mit.edu",
  "journalRef": "Proceedings of ICLR, 2026",
  "pdfUrl": "https://arxiv.org/pdf/2604.18584",
  "url": "https://arxiv.org/abs/2604.18584",
  "query": "large language models",
  "extractedAt": "2026-04-21T12:00:00.000Z"
}
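Because each record carries the query that produced it, a paper matched by several queries will probably appear once per query; that is an inference from the output format, not documented behavior. Before feeding results into a RAG pipeline, it can help to merge such duplicates by arxivId and reshape each record into a text chunk plus metadata. The functions and the document layout (id, text, metadata) below are hypothetical; adapt them to whatever your vector store expects.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any


def dedupe_by_id(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one record per arxivId, collecting every query that matched it."""
    merged: dict[str, dict[str, Any]] = {}
    for record in records:
        key = record["arxivId"]
        if key not in merged:
            # Copy so the caller's record is not mutated when queries are appended later
            merged[key] = {**record, "queries": [record["query"]]}
        elif record["query"] not in merged[key]["queries"]:
            merged[key]["queries"].append(record["query"])
    return list(merged.values())


def to_rag_document(record: dict[str, Any]) -> dict[str, Any]:
    """Title plus abstract as the retrievable text; links and tags kept as metadata."""
    return {
        "id": f"arxiv:{record['arxivId']}",
        "text": f"{record['title']}\n\n{record['abstract']}",
        "metadata": {
            "url": record["url"],
            "pdfUrl": record["pdfUrl"],
            "authors": record.get("authors", []),
            "primaryCategory": record.get("primaryCategory"),
            "submittedDate": record.get("submittedDate"),
            "queries": record.get("queries", [record.get("query")]),
        },
    }
```

Embedding only the title and abstract keeps chunks short; the pdfUrl in the metadata lets a later step fetch the full text for the papers that are actually retrieved.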

Pricing

This actor charges $0.002 USD per paper extracted. Extracting 100 papers costs approximately $0.20 USD.
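The listed rate ($0.002 per paper, i.e. $2.00 per 1,000 results) makes budgeting a simple multiplication. The sketch below uses Decimal to avoid floating-point rounding in currency values; it is a linear estimate based only on the per-paper price above and ignores any plan-specific variation implied by the "from $2.00" wording. The function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_PAPER_USD = Decimal("0.002")  # $2.00 per 1,000 results, as listed above


def estimate_cost(num_papers: int) -> Decimal:
    """Estimated charge in USD for extracting num_papers results at the listed rate."""
    if num_papers < 0:
        raise ValueError("num_papers must be non-negative")
    return PRICE_PER_PAPER_USD * num_papers


def max_papers(budget_usd: Decimal) -> int:
    """Largest number of papers that fits within a budget at the listed rate."""
    if budget_usd < 0:
        raise ValueError("budget must be non-negative")
    return int(budget_usd // PRICE_PER_PAPER_USD)
```

For example, estimate_cost(100) gives $0.20 and max_papers(Decimal("1.00")) gives 500, which is useful for setting the "Max results" field so a run stays within budget.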


Keywords

arxiv scraper, scientific paper extractor, research paper scraper, arxiv API, AI paper scraper, academic data extractor, preprint scraper, NLP research data, LLM training data, arxiv search scraper


Legal Disclaimer

This actor extracts only publicly available, open-access data from ArXiv.org, in compliance with Chilean Law 19.628 on the Protection of Private Life (Ley 19.628 sobre Protección de la Vida Privada).

ArXiv is an open-access repository operated by Cornell University. All papers and metadata extracted are freely and publicly accessible without authentication.

What this actor does NOT collect:

  • Names or personal data of any private individuals
  • User accounts, submission portals, or private information
  • Any data not freely visible to anonymous visitors

What this actor collects:

  • Paper titles, abstracts, and author names (public academic data)
  • Subject categories and submission dates
  • Public URLs and PDF links

Users are solely responsible for ensuring their use of this data complies with applicable laws and ArXiv's terms of use.

Other actors you may like

arXiv Research Paper Scraper

crawlerbros/arxiv-research-paper-scraper

Scrape research papers from arXiv.org - search by query, category, or author; lookup by arXiv ID. Returns title, authors, abstract, PDF URL, DOI, categories, and more. Uses the public arXiv Atom API. No login or proxy required.

ArXiv Paper Search

gentle_cloud/arxiv-paper-search

Search and extract academic papers from ArXiv. Find papers by keyword, author, or category with full metadata including title, authors, abstract, categories, and PDF links.

Arxiv Paper Intelligence

viralanalyzer/arxiv-paper-intelligence

Search and extract ArXiv papers, abstracts, authors, and citations. Track research trends across any scientific field. AI-powered analysis.

arXiv Paper Scraper

plantane/arxiv-scraper

Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.

arXiv Scraper - Scientific Papers, Abstracts & PDFs

benthepythondev/arxiv-scraper

arXiv Scraper for the official arXiv API. Search 2M+ scientific papers in CS, physics, math and biology by keyword, title, author, abstract or category. Extract title, authors, abstract, categories, DOI, dates and PDF links. For AI/ML research, literature reviews and RAG datasets.