
URL: https://apify.com/automly/arxiv-paper-scraper

arXiv Paper & Author Scraper · Apify


๐Ÿ‘ arXiv Paper & Author Scraper avatar

arXiv Paper & Author Scraper

Under maintenance

Pricing

Pay per usage


Extract academic papers, abstracts, and author details from arXiv using the official API. Ideal for research monitoring, literature reviews, and building academic datasets.

Rating: 0.0 (0)

Developer

๐Ÿ‘ Automly

Automly

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a month ago


Extract academic papers, abstracts, and author details from arXiv using the official API. This actor is perfect for research monitoring, systematic literature reviews, building academic datasets, and feeding RAG pipelines with the latest scientific publications.

Why use this actor?

  • Official API reliability: Uses the arXiv export API for stable, structured data without scraping complexity.
  • Research monitoring: Track new papers in specific fields or by keyword.
  • Literature reviews: Collect abstracts, authors, and categories for systematic analysis.
  • Academic lead generation: Build lists of researchers and their affiliations by topic.
  • RAG & AI pipelines: Feed paper abstracts and metadata into vector databases for semantic search.

Features

  • Search papers by free-text query or arXiv category codes
  • Filter by date range (last week, last month, last year, or custom range)
  • Sort by relevance, submission date, or last updated date
  • Extract full abstracts and author lists with affiliations
  • Output authors as separate records for easy analysis
  • Respects arXiv polite usage policy with built-in rate limiting

Input

Field | Type | Default | Description
searchQuery | string | - | arXiv search query, e.g. machine learning or cat:cs.AI
categories | array | - | List of arXiv category codes, e.g. ["cs.AI", "cs.LG"]
dateRange | string | - | lastWeek, lastMonth, lastYear, or YYYY-MM-DD TO YYYY-MM-DD
maxResults | integer | 100 | Maximum papers to return (1-500)
extractAuthors | boolean | true | Include author records as separate rows
extractAbstract | boolean | true | Include paper abstracts
sortBy | string | relevance | relevance, lastUpdatedDate, or submittedDate
sortOrder | string | descending | ascending or descending
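
The listing does not document how the actor turns these fields into an arXiv API request. The sketch below shows one plausible mapping, assuming the public arXiv query endpoint and its documented `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters. The function names and the way the three inputs are combined with AND are illustrative assumptions, not the actor's actual implementation, and only explicit date ranges are handled here.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_search_query(search_query=None, categories=None, date_range=None):
    """Combine actor-style inputs into one arXiv search_query string (assumed mapping)."""
    parts = []
    if search_query:
        # Field-prefixed queries such as cat:cs.AI pass through unchanged;
        # plain text is searched as a phrase across all fields.
        parts.append(search_query if ":" in search_query else f'all:"{search_query}"')
    if categories:
        parts.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if date_range:
        # Explicit ranges only: "YYYY-MM-DD TO YYYY-MM-DD".
        start, end = (d.strip().replace("-", "") for d in date_range.split(" TO "))
        parts.append(f"submittedDate:[{start}0000 TO {end}2359]")
    return " AND ".join(parts)


def build_url(search_query, start=0, max_results=100,
              sort_by="relevance", sort_order="descending"):
    """Return a full arXiv API URL for one page of results."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

For example, `build_url(build_search_query("large language models", ["cs.CL"]), max_results=50)` yields a URL that can be opened in a browser to inspect the raw Atom feed.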

Example input

{
  "searchQuery": "large language models",
  "categories": ["cs.CL", "cs.AI"],
  "dateRange": "lastMonth",
  "maxResults": 50,
  "extractAuthors": true,
  "sortBy": "submittedDate",
  "sortOrder": "descending"
}
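
Before starting a run, it can help to check an input object against the constraints in the Input table. The following is a minimal local sketch of such a check. `validate_input` is a hypothetical helper, not part of the actor or the Apify SDK, and the platform may apply its own validation independently.

```python
import re

SORT_BY = {"relevance", "lastUpdatedDate", "submittedDate"}
SORT_ORDER = {"ascending", "descending"}
PRESETS = {"lastWeek", "lastMonth", "lastYear"}
CUSTOM_RANGE = re.compile(r"^\d{4}-\d{2}-\d{2} TO \d{4}-\d{2}-\d{2}$")


def validate_input(run_input):
    """Return a list of problems found in an actor input dict (empty if none)."""
    errors = []
    max_results = run_input.get("maxResults", 100)
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(max_results, bool) or not isinstance(max_results, int)
            or not 1 <= max_results <= 500):
        errors.append("maxResults must be an integer from 1 to 500")
    if run_input.get("sortBy", "relevance") not in SORT_BY:
        errors.append("sortBy must be relevance, lastUpdatedDate, or submittedDate")
    if run_input.get("sortOrder", "descending") not in SORT_ORDER:
        errors.append("sortOrder must be ascending or descending")
    date_range = run_input.get("dateRange")
    if (date_range is not None and date_range not in PRESETS
            and not CUSTOM_RANGE.match(date_range)):
        errors.append("dateRange must be a preset or YYYY-MM-DD TO YYYY-MM-DD")
    return errors
```

An empty list means the input matches the documented field types and ranges. It does not guarantee that the query returns any papers.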

Output

Each record includes a type field to distinguish entities.

Paper

Field | Type | Description
type | string | paper
arxivId | string | arXiv identifier
url | string | arXiv abstract page URL
pdfUrl | string | Direct PDF URL
title | string | Paper title
abstract | string | Paper abstract
publishedAt | string | ISO 8601 submission date
updatedAt | string | ISO 8601 last update date
authors | array | List of {name, affiliation} objects
categories | array | arXiv category codes
primaryCategory | string | Primary arXiv category

Author

Field | Type | Description
type | string | author
arxivId | string | Associated paper identifier
paperTitle | string | Associated paper title
name | string | Author name
affiliation | string | Author affiliation
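
Because paper and author records arrive in the same dataset, a consumer usually separates them by the type field first. The sketch below shows one way to do that with plain dictionaries. It assumes the records have already been loaded into a Python list, and `split_records` is an illustrative helper name rather than anything the actor provides.

```python
from collections import defaultdict


def split_records(items):
    """Separate mixed dataset rows into papers and per-paper author names."""
    papers = {}
    authors_by_paper = defaultdict(list)
    for item in items:
        if item.get("type") == "paper":
            papers[item["arxivId"]] = item
        elif item.get("type") == "author":
            # Author rows point back to their paper through arxivId.
            authors_by_paper[item["arxivId"]].append(item["name"])
    return papers, dict(authors_by_paper)


# Placeholder records with made-up values, shaped like the tables above.
sample = [
    {"type": "paper", "arxivId": "0000.00001", "title": "Example paper"},
    {"type": "author", "arxivId": "0000.00001", "paperTitle": "Example paper",
     "name": "A. Author", "affiliation": "Example University"},
    {"type": "author", "arxivId": "0000.00001", "paperTitle": "Example paper",
     "name": "B. Author", "affiliation": None},
]
papers, authors = split_records(sample)
```

Keying both structures on arxivId makes it straightforward to join author rows back to their paper later.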

Limits and caveats

  • arXiv API returns up to 100 results per request; the actor paginates automatically.
  • A 3-second delay is enforced between requests to respect arXiv's polite usage policy.
  • Only publicly available papers are returned.
  • Author affiliations are only available when provided by the submitter.
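
The first two caveats together put a floor on how long a run takes. As a rough sketch, the helpers below plan the page requests for a given maxResults and add up the enforced waiting time. They assume the delay applies only between consecutive requests, which the listing does not state explicitly, and the function names are illustrative.

```python
PAGE_SIZE = 100      # results per request, per the caveat above
DELAY_SECONDS = 3    # pause between consecutive requests


def plan_pages(max_results, page_size=PAGE_SIZE):
    """Return (start, count) pairs covering max_results items."""
    return [(start, min(page_size, max_results - start))
            for start in range(0, max_results, page_size)]


def min_delay_seconds(max_results, page_size=PAGE_SIZE, delay=DELAY_SECONDS):
    """Total enforced waiting time: one delay between each pair of requests."""
    pages = len(plan_pages(max_results, page_size))
    return max(pages - 1, 0) * delay
```

Under these assumptions, a run with maxResults set to 250 needs three requests and at least 6 seconds of waiting, and the maximum of 500 needs five requests and at least 12 seconds. Network and processing time come on top of that.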

Pricing

This actor uses Pay Per Event pricing. You are charged only for successfully extracted data.

Event | Price | Description
Paper scraped | $0.003 | Each paper successfully extracted
Author scraped | $0.001 | Each author record successfully extracted

Tiered discounts apply based on your Apify subscription level. A small actor-start fee may also apply.
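
To budget a run, the per-event prices can be multiplied out. This is a minimal estimate based only on the two listed prices. It ignores tiered discounts and any actor-start fee, since the listing gives no figures for those, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

PAPER_PRICE = Decimal("0.003")   # USD per paper record
AUTHOR_PRICE = Decimal("0.001")  # USD per author record


def estimate_cost(papers, authors):
    """Return the estimated charge in USD before discounts and start fees."""
    return papers * PAPER_PRICE + authors * AUTHOR_PRICE
```

For instance, 50 papers with 180 author records in total come to 0.15 + 0.18 = 0.33 USD. Decimal is used instead of float so that small per-event prices add up exactly.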

FAQ

Do I need an arXiv account? No. The arXiv API is completely open and requires no authentication.

Can I download the full PDF? The actor returns direct PDF URLs in the pdfUrl field. You can download them separately.

What categories are available? arXiv uses codes like cs.AI (Artificial Intelligence), cs.LG (Machine Learning), cs.CL (Computation and Language), physics.gen-ph, math.ST, etc. See the full list at arxiv.org.

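How recent is the data? Data reflects the current arXiv index at the time of the run. arXiv processes submissions in daily batches, so a new paper typically becomes available through the API only after it has been announced, not within minutes of submission.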
