URL: https://apify.com/artificially/arxiv-scraper

arXiv Scraper [DEPRECATED] · Apify


πŸ‘ arXiv Scraper avatar

arXiv Scraper

Deprecated

Pricing

from $0.70 / 1,000 results

Search and extract academic papers from arXiv.org. Get paper titles, authors, abstracts, categories, and PDF links for AI/ML, physics, math, and more.

Rating: 0.0 (0)

Developer: Artificially (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: 5 months ago

arXiv Papers Scraper - Enhanced

Search and extract academic papers from arXiv.org with citation analysis, author profiles, and impact metrics via Semantic Scholar integration.

Features

Core Search

  • Full-text Search: Search across all arXiv papers
  • Category Filtering: Filter by arXiv category (cs.AI, physics, math, etc.)
  • Sorting Options: Sort by relevance, submission date, or update date
  • Complete Metadata: Title, authors, abstract, categories, dates
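The search options above map naturally onto the public arXiv API. The following is a minimal sketch of how such a query URL could be assembled; it is not taken from the actor's source. The function name `build_arxiv_query_url` is hypothetical, and the assumption is that the actor's "update date" sort corresponds to the arXiv API value `lastUpdatedDate`.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint (documented in the arXiv API user manual).
ARXIV_API = "http://export.arxiv.org/api/query"

# Sort keys accepted by the arXiv API.
VALID_SORT_KEYS = {"relevance", "submittedDate", "lastUpdatedDate"}


def build_arxiv_query_url(search_query, category=None, max_results=100,
                          sort_by="submittedDate"):
    """Build an arXiv API URL (hypothetical helper, not the actor's code)."""
    if sort_by not in VALID_SORT_KEYS:
        raise ValueError(f"sort_by must be one of {sorted(VALID_SORT_KEYS)}")
    # Search all fields for the phrase, optionally restricted to one category.
    terms = f'all:"{search_query}"'
    if category:
        terms = f"{terms} AND cat:{category}"
    params = {
        "search_query": terms,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query_url("large language models", category="cs.CL",
                            max_results=50)
print(url)
```

The response from this endpoint is an Atom feed, so a real client would still need an XML parser to extract titles, authors, and abstracts.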

Citation Analysis (NEW)

  • Citation Counts: Total citations from Semantic Scholar
  • Influential Citations: Citations that significantly impacted the field
  • Citation Velocity: Recent citation momentum
  • Citations Per Year: Historical citation distribution
  • Highly Influential Flag: Identify breakthrough papers

Author Profiles (NEW)

  • h-Index: Author's impact metric
  • Total Citations: Lifetime citation count
  • Paper Count: Publication volume
  • Affiliations: Current institutional affiliations
  • Semantic Scholar Links: Direct profile links

Related Content (NEW)

  • References: Papers cited by each result
  • Related Papers: AI-recommended similar papers
  • Venue Information: Publication venue if applicable
  • Fields of Study: Semantic Scholar topic classification

Impact Scoring (NEW)

  • Calculated Impact Score: Combined metric considering citations, author h-index, and momentum
  • Results sorted by impact: Most influential papers first

Use Cases

  • Build research paper datasets with citation metrics
  • Identify high-impact papers in your field
  • Find influential authors and their work
  • Track citation trends over time
  • Literature review with impact analysis
  • Research team evaluation

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| searchQuery | string | Yes | - | Search terms |
| category | string | No | - | arXiv category filter |
| maxPapers | number | No | 100 | Maximum papers |
| sortBy | string | No | submittedDate | Sort order |
| includeCitations | boolean | No | true | Fetch citation metrics |
| includeAuthorProfiles | boolean | No | true | Fetch author h-index and stats |
| includeReferences | boolean | No | false | Fetch paper bibliography |
| maxReferences | number | No | 10 | References per paper |
| includeRelatedPapers | boolean | No | false | Fetch similar papers |
| maxRelatedPapers | number | No | 5 | Related papers per result |

Example Input

{
  "searchQuery": "large language models",
  "category": "cs.CL",
  "maxPapers": 50,
  "includeCitations": true,
  "includeAuthorProfiles": true,
  "includeRelatedPapers": true,
  "sortBy": "submittedDate"
}
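For scripted runs it can help to build the input programmatically so that the documented defaults are applied consistently. The sketch below mirrors the input table; the helper name `build_run_input` and its validation rules are illustrative assumptions, not part of the actor.

```python
# Defaults copied from the input table above.
DEFAULTS = {
    "maxPapers": 100,
    "sortBy": "submittedDate",
    "includeCitations": True,
    "includeAuthorProfiles": True,
    "includeReferences": False,
    "maxReferences": 10,
    "includeRelatedPapers": False,
    "maxRelatedPapers": 5,
}


def build_run_input(search_query, **overrides):
    """Merge overrides into the documented defaults (hypothetical helper)."""
    if not search_query or not search_query.strip():
        raise ValueError("searchQuery is required")
    # "category" is optional and has no default, so it is allowed but not preset.
    allowed = set(DEFAULTS) | {"category"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {"searchQuery": search_query, **DEFAULTS, **overrides}


run_input = build_run_input(
    "large language models",
    category="cs.CL",
    maxPapers=50,
    includeRelatedPapers=True,
)
```

The resulting dictionary has the same shape as the example input and could be serialized with `json.dumps` before being passed to a run.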

Output

Each paper produces a result with:

{
  "arxivId": "2401.12345",
  "title": "Advances in Large Language Models: A Survey",
  "authors": ["John Smith", "Jane Doe"],
  "authorProfiles": [
    {
      "name": "John Smith",
      "authorId": "12345678",
      "hIndex": 45,
      "citationCount": 15000,
      "paperCount": 120,
      "affiliations": ["Stanford University"],
      "url": "https://www.semanticscholar.org/author/12345678"
    }
  ],
  "abstract": "This paper surveys recent advances...",
  "categories": ["cs.CL", "cs.AI"],
  "categoryDescriptions": ["Computation and Language (NLP)", "Artificial Intelligence"],
  "citations": {
    "totalCitations": 1250,
    "influentialCitations": 89,
    "citationVelocity": 125.5,
    "citationsPerYear": {
      "2023": 450,
      "2024": 800
    },
    "isHighlyInfluential": true
  },
  "references": [
    {
      "title": "Attention Is All You Need",
      "authors": ["Ashish Vaswani"],
      "citationCount": 75000,
      "arxivId": "1706.03762"
    }
  ],
  "relatedPapers": [
    {
      "title": "GPT-4 Technical Report",
      "citationCount": 5000,
      "url": "https://arxiv.org/abs/2303.08774"
    }
  ],
  "impactScore": 85.3,
  "venue": "NeurIPS 2024",
  "fieldsOfStudy": ["Computer Science", "Linguistics"],
  "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
  "arxivUrl": "https://arxiv.org/abs/2401.12345",
  "scrapedAt": "2024-01-20T12:00:00Z"
}
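A record of this shape can be reduced to a compact summary for downstream use. The sketch below works on a trimmed copy of the sample record; the `summarize` helper is hypothetical, and it assumes that optional blocks such as `citations` or `authorProfiles` may be missing when the corresponding `include...` flags are disabled, which the listing does not state explicitly.

```python
import json

# Trimmed copy of the sample record shown above.
SAMPLE_RECORD = json.loads("""
{
  "arxivId": "2401.12345",
  "title": "Advances in Large Language Models: A Survey",
  "authors": ["John Smith", "Jane Doe"],
  "authorProfiles": [{"name": "John Smith", "hIndex": 45}],
  "citations": {"totalCitations": 1250, "citationVelocity": 125.5,
                "isHighlyInfluential": true},
  "impactScore": 85.3,
  "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf"
}
""")


def summarize(paper):
    """Flatten one result into a small dict (hypothetical helper)."""
    # Assumption: optional blocks may be absent, so fall back to empty values.
    citations = paper.get("citations") or {}
    profiles = paper.get("authorProfiles") or []
    return {
        "arxivId": paper["arxivId"],
        "title": paper["title"],
        "authorCount": len(paper.get("authors", [])),
        "totalCitations": citations.get("totalCitations"),
        "maxHIndex": max((p.get("hIndex", 0) for p in profiles), default=None),
        "impactScore": paper.get("impactScore"),
        "pdfUrl": paper.get("pdfUrl"),
    }


summary = summarize(SAMPLE_RECORD)
```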

Cost

This actor uses pay-per-result pricing:

| Cost Type | Amount |
| --- | --- |
| Start fee | $0.05 per run |
| Per paper | $0.001 |

No API key required - Uses arXiv and Semantic Scholar public APIs.

Example Cost Calculation

  • 100 papers: $0.05 + (100 x $0.001) = $0.15
  • 1,000 papers: $0.05 + (1000 x $0.001) = $1.05
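The same arithmetic can be expressed as a small helper. This is a sketch using the start fee and per-paper rate quoted above; the function name `estimate_cost` is illustrative, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# Rates taken from the cost table above.
START_FEE = Decimal("0.05")
PER_PAPER = Decimal("0.001")


def estimate_cost(papers):
    """Return the estimated cost in USD for one run (hypothetical helper)."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return START_FEE + PER_PAPER * papers


for count in (100, 1000):
    print(count, estimate_cost(count))
```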

Tips

  1. Impact sorting: Results are automatically sorted by calculated impact score

  2. Highly influential papers: Look for isHighlyInfluential: true for breakthrough papers

  3. Author quality: Check author h-index to identify papers from established researchers

  4. Citation velocity: High velocity indicates trending/hot papers

  5. Related papers: Enable includeRelatedPapers for comprehensive literature discovery
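The tips above can be combined into a simple post-processing filter over the output records. The following sketch assumes the field names from the sample output; the `shortlist` function and its thresholds are hypothetical, and missing fields are treated as zero or false.

```python
def shortlist(papers, min_h_index=0, min_velocity=0.0, influential_only=False):
    """Filter results by the tips above, then sort by impactScore (sketch)."""
    selected = []
    for paper in papers:
        citations = paper.get("citations") or {}
        profiles = paper.get("authorProfiles") or []
        # Use the highest h-index among the listed author profiles.
        best_h = max((p.get("hIndex", 0) for p in profiles), default=0)
        if influential_only and not citations.get("isHighlyInfluential", False):
            continue
        if best_h < min_h_index:
            continue
        if citations.get("citationVelocity", 0.0) < min_velocity:
            continue
        selected.append(paper)
    # Highest impact score first.
    return sorted(selected, key=lambda p: p.get("impactScore", 0.0), reverse=True)


papers = [
    {"arxivId": "a", "impactScore": 10.0,
     "authorProfiles": [{"hIndex": 50}],
     "citations": {"citationVelocity": 5.0, "isHighlyInfluential": True}},
    {"arxivId": "b", "impactScore": 90.0,
     "authorProfiles": [{"hIndex": 5}],
     "citations": {"citationVelocity": 200.0, "isHighlyInfluential": False}},
    {"arxivId": "c", "impactScore": 50.0,
     "authorProfiles": [{"hIndex": 40}],
     "citations": {"citationVelocity": 100.0, "isHighlyInfluential": True}},
]
trending_established = shortlist(papers, min_h_index=30, min_velocity=50.0)
```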

Rate Limits

  • arXiv: 3-second delay between requests (handled automatically)
  • Semantic Scholar: 1-second delay (handled automatically)
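For readers building their own client against the same APIs, a fixed delay between requests can be enforced with a small limiter. This is a generic sketch, not the actor's implementation; the class name `MinIntervalLimiter` is hypothetical, and the 3-second and 1-second intervals are the values quoted above.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `min_interval` seconds between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Intervals from the rate limits listed above.
arxiv_limiter = MinIntervalLimiter(3.0)
semantic_scholar_limiter = MinIntervalLimiter(1.0)
```

Calling `arxiv_limiter.wait()` immediately before each arXiv request keeps consecutive requests at least three seconds apart; the first call returns without sleeping.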

Support

  • Built by: Artificially
  • Issues: Report bugs or request features via Apify Console
