URL: https://apify.com/crawlerbros/semanticscholar-scraper

Semantic Scholar Scraper · Apify


Pricing

from $3.00 / 1,000 results

Semantic Scholar Scraper

Scrape Semantic Scholar: 200M+ academic papers and authors with a full citation graph. Search, fetch by paper or author ID, and get citations, references, and recommendations, with abstracts, TLDRs, fields of study, open-access PDFs, h-index, affiliations, and more.

Rating

0.0 (0)

Developer

Crawler Bros

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 1

Monthly active users: 0

Last modified: a month ago

Scrape Semantic Scholar, the Allen Institute for AI's open catalog of 200M+ academic papers and authors with a full citation graph, directly via the official Semantic Scholar Graph API.

What you get

For every paper:

  • paperId, corpusId, externalIds (DOI, arXiv, MAG, PMID, ACL, DBLP)
  • title, abstract, tldr (AI-generated summary)
  • year, publicationDate, venue, publicationVenue, journal
  • authors: list of {authorId, name}, plus primaryAuthor
  • fieldsOfStudy, s2FieldsOfStudy (with source attribution)
  • publicationTypes (Review, JournalArticle, Conference, …)
  • referenceCount, citationCount, influentialCitationCount
  • isOpenAccess, openAccessPdf ({url, status, license})
  • semanticScholarUrl

For every author:

  • authorId, name, aliases, affiliations, homepage
  • paperCount, citationCount, hIndex
  • externalIds (ORCID, DBLP)
  • semanticScholarUrl

For citation/reference relations:

  • The full paper record of the citing/cited paper
  • citationContexts (text snippets of where it was cited)
  • citationIntents (background, methodology, result)
  • isInfluentialCitation

Modes

Mode | What it does
searchPaper | Relevance-ranked paper search via /paper/search. Best for "find me the top N papers about X".
searchPaperBulk | Bulk paper search via /paper/search/bulk: 1000 results per page, full-corpus pagination. Best for "give me everything about X".
byPaper | Look up papers by ID. Accepts the 40-char Semantic Scholar SHA, plus prefixed external IDs: DOI:, ARXIV:, MAG:, PMID:, PMCID:, ACL:, DBLP:. Bare DOIs / arXiv IDs are auto-prefixed.
byPaperCitations | All papers that cite the given paper (with citation contexts and intents).
byPaperReferences | All papers cited by the given paper.
searchAuthor | Search authors by name.
byAuthor | Look up authors by Semantic Scholar author ID.
byAuthorPapers | All papers authored by the given Semantic Scholar author ID.
recommendations | Get related/similar papers via /recommendations/v1/papers/forpaper/{id}.
byUrl | Auto-route from Semantic Scholar / DOI / arXiv URLs.
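The auto-prefixing of bare IDs described for byPaper can be pictured with a small sketch. This is not the actor's actual code: the function name `normalize_paper_id` and the regular expressions are assumptions made for illustration, and only new-style arXiv IDs (such as 1706.03762) are recognised here.

```python
import re

# Prefixes listed in the Modes table for the byPaper mode.
KNOWN_PREFIXES = ("DOI:", "ARXIV:", "MAG:", "PMID:", "PMCID:", "ACL:", "DBLP:")

# Hypothetical patterns, chosen for this sketch only.
SHA_RE = re.compile(r"^[0-9a-fA-F]{40}$")          # 40-char Semantic Scholar SHA
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")          # bare DOI such as 10.1145/3065386
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")  # new-style arXiv ID, optional version


def normalize_paper_id(raw: str) -> str:
    """Return an ID in the prefixed form that byPaper accepts."""
    value = raw.strip()
    if SHA_RE.match(value):
        return value  # already a native Semantic Scholar ID
    for prefix in KNOWN_PREFIXES:
        if value.upper().startswith(prefix):
            # Keep the ID body untouched, only canonicalise the prefix case.
            return prefix + value[len(prefix):]
    if DOI_RE.match(value):
        return "DOI:" + value
    if ARXIV_RE.match(value):
        return "ARXIV:" + value
    raise ValueError(f"Unrecognised paper ID: {raw!r}")
```

Under these assumptions, `normalize_paper_id("1706.03762")` yields `"ARXIV:1706.03762"`, and an ID that is already prefixed passes through unchanged.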

Filters

Search modes accept:

  • year: single year (2023), open range (2018-, -2010), or closed range (2015-2020)
  • fieldsOfStudy: multi-select: Computer Science, Medicine, Chemistry, Biology, …
  • publicationTypes: multi-select: Review, JournalArticle, Conference, …
  • venues: free-text list (e.g., Nature, NeurIPS)
  • openAccessOnly: drop papers without an open-access PDF
  • minCitationCount: minimum citation count
  • sort (bulk search only): relevance, citationCount:desc/asc, publicationDate:desc/asc
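To make the four year forms concrete, here is a minimal sketch of how such a value could be interpreted as an inclusive range. The helper names `parse_year_filter` and `year_matches` are hypothetical and not part of the actor; the sketch only mirrors the syntax listed above.

```python
from typing import Optional, Tuple


def parse_year_filter(spec: str) -> Tuple[Optional[int], Optional[int]]:
    """Turn '2023', '2018-', '-2010' or '2015-2020' into (low, high); None means open."""
    spec = spec.strip()
    if "-" not in spec:
        year = int(spec)  # single year: both bounds are the same
        return year, year
    start, _, end = spec.partition("-")
    low = int(start) if start else None
    high = int(end) if end else None
    if low is None and high is None:
        raise ValueError("year filter needs at least one bound")
    if low is not None and high is not None and low > high:
        raise ValueError("year range start is after its end")
    return low, high


def year_matches(year: int, spec: str) -> bool:
    """Check whether a publication year falls inside the filter range."""
    low, high = parse_year_filter(spec)
    return (low is None or year >= low) and (high is None or year <= high)
```

For example, `parse_year_filter("2018-")` returns `(2018, None)` and `year_matches(2012, "-2010")` is `False`.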

API key (optional)

The Semantic Scholar Graph API is public and free. An API key is not required, but raises rate limits 10x. Free signup: https://www.semanticscholar.org/product/api#api-key-form.

Without a key the actor enforces a polite ~1.5s delay between requests so a single run stays under the 100-requests-per-5-minutes budget.

Example inputs

Search the literature on attention mechanisms

{
  "mode": "searchPaper",
  "searchQuery": "transformer attention",
  "fieldsOfStudy": ["Computer Science"],
  "year": "2017-",
  "minCitationCount": 50,
  "maxItems": 100
}

Fetch the "Attention Is All You Need" paper

{
  "mode": "byPaper",
  "paperIds": ["ARXIV:1706.03762"],
  "includeReferencesOnPaper": true,
  "maxItems": 1
}

All citations of a foundational paper

{
  "mode": "byPaperCitations",
  "paperIds": ["DOI:10.1145/3065386"],
  "maxItems": 500
}

All papers by Geoffrey Hinton

{
  "mode": "byAuthorPapers",
  "authorIds": ["1741101"],
  "maxItems": 200
}

Recommendations for a paper

{
  "mode": "recommendations",
  "paperIds": ["ARXIV:1706.03762"],
  "maxItems": 50
}
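The inputs above can also be sent from code. The following is a sketch that assumes the `apify-client` Python package is installed and that the actor ID matches the store URL path (`crawlerbros/semanticscholar-scraper`). The helper names `build_search_input` and `run_actor` are made up for this example; the input field names are taken from the examples above.

```python
from typing import Any, Dict, Iterator, List, Optional

# Assumption: the actor ID is the path of the store URL.
ACTOR_ID = "crawlerbros/semanticscholar-scraper"


def build_search_input(
    query: str,
    fields_of_study: Optional[List[str]] = None,
    year: Optional[str] = None,
    min_citations: Optional[int] = None,
    max_items: int = 100,
) -> Dict[str, Any]:
    """Assemble a searchPaper input, leaving out filters that were not set."""
    run_input: Dict[str, Any] = {
        "mode": "searchPaper",
        "searchQuery": query,
        "maxItems": max_items,
    }
    if fields_of_study:
        run_input["fieldsOfStudy"] = list(fields_of_study)
    if year:
        run_input["year"] = year
    if min_citations is not None:
        run_input["minCitationCount"] = min_citations
    return run_input


def run_actor(token: str, run_input: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Start a run, wait for it to finish, and yield the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A call such as `run_actor("<YOUR_APIFY_TOKEN>", build_search_input("transformer attention", year="2017-"))` would then iterate over the emitted records. The token is a placeholder you supply yourself.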

FAQ

How do I find a paper's Semantic Scholar ID? Use the URL on semanticscholar.org: the 40-char hex at the end is the ID. Or use a DOI / arXiv ID with the DOI: / ARXIV: prefix. The byUrl mode accepts any of these URL forms directly.

Why does my run say "0 records emitted"? Either the search query had no matches, or the filter combination was too narrow (e.g., minCitationCount: 100000 will drop almost everything). Loosen filters or check the status message.

Are abstracts always available? No. Older papers and some publishers don't share abstracts via the API. The actor omits the abstract field when missing rather than returning null.

What happens on rate-limit? The actor honours the Retry-After header on 429 responses and retries with exponential backoff. With a key you almost never hit the limit; without a key, large jobs slow to a crawl after 100 requests in any 5-min window.

Can I get reference / citation counts without fetching all the papers? Yes. searchPaper and byPaper already return citationCount, referenceCount, and influentialCitationCount in the default field set.

Are the open-access PDF URLs hotlink-blocked? No. They point at the original publisher / arXiv / preprint server and resolve from a clean shell.
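The rate-limit answer above combines two ideas: prefer the server's Retry-After value, and otherwise back off exponentially. A minimal sketch of that decision follows. The function name `retry_delay`, the 1.5-second base, and the 60-second cap are assumptions for illustration, not the actor's real settings, and only the numeric (seconds) form of Retry-After is handled.

```python
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.5,
    cap: float = 60.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        try:
            # Numeric Retry-After: trust the server's hint.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back below
    # Exponential backoff: base, 2*base, 4*base, ... limited by the cap.
    return min(cap, base * (2 ** attempt))
```

With these defaults, successive retries without a header wait 1.5, 3, 6, and 12 seconds, while a response carrying `Retry-After: 7` waits 7 seconds regardless of the attempt number.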

Limitations

  • The recommendations endpoint returns up to 100 recommendations per source paper.
  • The Graph API limits each call to 1000 records max; bulk search can paginate beyond 1000.
  • tldr (AI summary) is only generated for a subset of papers.
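Paginating past the 1000-record limit can be sketched as a loop over continuation tokens. The sketch assumes each page is a dictionary with a `data` list and an optional `token` for the next page, which is how the bulk search endpoint is understood to respond. `fetch_page` is a hypothetical callable you would supply (for instance a thin HTTP wrapper), so nothing here is the actor's own implementation.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_bulk_results(
    fetch_page: Callable[[Optional[str]], Page],
    max_items: int,
) -> Iterator[Any]:
    """Yield records page by page until max_items is reached or pages run out."""
    emitted = 0
    token: Optional[str] = None  # None requests the first page
    while emitted < max_items:
        page = fetch_page(token)
        records = page.get("data") or []
        if not records:
            return  # an empty page ends the iteration
        for record in records:
            if emitted >= max_items:
                return
            yield record
            emitted += 1
        token = page.get("token")
        if not token:
            return  # no continuation token: this was the last page
```

Injecting `fetch_page` keeps the loop independent of any HTTP library and makes it easy to exercise with canned pages.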

Source

Data is fetched from the official Semantic Scholar API: https://api.semanticscholar.org/graph/v1. The Allen Institute for AI publishes the API for academic and non-commercial use. See the API terms.
