URL: https://apify.com/crawlerbros/openalex-scraper

OpenAlex Scraper · Apify


Pricing

from $1.00 / 1,000 results

Scrape OpenAlex, the free, open catalog of 250M+ scholarly works, authors, institutions, and concepts. Search papers, authors, or fetch by OpenAlex ID / DOI. Pulls citations, open-access status, abstracts, authorships, journals, topics, and more.

Developer

๐Ÿ‘ Crawler Bros

Crawler Bros

Maintained by Community

Scrape OpenAlex, the free, open catalog of 250M+ scholarly works, authors, institutions, and concepts. Search papers, authors, or fetch by OpenAlex ID / DOI / PMID. Pulls citations, open-access status, abstracts, authorships, journals, topics. HTTP-only via the public api.openalex.org API. No auth, no proxy, no rate-limit drama (100k req/day in the polite pool).

What this actor does

  • Four modes: searchWorks, searchAuthors, byWorkIds, byAuthorIds
  • Universal IDs: OpenAlex (W…, A…), DOI, PMID, PMCID, ORCID, all auto-normalized
  • Reconstructs abstracts from OpenAlex's inverted index (zero extra API calls)
  • Filters: publication year range, min citation count, open-access only, work type
  • Sorts: relevance, most cited, newest publication date / year
  • Empty fields are omitted, so no nulls reach the dataset
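
The abstract reconstruction mentioned in the list above can be sketched roughly as follows. OpenAlex publishes abstracts as an inverted index that maps each token to the positions where it occurs; the function name `reconstruct_abstract` and the sample data are illustrative assumptions, not the actor's actual code.

```python
from __future__ import annotations


def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a {token: [positions]} inverted index."""
    # Flatten to (position, token) pairs, then order them by position.
    positioned = [
        (pos, token)
        for token, positions in inverted_index.items()
        for pos in positions
    ]
    positioned.sort()
    return " ".join(token for _, token in positioned)


# Invented sample: "the" occurs at positions 0 and 3.
sample = {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]}
print(reconstruct_abstract(sample))
```

Because the index already arrives with the work record, this step needs no further request, which matches the "zero extra API calls" note above.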

Output per work

  • openalexId, doi, pmid, pmcid, magId: universal IDs
  • title, publicationDate, publicationYear, type, language
  • citedByCount, fwci (field-weighted citation impact), hasFulltext
  • isOa, openAccessOaUrl, openAccessStatus, bestOaUrl
  • venue: {name, issn_l, publisher, type, isOa, license}
  • authorships[]: [{authorId, name, orcid, position, institutions}, ...] (when includeAuthorships=true)
  • primaryAuthor: first author display name (always present scalar)
  • concepts[]: top 10 OpenAlex concept tags (when includeConcepts=true)
  • abstract: reconstructed text (when includeAbstract=true and OpenAlex has it)
  • relevanceScore: search relevance score (search modes)
  • openalexUrl: canonical link
  • recordType: "work", scrapedAt

Output per author

  • openalexId, name, orcid
  • worksCount, citedByCount
  • lastKnownInstitutions[]
  • hIndex, i10Index
  • openalexUrl, recordType: "author", scrapedAt
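
Since empty fields are omitted rather than set to null, downstream code should not assume any optional key exists. A minimal sketch of reading both record types defensively, using the field names listed above; the `describe` helper and the sample values are hypothetical.

```python
def describe(record: dict) -> str:
    """Return a one-line summary, tolerating fields the actor omitted."""
    if record.get("recordType") == "author":
        name = record.get("name", "(unnamed)")
        works = record.get("worksCount", 0)
        h_index = record.get("hIndex", "n/a")
        return f"{name}: {works} works, h-index {h_index}"
    title = record.get("title", "(untitled)")
    year = record.get("publicationYear", "n.d.")
    cited = record.get("citedByCount", 0)
    oa = "OA" if record.get("isOa") else "closed or unknown"
    return f"{title} ({year}), cited {cited}x, {oa}"


# Hypothetical records shaped like the field lists above; the values are invented.
work = {"recordType": "work", "title": "Example paper",
        "publicationYear": 2024, "citedByCount": 12}
author = {"recordType": "author", "name": "Example Author", "worksCount": 40}
print(describe(work))
print(describe(author))
```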

Input

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchWorks | searchWorks / searchAuthors / byWorkIds / byAuthorIds |
| searchQuery | string | large language models | For searchWorks / searchAuthors |
| workIds | array | – | OpenAlex IDs / DOIs / PMIDs / PMCIDs (for byWorkIds) |
| authorIds | array | – | OpenAlex author IDs / ORCIDs (for byAuthorIds) |
| publicationYearMin | int | – | Drop works before this year |
| publicationYearMax | int | – | Drop works after this year |
| minCitedBy | int | – | Drop works with fewer citations |
| openAccessOnly | bool | false | Only emit OA works |
| workType | string | any | article/book/preprint/review/dataset/etc. |
| sortBy | string | relevance_score:desc | Search ordering |
| includeAbstract | bool | true | Reconstruct abstract from inverted index |
| includeAuthorships | bool | true | Full authorship array |
| includeConcepts | bool | true | Top concept tags |
| userAgentEmail | string | apify-actor@noreply.apify.com | OpenAlex polite-pool email |
| maxItems | int | 50 | Hard cap (1–10000) |
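
To give a feel for how these inputs could translate into a request against api.openalex.org, here is a rough sketch. The listing does not show the actor's real mapping, so `build_works_url` is hypothetical, and the filter names (`from_publication_date`, `to_publication_date`, `cited_by_count`, `is_oa`, `type`) are taken from my reading of the public OpenAlex filter syntax rather than from the actor's source.

```python
from urllib.parse import parse_qs, urlencode, urlparse

API_BASE = "https://api.openalex.org/works"


def build_works_url(actor_input: dict) -> str:
    """Map actor-style input fields onto an OpenAlex /works query (illustrative)."""
    filters = []
    if "publicationYearMin" in actor_input:
        filters.append(f"from_publication_date:{actor_input['publicationYearMin']}-01-01")
    if "publicationYearMax" in actor_input:
        filters.append(f"to_publication_date:{actor_input['publicationYearMax']}-12-31")
    if "minCitedBy" in actor_input:
        # Assumes inequality filters are strict, so ">49" means "at least 50".
        filters.append(f"cited_by_count:>{actor_input['minCitedBy'] - 1}")
    if actor_input.get("openAccessOnly"):
        filters.append("is_oa:true")
    work_type = actor_input.get("workType", "any")
    if work_type != "any":
        filters.append(f"type:{work_type}")

    params = {}
    if actor_input.get("searchQuery"):
        params["search"] = actor_input["searchQuery"]
    if filters:
        params["filter"] = ",".join(filters)  # comma-joined filters are ANDed
    if actor_input.get("sortBy"):
        params["sort"] = actor_input["sortBy"]
    return f"{API_BASE}?{urlencode(params)}"


url = build_works_url({
    "searchQuery": "large language models",
    "publicationYearMin": 2024,
    "minCitedBy": 50,
    "sortBy": "cited_by_count:desc",
})
print(parse_qs(urlparse(url).query))
```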

Example: top-cited LLM papers from 2024

{
  "mode": "searchWorks",
  "searchQuery": "large language models",
  "publicationYearMin": 2024,
  "minCitedBy": 50,
  "sortBy": "cited_by_count:desc",
  "maxItems": 100
}
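
One way to submit this input programmatically is the `apify-client` Python package. The sketch below assumes that package is installed, that an API token is available in an `APIFY_TOKEN` environment variable (my naming), and that the actor ID matches the store URL (`crawlerbros/openalex-scraper`); treat it as a starting point rather than a verified recipe.

```python
import os

# Same input as the JSON example above.
RUN_INPUT = {
    "mode": "searchWorks",
    "searchQuery": "large language models",
    "publicationYearMin": 2024,
    "minCitedBy": 50,
    "sortBy": "cited_by_count:desc",
    "maxItems": 100,
}


def run_actor(run_input: dict) -> list:
    """Start the actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("crawlerbros/openalex-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__" and "APIFY_TOKEN" in os.environ:
    for item in run_actor(RUN_INPUT)[:5]:
        print(item.get("title"), item.get("citedByCount"))
```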

Example: lookup specific papers by DOI

{
  "mode": "byWorkIds",
  "workIds": [
    "10.1145/3442188.3445922",
    "https://doi.org/10.48550/arXiv.2310.06825",
    "pmid:25524000"
  ]
}

Example: all works by an author (Geoffrey Hinton)

{
  "mode": "byAuthorIds",
  "authorIds": ["A1969205038"],
  "minCitedBy": 100,
  "maxItems": 200
}

Example: open-access ML papers only

{
  "mode": "searchWorks",
  "searchQuery": "machine learning fairness",
  "openAccessOnly": true,
  "workType": "article",
  "publicationYearMin": 2020
}

Use cases

  • Literature reviews: bulk-export every paper matching a topic across all disciplines
  • Citation tracking: find the most-cited works on a topic, or all works citing a specific paper
  • Author intelligence: track an author's publication record, h-index, institutional affiliations
  • Open-access auditing: find OA copies of every paper in a reading list
  • Topic monitoring: schedule recurring runs to catch new papers in your area
  • Cross-database enrichment: feed in DOIs from arXiv / PubMed / Crossref, then enrich them with OpenAlex citations

FAQ

What's OpenAlex? An open replacement for Microsoft Academic Graph: 250M+ scholarly works, 80M+ authors, free for any use, fully indexed by content+citations. See openalex.org.

Is there a rate limit? Yes: 100k requests/day in the polite pool (open to anyone who puts an email in their User-Agent). The actor sets this header automatically.
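
For readers calling OpenAlex directly, a polite-pool header along these lines is one plausible shape. The `polite_headers` helper and the `openalex-client` app name are assumptions; the listing does not show the exact string the actor sends. The pacing constant simply spreads the stated 100k daily budget evenly across 24 hours.

```python
from __future__ import annotations

DAILY_BUDGET = 100_000       # requests/day, as stated above
SECONDS_PER_DAY = 86_400

# Even pacing: one request every 0.864 s stays within the daily budget.
MIN_INTERVAL_SECONDS = SECONDS_PER_DAY / DAILY_BUDGET


def polite_headers(email: str, app_name: str = "openalex-client") -> dict[str, str]:
    """Build a User-Agent header that carries a contact email."""
    return {"User-Agent": f"{app_name} (mailto:{email})"}


print(polite_headers("apify-actor@noreply.apify.com"))
print(MIN_INTERVAL_SECONDS)
```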

Why are abstracts sometimes missing? OpenAlex omits abstracts when their license doesn't permit redistribution. The actor returns whatever's available; missing abstracts mean the source publisher doesn't allow it.

How does it differ from arXiv / PubMed? OpenAlex is broader: it covers all disciplines and all sources (preprint servers, journals, books, datasets). arXiv covers only preprints, mainly in physics, math, CS, and related quantitative fields. PubMed covers only biomedical literature.

What ID formats are accepted? OpenAlex IDs (W123…, A123…), full DOI URLs (https://doi.org/10.1145/...), bare DOIs (10.1145/...), pmid:N, pmcid:N, and ORCIDs (0000-0001-…).
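
As an illustration of what auto-normalizing these formats might involve, the sketch below maps each accepted form onto a single prefixed string. The `normalize_id` function, its regular expressions, and the `doi:` / `orcid:` output convention are my assumptions; the actor's real normalization rules are not shown in this listing.

```python
import re

_OPENALEX_ID = re.compile(r"^[WA]\d+$", re.IGNORECASE)
_ORCID = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$", re.IGNORECASE)


def normalize_id(raw: str) -> str:
    """Map an accepted ID format onto one canonical, prefixed string."""
    value = raw.strip()
    lowered = value.lower()
    if lowered.startswith("https://openalex.org/"):
        value = value.rsplit("/", 1)[-1]
    if _OPENALEX_ID.match(value):
        return value.upper()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if lowered.startswith(prefix):
            return "doi:" + value[len(prefix):]
    if lowered.startswith("10."):  # bare DOI
        return "doi:" + value
    if lowered.startswith(("pmid:", "pmcid:")):
        scheme, _, rest = value.partition(":")
        return f"{scheme.lower()}:{rest.strip()}"
    if lowered.startswith("https://orcid.org/"):
        value = value.rsplit("/", 1)[-1]
    if _ORCID.match(value):
        return "orcid:" + value.upper()
    raise ValueError(f"Unrecognized identifier: {raw!r}")


for example in ("10.1145/3442188.3445922",
                "https://doi.org/10.48550/arXiv.2310.06825",
                "pmid:25524000",
                "A1969205038"):
    print(example, "->", normalize_id(example))
```

Raising on unknown input, rather than passing it through, is a design choice here: it surfaces typos in an ID list before any request is spent on them.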

What's fwci? Field-weighted citation impact: a paper's citation count normalized to its field's average. 1.0 = field average, 2.0 = twice field average, etc. Useful for cross-discipline comparison.
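
Written as a ratio, the definition above looks roughly like this. The symbols are my own notation: $c$ is the paper's citation count and $\bar{c}_{\text{field}}$ is the average count for its field. How OpenAlex computes that baseline is not described in this listing, and the numbers in the example are invented.

```latex
\mathrm{FWCI} = \frac{c}{\bar{c}_{\text{field}}},
\qquad
\text{e.g. } c = 30,\ \bar{c}_{\text{field}} = 15
\;\Rightarrow\; \mathrm{FWCI} = \frac{30}{15} = 2.0
```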

Why is concepts capped at 10? OpenAlex assigns dozens of low-confidence concepts per work. We keep the top 10 (already sorted by score) so that table views stay compact; the full list is in OpenAlex's web UI.

How fresh is the data? Daily: OpenAlex re-indexes nightly from Crossref, PubMed, ORCID, ROR, etc.
