Pricing
from $1.00 / 1,000 results
OpenAlex Scraper
Scrape OpenAlex, the free, open catalog of 250M+ scholarly works, authors, institutions, and concepts. Search papers, authors, or fetch by OpenAlex ID / DOI. Pulls citations, open-access status, abstracts, authorships, journals, topics, and more.
Scrape OpenAlex, the free, open catalog of 250M+ scholarly works, authors, institutions, and concepts. Search papers, authors, or fetch by OpenAlex ID / DOI / PMID. Pulls citations, open-access status, abstracts, authorships, journals, topics. HTTP-only via the public api.openalex.org API. No auth, no proxy, no rate-limit drama (100k req/day in the polite pool).
What this actor does
- Four modes: searchWorks, searchAuthors, byWorkIds, byAuthorIds
- Universal IDs: OpenAlex (W…, A…), DOI, PMID, PMCID, ORCID, all auto-normalized
- Reconstructs abstracts from OpenAlex's inverted index (zero extra API calls)
- Filters: publication year range, min citation count, open-access only, work type
- Sorts: relevance, most cited, newest publication date / year
- Empty fields are omitted, so no nulls reach the dataset
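The abstract reconstruction mentioned above can be sketched as follows. OpenAlex publishes abstracts as an inverted index that maps each word to the list of positions where it occurs; the function name `reconstruct_abstract` and the sample data are illustrative assumptions, not the actor's actual code.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild abstract text from an OpenAlex-style inverted index.

    The index maps each word to the list of positions where it occurs.
    Returns None when no index is available.
    """
    if not inverted_index:
        return None
    # Map every position back to the word that occupies it.
    words_by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    # Join the words in ascending position order.
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


# Made-up sample index, for illustration only.
example = {"Deep": [0], "learning": [1, 4], "is": [2], "machine": [3]}
print(reconstruct_abstract(example))  # Deep learning is machine learning
```

Because the index arrives in the same API response as the rest of the work record, rebuilding the text this way needs no additional request.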
Output per work
- openalexId, doi, pmid, pmcid, magId: universal IDs
- title, publicationDate, publicationYear, type, language
- citedByCount, fwci (field-weighted citation impact), hasFulltext
- isOa, openAccessOaUrl, openAccessStatus, bestOaUrl
- venue: {name, issn_l, publisher, type, isOa, license}
- authorships[]: [{authorId, name, orcid, position, institutions}, ...] (when includeAuthorships=true)
- primaryAuthor: first author display name (always present scalar)
- concepts[]: top 10 OpenAlex concept tags (when includeConcepts=true)
- abstract: reconstructed text (when includeAbstract=true and OpenAlex has it)
- relevanceScore: search relevance score (search modes)
- openalexUrl: canonical link
- recordType: "work", scrapedAt
Output per author
- openalexId, name, orcid
- worksCount, citedByCount
- lastKnownInstitutions[]
- hIndex, i10Index
- openalexUrl, recordType: "author", scrapedAt
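Since empty fields are omitted from the output, a record may simply lack a key such as doi or abstract. The sketch below shows one way such pruning could work and how a consumer might read records safely. The helper `drop_empty` and its definition of "empty" (None, empty string, empty list, empty dict) are assumptions for illustration; the actor's exact rule is not documented here.

```python
def drop_empty(record):
    """Return a copy of record without None, "", [] or {} values.

    Falsy but meaningful values such as 0 and False are kept.
    """
    empty_values = (None, "", [], {})
    return {key: value for key, value in record.items() if value not in empty_values}


# Made-up raw record, for illustration only.
raw = {
    "openalexId": "W1",
    "doi": None,
    "citedByCount": 0,
    "isOa": False,
    "abstract": "",
    "concepts": [],
}
clean = drop_empty(raw)
print(clean)  # {'openalexId': 'W1', 'citedByCount': 0, 'isOa': False}

# Consumers should read optional fields with a default instead of indexing.
print(clean.get("doi", "n/a"))  # n/a
```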
Input
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchWorks | searchWorks / searchAuthors / byWorkIds / byAuthorIds |
| searchQuery | string | large language models | For searchWorks / searchAuthors |
| workIds | array | - | OpenAlex IDs / DOIs / PMIDs / PMCIDs (for byWorkIds) |
| authorIds | array | - | OpenAlex author IDs / ORCIDs (for byAuthorIds) |
| publicationYearMin | int | - | Drop works before this year |
| publicationYearMax | int | - | Drop works after this year |
| minCitedBy | int | - | Drop works with fewer citations |
| openAccessOnly | bool | false | Only emit OA works |
| workType | string | any | article/book/preprint/review/dataset/etc. |
| sortBy | string | relevance_score:desc | Search ordering |
| includeAbstract | bool | true | Reconstruct abstract from inverted index |
| includeAuthorships | bool | true | Full authorship array |
| includeConcepts | bool | true | Top concept tags |
| userAgentEmail | string | apify-actor@noreply.apify.com | OpenAlex polite-pool email |
| maxItems | int | 50 | Hard cap (1-10000) |
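To make the filter fields more concrete, here is a minimal sketch of how these inputs could translate into a query against the public api.openalex.org works endpoint. The function `build_works_url` and the exact mapping are assumptions; the actor may apply some filters client-side or build its requests differently. The OpenAlex parameters used (search, sort, per-page, filter with from_publication_date, to_publication_date, cited_by_count, open_access.is_oa, type) come from the public API, which caps per-page at 200.

```python
from urllib.parse import urlencode


def build_works_url(inp):
    """Build an OpenAlex /works search URL from actor-style input (hypothetical mapping)."""
    filters = []
    if "publicationYearMin" in inp:
        filters.append(f"from_publication_date:{inp['publicationYearMin']}-01-01")
    if "publicationYearMax" in inp:
        filters.append(f"to_publication_date:{inp['publicationYearMax']}-12-31")
    if "minCitedBy" in inp:
        # "at least N citations" expressed as "more than N - 1".
        filters.append(f"cited_by_count:>{inp['minCitedBy'] - 1}")
    if inp.get("openAccessOnly"):
        filters.append("open_access.is_oa:true")
    if inp.get("workType", "any") != "any":
        filters.append(f"type:{inp['workType']}")

    params = {
        "search": inp["searchQuery"],
        "sort": inp.get("sortBy", "relevance_score:desc"),
        "per-page": min(inp.get("maxItems", 50), 200),
    }
    if filters:
        params["filter"] = ",".join(filters)
    # Keep ':', ',' and '>' unescaped so the filter string stays readable.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,>")


url = build_works_url({
    "searchQuery": "large language models",
    "publicationYearMin": 2024,
    "minCitedBy": 50,
    "sortBy": "cited_by_count:desc",
    "maxItems": 100,
})
print(url)
```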
Example: top-cited LLM papers from 2024
{"mode":"searchWorks","searchQuery":"large language models","publicationYearMin":2024,"minCitedBy":50,"sortBy":"cited_by_count:desc","maxItems":100}
Example: look up specific papers by DOI or PMID
{"mode":"byWorkIds","workIds":["10.1145/3442188.3445922","https://doi.org/10.48550/arXiv.2310.06825","pmid:25524000"]}
Example: all works by an author (Geoffrey Hinton)
{"mode":"byAuthorIds","authorIds":["A1969205038"],"minCitedBy":100,"maxItems":200}
Example: open-access ML papers only
{"mode":"searchWorks","searchQuery":"machine learning fairness","openAccessOnly":true,"workType":"article","publicationYearMin":2020}
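Inputs like the ones above can also be assembled programmatically. The helper below is a hypothetical sketch: `make_input` is not part of the actor, and the required-field checks are assumptions inferred from the input table (mode names, the 1 to 10000 range for maxItems, and which field each mode reads).

```python
import json

MODES = {"searchWorks", "searchAuthors", "byWorkIds", "byAuthorIds"}
# Field each mode needs, inferred from the input table (assumption).
REQUIRED = {
    "searchWorks": "searchQuery",
    "searchAuthors": "searchQuery",
    "byWorkIds": "workIds",
    "byAuthorIds": "authorIds",
}


def make_input(mode="searchWorks", max_items=50, **fields):
    """Validate and assemble an input dict for the actor (illustrative only)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 1 <= max_items <= 10000:
        raise ValueError("maxItems must be between 1 and 10000")
    if not fields.get(REQUIRED[mode]):
        raise ValueError(f"{mode} requires {REQUIRED[mode]}")
    return {"mode": mode, "maxItems": max_items, **fields}


actor_input = make_input(
    "byAuthorIds", max_items=200, authorIds=["A1969205038"], minCitedBy=100
)
print(json.dumps(actor_input))
```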
Use cases
- Literature reviews: bulk-export every paper matching a topic across all disciplines
- Citation tracking: find the most-cited works on a topic, or all works citing a specific paper
- Author intelligence: track an author's publication record, h-index, institutional affiliations
- Open-access auditing: find OA copies of every paper in a reading list
- Topic monitoring: schedule recurring runs to catch new papers in your area
- Cross-database enrichment: feed in DOIs from arXiv / PubMed / Crossref, then enrich them with OpenAlex citations
FAQ
What's OpenAlex? An open replacement for Microsoft Academic Graph: 250M+ scholarly works, 80M+ authors, free for any use, fully indexed by content+citations. See openalex.org.
Is there a rate limit? Yes: 100k requests/day in the polite pool (anyone with an email in their User-Agent). The actor sets this header automatically.
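For readers calling OpenAlex directly, a polite-pool request might be prepared as in the sketch below. The helper name `polite_request`, the client name in the header, and the exact header layout are assumptions; the only requirement stated above is that the User-Agent carries an email address. Nothing is sent over the network here.

```python
from urllib.request import Request


def polite_request(url, email):
    """Prepare (but do not send) a request whose User-Agent includes a contact email."""
    headers = {"User-Agent": f"openalex-example/0.1 (mailto:{email})"}
    return Request(url, headers=headers)


req = polite_request("https://api.openalex.org/works?search=llm", "you@example.com")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))  # openalex-example/0.1 (mailto:you@example.com)
```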
Why are abstracts sometimes missing? OpenAlex omits abstracts when their license doesn't permit redistribution. The actor returns whatever's available; missing abstracts mean the source publisher doesn't allow it.
How does it differ from arXiv / PubMed? OpenAlex is broader: it covers all disciplines and all sources (preprint servers, journals, books, datasets). arXiv only covers preprints, mainly in physics, math, CS, and related quantitative fields. PubMed only covers biomedical literature.
What ID formats are accepted? OpenAlex IDs (W123…, A123…), full DOI URLs (https://doi.org/10.1145/...), bare DOIs (10.1145/...), pmid:N, pmcid:N, and ORCIDs (0000-0001-…).
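The normalization of these formats could look roughly like the sketch below. The function `normalize_id` and its `namespace:value` output form are assumptions chosen because the OpenAlex API accepts identifiers such as doi:…, pmid:… and orcid:… in entity paths; the actor's real normalization rules may differ. The ORCID in the example is a placeholder pattern, not a real person's identifier.

```python
import re


def normalize_id(raw):
    """Normalize a user-supplied identifier to an OpenAlex-style key (illustrative)."""
    value = raw.strip()
    lowered = value.lower()
    if re.fullmatch(r"[wa]\d+", lowered):
        return value.upper()  # OpenAlex work or author ID
    if lowered.startswith("https://doi.org/"):
        return "doi:" + value[len("https://doi.org/"):]
    if lowered.startswith("10."):
        return "doi:" + value  # bare DOI
    if lowered.startswith(("pmid:", "pmcid:")):
        prefix, _, rest = value.partition(":")
        return f"{prefix.lower()}:{rest.strip()}"
    if re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", value.upper()):
        return "orcid:" + value.upper()
    raise ValueError(f"unrecognized identifier: {raw!r}")


for item in ["10.1145/3442188.3445922",
             "https://doi.org/10.48550/arXiv.2310.06825",
             "pmid:25524000",
             "a1969205038",
             "0000-0001-2345-6789"]:
    print(item, "->", normalize_id(item))
```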
What's fwci? Field-weighted citation impact: a paper's citation count normalized to its field's average. 1.0 = field average, 2.0 = twice the field average, etc. Useful for cross-discipline comparison.
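As a worked illustration of that ratio, the snippet below divides a paper's citation count by an average for comparable papers. This is a simplification with made-up numbers: the function name `fwci` and the single `field_average` argument are assumptions, and OpenAlex computes the expected citation count itself.

```python
def fwci(citations, field_average):
    """Citation count divided by the average for comparable papers (simplified)."""
    if field_average <= 0:
        raise ValueError("field_average must be positive")
    return citations / field_average


print(fwci(30, 15))  # 2.0, i.e. twice the field average
print(fwci(15, 15))  # 1.0, i.e. exactly the field average
```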
Why is concepts capped at 10? OpenAlex assigns dozens of low-confidence concepts per work. We keep the top 10 (already sorted by score) to keep table displays compact; the full list is in OpenAlex's web UI.
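A minimal sketch of that capping step is shown below. The helper `top_concepts` is hypothetical, and it assumes each concept is a dict with `display_name` and `score` keys, as in OpenAlex concept entries. It sorts defensively even though the source list is described as already ordered by score.

```python
def top_concepts(concepts, limit=10):
    """Return the display names of the highest-scoring concepts (illustrative)."""
    ranked = sorted(concepts, key=lambda c: c.get("score", 0), reverse=True)
    return [c["display_name"] for c in ranked[:limit]]


# Made-up concept list with 12 entries and descending scores.
sample = [{"display_name": f"concept-{i}", "score": 1 - i / 20} for i in range(12)]
print(top_concepts(sample))
```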
How fresh is the data? Daily: OpenAlex re-indexes nightly from Crossref, PubMed, ORCID, ROR, etc.
