URL: https://apify.com/parseforge/openalex-scraper

OpenAlex Scholarly Works Scraper · Apify


Pricing: Pay per event

OpenAlex Scholarly Works Scraper

Export academic works, authors, institutions, sources, and concepts from OpenAlex's open catalog of 250M+ scholarly records, the successor to Microsoft Academic Graph. Filter by author, concept, year, open access status, or affiliation.

Rating: 5.0 (1)

Developer: ParseForge

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 13
Monthly active users: 2
Last modified: 25 days ago

🎓 OpenAlex Scholarly Works Scraper

🚀 Export academic works, authors, institutions, and more from OpenAlex in seconds. Filter by search query, entity type, or custom filters. No coding, no API keys required.

🕒 Last updated: 2026-04-16 · 📊 30+ fields · 🔄 Runs on Apify cloud or locally · 📁 Export: JSON, CSV, Excel

The OpenAlex Scholarly Works Scraper connects to OpenAlex, the free and open catalog of 250M+ scholarly records that succeeded Microsoft Academic Graph. It supports 7 entity types: works, authors, institutions, sources, concepts, publishers, and funders. Each record includes 30+ structured fields with titles, DOIs, citation counts, open access status, author details, institutional affiliations, and more. Whether you need 10 papers for a quick lookup or millions of records for a large-scale bibliometric study, this tool handles it efficiently.

Built for researchers conducting literature reviews, bibliometricians analyzing citation networks, university administrators tracking institutional output, and data teams building scholarly knowledge graphs. The scraper uses the OpenAlex API with support for free-text search and the full OpenAlex filter syntax. Providing a contact email puts your requests in the "polite pool" for faster processing.

Target Audience | Use Cases
Academic Researchers | Literature reviews, citation analysis
Bibliometricians | Citation network mapping, impact studies
University Administrators | Institutional output tracking
Data Scientists | Knowledge graph construction, NLP corpus building
Funding Agencies | Research output assessment, grant evaluation
Library Scientists | Collection development, trend analysis

📋 What the OpenAlex Scholarly Works Scraper does

  • 📝 Extracts scholarly work metadata including titles, abstracts, DOIs, publication dates, and citation counts for bibliometric analysis
  • 👥 Collects author profiles with names, ORCID IDs, institutional affiliations, and publication histories
  • 🏫 Gathers institution data including names, types, locations, and research output statistics
  • 📰 Pulls source information for journals, conferences, and repositories with ISSN, publisher, and open access details
  • 🔗 Captures concept and topic data for subject classification and research trend analysis
  • 📊 Tracks open access status with OA type, OA URL, and license information for each work

The scraper queries the OpenAlex API with your search terms and optional filters, handles cursor-based pagination, and processes results efficiently. The OpenAlex filter syntax supports field-level filtering like publication_year:2024,is_oa:true,authorships.institutions.country_code:US for precise targeting.
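The query-and-paginate flow described above can be sketched roughly as follows. This is an illustration only, not the Actor's actual source: the function names (build_url, iterate_records, http_fetch) are hypothetical, and the parameter names (search, filter, cursor, per-page, mailto) and the meta.next_cursor response field are taken from the public OpenAlex API as we understand it.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"  # public OpenAlex API root


def build_url(entity: str, search: Optional[str] = None,
              filter_str: Optional[str] = None, email: Optional[str] = None,
              cursor: str = "*", per_page: int = 200) -> str:
    """Assemble the URL for one page of results (hypothetical helper)."""
    params = {"per-page": per_page, "cursor": cursor}
    if search:
        params["search"] = search
    if filter_str:
        params["filter"] = filter_str
    if email:
        params["mailto"] = email  # contact email opts into the "polite pool"
    return f"{BASE_URL}/{entity}?{urlencode(params)}"


def iterate_records(fetch_page: Callable[[str], dict], entity: str,
                    max_items: int, **query) -> Iterator[dict]:
    """Follow meta.next_cursor until it is empty or max_items is reached."""
    cursor, count = "*", 0  # "*" requests the first page
    while cursor and count < max_items:
        page = fetch_page(build_url(entity, cursor=cursor, **query))
        for record in page.get("results", []):
            if count >= max_items:
                return
            yield record
            count += 1
        cursor = page.get("meta", {}).get("next_cursor")


def http_fetch(url: str) -> dict:
    """One possible fetch_page: a plain GET that parses the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Passing the fetch function in as an argument keeps the pagination logic testable without network access; for a live query, something like iterate_records(http_fetch, "works", 50, search="machine learning") should work under the assumptions above.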

💡 Why it matters: OpenAlex is the largest free scholarly database, covering 250M+ works, 90M+ authors, and 100K+ institutions. This scraper gives you structured access to this data without writing API integration code.


🎬 Full Demo

🚧 Coming soon...


⚙️ Input

Field | Type | Required | Description
maxItems | integer | No | Maximum records to collect. Free users: limited to 10. Paid users: up to 1,000,000.
entity | string | No | Entity type: works, authors, institutions, sources, concepts, publishers, or funders.
search | string | No | Free text search across titles, abstracts, and display names.
filter | string | No | OpenAlex filter string (e.g., "publication_year:2024,is_oa:true").
email | string | No | Contact email for OpenAlex "polite pool" (faster processing). Optional.

Example 1: Search for machine learning papers

{
    "entity": "works",
    "search": "machine learning",
    "maxItems": 100
}

Example 2: Open access papers from US institutions in 2024

{
    "entity": "works",
    "search": "climate change",
    "filter": "publication_year:2024,is_oa:true,authorships.institutions.country_code:US",
    "maxItems": 500,
    "email": "researcher@university.edu"
}

⚠️ Good to Know: Providing your email address puts your requests in OpenAlex's "polite pool" for faster processing. The filter syntax supports dozens of fields. Free users are automatically limited to 10 items per run.
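If you assemble the input programmatically, a small pre-check can catch typos before a run starts. The sketch below is a hypothetical client-side helper, not part of the Actor: build_input and the paid flag are invented for illustration, while the entity names and the 10 / 1,000,000 limits are the ones listed in the input table. The Actor applies its own limits regardless.

```python
from typing import Optional

# Entity types and limits as listed in the input table above
ENTITY_TYPES = {"works", "authors", "institutions", "sources",
                "concepts", "publishers", "funders"}
FREE_TIER_LIMIT = 10
PAID_TIER_LIMIT = 1_000_000


def build_input(entity: str, search: Optional[str] = None,
                filter_str: Optional[str] = None, max_items: int = 10,
                email: Optional[str] = None, paid: bool = False) -> dict:
    """Return an input object, clamping maxItems to the stated tier limit."""
    if entity not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity!r}")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    limit = PAID_TIER_LIMIT if paid else FREE_TIER_LIMIT
    run_input = {"entity": entity, "maxItems": min(max_items, limit)}
    # Only include optional fields that were actually provided
    if search:
        run_input["search"] = search
    if filter_str:
        run_input["filter"] = filter_str
    if email:
        run_input["email"] = email
    return run_input
```

Called with the values from Example 2 and paid=True, this returns the same object shown there; with paid=False, maxItems is reduced to 10.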


📊 Output

🧾 Schema

Emoji | Field | Type | Description
📝 | title | string | Work title or entity display name
🆔 | id | string | OpenAlex ID
🔗 | doi | string | Digital Object Identifier (works)
🌐 | url | string | OpenAlex URL
📅 | publicationDate | string | Publication date (works)
📅 | publicationYear | number | Publication year
👥 | authors | array | Author names and affiliations
📊 | citationCount | number | Total citations received
📊 | citedByCount | number | Number of citing works
📖 | abstract | string | Article abstract (when available)
📰 | source | string | Journal or venue name
🔓 | isOpenAccess | boolean | Whether the work is open access
🔓 | oaType | string | OA type (gold, green, bronze, hybrid)
🔗 | oaUrl | string | URL to free version
⚖️ | license | string | License type
🏷️ | concepts | array | Associated concepts/topics
🏫 | institutions | array | Author institutions
🌍 | countries | array | Author country codes
📊 | referencedWorksCount | number | Number of references
📊 | relatedWorksCount | number | Number of related works
🔢 | volume | string | Journal volume
🔢 | issue | string | Journal issue
📄 | pages | string | Page range
🏷️ | type | string | Work type (article, book, etc.)
🔢 | orcid | string | Author ORCID ID (authors entity)
🏫 | affiliation | string | Current affiliation (authors)
📊 | worksCount | number | Total works (authors/institutions)
📊 | hIndex | number | H-index (authors)
📅 | scrapedAt | string | Data collection timestamp
❌ | error | string | Error message if extraction failed

📦 Sample records


✨ Why choose this Actor

Feature | Details
📊 250M+ records | Access the largest free scholarly database
🔍 7 entity types | Works, authors, institutions, sources, concepts, publishers, funders
🔓 Open access tracking | OA status, type, URL, and license for every work
📊 Citation metrics | Citation counts, h-index, and referenced works
🔧 Advanced filters | Full OpenAlex filter syntax for precise queries
📁 Multiple export formats | JSON, CSV, Excel for any workflow
⚡ Polite pool support | Provide email for faster processing

📈 Typical performance: Collects 500+ records per minute in polite pool mode. A dataset of 10,000 works takes roughly 20 minutes.
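The 20-minute figure follows directly from the stated throughput. A minimal sketch of the arithmetic, assuming the 500 records per minute rate holds for the whole run (estimate_minutes is a hypothetical helper, and real throughput will vary):

```python
def estimate_minutes(record_count: int, records_per_minute: float = 500.0) -> float:
    """Estimate run time from a constant throughput (an assumption)."""
    if record_count < 0 or records_per_minute <= 0:
        raise ValueError("record_count must be >= 0 and rate must be > 0")
    return record_count / records_per_minute


print(estimate_minutes(10_000))          # 20.0 minutes, matching the figure above
print(estimate_minutes(1_000_000) / 60)  # about 33.3 hours at the same rate
```

Since the stated rate is a lower bound ("500+"), these estimates are upper bounds under that assumption.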


📈 How it compares to alternatives

Feature | This Actor | Direct API Integration | Generic Scrapers
30+ structured fields per record | ✅ | ✅ (requires coding) | Partial
7 entity types in one tool | ✅ | ✅ (requires coding) | ❌
No coding required | ✅ | ❌ | ❌
Export to CSV/JSON/Excel | ✅ | ❌ (raw JSON) | Partial
Automatic pagination | ✅ | Manual | Partial
Scheduled runs | ✅ | Custom setup | Partial
Filter syntax support | ✅ | ✅ | ❌

All the features of the OpenAlex API, without writing a single line of code.


🚀 How to use

  1. Create a free Apify account - Sign up here (includes free credits)
  2. Open the OpenAlex Scholarly Works Scraper - Navigate to the Actor page and click "Start"
  3. Choose your entity type - Select works, authors, institutions, or another entity type
  4. Set your search and filters - Enter a search query and optional OpenAlex filters
  5. Run and download - Click "Start", wait for completion, then export as JSON, CSV, or Excel

โฑ๏ธ First results appear in under 10 seconds. A typical run of 100 records completes in about 30 seconds.


💼 Business use cases

Academic Research

  • Build citation network datasets
  • Track research trends by topic over time
  • Find collaborators at specific institutions
  • Monitor open access adoption in your field

University Administration

  • Track institutional research output
  • Benchmark against peer institutions
  • Generate faculty publication reports
  • Monitor author h-indexes and citation impact

Data Science & AI

  • Build scholarly knowledge graphs
  • Create NLP training corpora from abstracts
  • Analyze collaboration patterns
  • Train topic classification models

Funding & Policy

  • Assess research output for grant evaluation
  • Track funded research productivity
  • Analyze open access compliance rates
  • Map research activity by country and institution


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

๐Ÿค Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

โ“ Frequently Asked Questions

🔌 Automating OpenAlex Scholarly Works Scraper

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish
const run = await client.actor("parseforge/openalex-scraper").call({
    entity: "works",
    search: "machine learning",
    filter: "publication_year:2024,is_oa:true",
    maxItems: 200
});

// Read the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish
run = client.actor("parseforge/openalex-scraper").call(run_input={
    "entity": "works",
    "search": "machine learning",
    "filter": "publication_year:2024,is_oa:true",
    "maxItems": 200
})

# Read the results from the run's default dataset
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(items)

Schedules: Set up weekly or monthly runs with Apify Schedules to track new publications, monitor citation growth, or maintain up-to-date researcher profiles.

🔌 Integrate with any app

  • 🔗 Make (Integromat) - Connect OpenAlex data to 1,000+ apps with visual workflows
  • 🔗 Zapier - Trigger actions when new scholarly records match your criteria
  • 🔗 Slack - Get notifications when new papers are published in your field
  • 🔗 Airbyte - Sync scholarly data to your data warehouse
  • 🔗 GitHub - Automate research data pipelines with GitHub Actions
  • 🔗 Google Drive - Export scholarly data directly to Google Sheets

🔗 Recommended Actors

Actor | Description
📚 PubMed Citation Scraper | Extract citation data and metadata from PubMed biomedical literature
📖 PLOS Journals Scraper | Collect article data from PLOS ONE and other PLOS journals
🧬 Crossref Scraper | Collect DOI metadata and citation information from Crossref
📰 medRxiv Scraper | Extract health sciences preprint data from medRxiv
📄 Semantic Scholar Scraper | Query the Semantic Scholar API for academic paper data

💡 Pro Tip: Use OpenAlex to find papers by topic, then cross-reference with the Crossref Scraper for detailed citation metadata and reference lists.
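Cross-referencing two datasets usually comes down to matching on DOI, and DOIs can appear either as full URLs or as bare identifiers. The sketch below shows one way to normalize and join them. It is an assumption-laden illustration: normalize_doi and join_on_doi are hypothetical helpers, and the "doi" key for the second dataset is a guess, since the Crossref Scraper's output schema is not documented here.

```python
from typing import Iterable, List, Optional, Tuple

# Common ways a DOI is prefixed; matching is done on the bare identifier
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value: Optional[str]) -> Optional[str]:
    """Strip URL prefixes and lowercase (DOIs are case-insensitive)."""
    if not value:
        return None
    doi = value.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def join_on_doi(left: Iterable[dict], right: Iterable[dict],
                left_key: str = "doi",
                right_key: str = "doi") -> List[Tuple[dict, dict]]:
    """Pair records from two datasets that share a normalized DOI."""
    index = {}
    for item in right:
        doi = normalize_doi(item.get(right_key))
        if doi:
            index[doi] = item
    pairs = []
    for item in left:
        doi = normalize_doi(item.get(left_key))
        if doi in index:
            pairs.append((item, index[doi]))
    return pairs
```

Records without a DOI are skipped on both sides, so works that lack one will need a different matching strategy, such as title and year.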


🆘 Need Help? Open our contact form and we will get back to you within 24 hours. For bug reports, feature requests, or integration help, we are here to assist.


Disclaimer: This Actor is provided as-is, without warranty. It is not affiliated with or endorsed by OpenAlex or OurResearch. Use it responsibly and in compliance with applicable terms of service. The authors are not responsible for how the collected data is used. Always verify data accuracy for critical applications.
