Pricing
from $4.99 / 1,000 results
Google Scholar Scraper
A Google Scholar scraper built for developer automation, data integration, and AI training datasets. It includes built-in captcha bypass, headful or headless browser execution, and proxy support for scraping Google Scholar reliably at scale.
Features
- Academic Papers: Extracts research papers and academic articles from Google Scholar search results
- Citations Count: Captures how many times each paper has been cited by other works
- Author Information: Records the names of all authors for each paper
- Publication Venue: Extracts the journal or conference where the paper was published
- Publication Year: Captures the year each paper was published
- PDF Links: Collects direct PDF links when available for open-access papers
- Text Snippets: Retrieves descriptive text snippets shown in Google Scholar results
- Date Range Filtering: Filter papers by publication year range (yearFrom and yearTo)
- Sort Options: Sort results by relevance or publication date
- Proxy Support: Built-in Apify Proxy with residential proxies to avoid Scholar rate limiting
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | String | Yes | "machine learning" | Academic search query to look up on Google Scholar |
| maxItems | Integer | No | 50 | Maximum number of results to retrieve (1–10000) |
| yearFrom | Integer | No | - | Filter results published from this year onwards (e.g., 2020) |
| yearTo | Integer | No | - | Filter results published up to this year (e.g., 2024) |
| sortBy | String | No | "relevance" | Sort results by: relevance or date |
| proxyConfiguration | Object | No | Apify Residential | Proxy settings for the scraper |
Input Schema Example
{"query":"deep learning natural language processing","maxItems":100,"yearFrom":2020,"yearTo":2025,"sortBy":"date","proxyConfiguration":{"useApifyProxy":true,"apifyProxyGroups":["RESIDENTIAL"]}}
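The parameters and example above can be assembled with a small client-side helper. The Python sketch below builds an input object using the documented parameter names and defaults. The function name `build_input`, its snake_case arguments, and the check that `yearFrom` is not later than `yearTo` are assumptions made for illustration; the actor's own validation may differ.

```python
import json
from typing import Any, Dict, Optional


def build_input(
    query: str,
    max_items: int = 50,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    sort_by: str = "relevance",
) -> Dict[str, Any]:
    """Build an input object using the parameter names from the table (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query is required")
    if not 1 <= max_items <= 10000:
        raise ValueError("maxItems must be between 1 and 10000")
    if sort_by not in ("relevance", "date"):
        raise ValueError("sortBy must be 'relevance' or 'date'")
    # Assumption: an inverted year range is treated as a caller error.
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("yearFrom must not be later than yearTo")

    run_input: Dict[str, Any] = {
        "query": query,
        "maxItems": max_items,
        "sortBy": sort_by,
        # Mirrors the proxy settings shown in the input example.
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # Optional year filters are only included when set.
    if year_from is not None:
        run_input["yearFrom"] = year_from
    if year_to is not None:
        run_input["yearTo"] = year_to
    return run_input


if __name__ == "__main__":
    example = build_input(
        "deep learning natural language processing",
        max_items=100,
        year_from=2020,
        year_to=2025,
        sort_by="date",
    )
    print(json.dumps(example, indent=2))
```

Called with the values from the example, the helper produces the same fields as the input shown above. The resulting dictionary can be serialized to JSON and passed as the actor input.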
Output Schema
The scraper outputs structured JSON data for each academic paper found on Google Scholar.
Main Fields
| Field | Type | Description |
|---|---|---|
| position | Integer | Result position in search results |
| title | String | Paper title |
| link | String | URL to the paper |
| authors | String | Paper authors |
| publication | String | Publication venue (journal, conference, etc.) |
| year | Integer | Publication year |
| citedBy | Integer | Number of citations |
| snippet | String | Text snippet from the paper |
| pdfLink | String | Direct link to PDF if available |
| searchQuery | String | The search query used |
| searchUrl | String | Google Scholar search URL |
| scrapedAt | String | ISO timestamp of when the data was scraped |
Academic Paper Example
{"position":1,"title":"Attention Is All You Need","link":"https://arxiv.org/abs/1706.03762","authors":"A Vaswani, N Shazeer, N Parmar, J Uszkoreit","publication":"Advances in neural information processing systems, 2017","year":2017,"citedBy":98450,"snippet":"The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder...","pdfLink":"https://arxiv.org/pdf/1706.03762","searchQuery":"deep learning natural language processing","searchUrl":"https://scholar.google.com/scholar?q=deep+learning+natural+language+processing","scrapedAt":"2025-01-15T10:30:00.000Z"}
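For downstream processing, a record like the one above can be loaded into a typed structure. The Python sketch below is one possible approach, not part of the scraper itself: the `Paper` class and `parse_paper` function are hypothetical names, and the code assumes that `authors` is a single comma-separated string (as in the example) and that `year`, `citedBy`, and `pdfLink` may be missing from some records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class Paper:
    title: str
    authors: List[str]
    year: Optional[int]
    cited_by: int
    pdf_link: Optional[str]
    scraped_at: datetime


def parse_paper(record: Dict[str, Any]) -> Paper:
    """Convert one output record into a Paper (hypothetical helper)."""
    # Assumption: authors arrive as one comma-separated string.
    raw_authors = record.get("authors") or ""
    authors = [name.strip() for name in raw_authors.split(",") if name.strip()]
    # Swap the trailing "Z" for an explicit UTC offset so that
    # datetime.fromisoformat also accepts it on older Python versions.
    scraped_at = datetime.fromisoformat(record["scrapedAt"].replace("Z", "+00:00"))
    return Paper(
        title=record["title"],
        authors=authors,
        year=record.get("year"),
        cited_by=record.get("citedBy") or 0,
        pdf_link=record.get("pdfLink") or None,
        scraped_at=scraped_at,
    )


# A subset of the fields from the example record above.
SAMPLE = {
    "title": "Attention Is All You Need",
    "authors": "A Vaswani, N Shazeer, N Parmar, J Uszkoreit",
    "year": 2017,
    "citedBy": 98450,
    "pdfLink": "https://arxiv.org/pdf/1706.03762",
    "scrapedAt": "2025-01-15T10:30:00.000Z",
}

if __name__ == "__main__":
    paper = parse_paper(SAMPLE)
    print(paper.title, paper.authors, paper.cited_by)
```

Splitting `authors` into a list and parsing `scrapedAt` into a timezone-aware `datetime` makes it easier to filter, sort, or deduplicate results across multiple runs.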
