Pricing
from $1.50 / 1,000 results
Google Scholar Scraper
Extract academic papers from Google Scholar: title, authors, year, journal, citation count, abstract snippet, PDF links. Search by keyword with year range filters. Stricter rate limiting for reliability. Perfect for literature review, research trend analysis, citation tracking.
Apify Actor to scrape Google Scholar search results with advanced filtering options.
Features
- Search by keyword: Find academic papers, articles, and books
- Author filtering: Filter results by specific authors
- Year range: Limit results to specific publication years
- Sort options: Sort by relevance or date
- Citation data: Extract citation counts and related articles
- PDF links: Automatically detect available PDF downloads
- Rate limiting: Built-in 5-10 second delays to respect Google Scholar
- Robust parsing: Handles various result formats (articles, books, citations)
Input Parameters
| Field | Type | Description |
|---|---|---|
| searchQuery | String | Search query (e.g., "machine learning") |
| author | String | Filter by author name |
| yearFrom | Number | Publication year start (1900-2100) |
| yearTo | Number | Publication year end (1900-2100) |
| sortBy | Select | Sort by "relevance" or "date" (default: "relevance") |
| includePatents | Boolean | Include patents in results (default: true) |
| includeCitations | Boolean | Include citations in results (default: true) |
| maxResults | Number | Maximum results to scrape (default: 100, max: 1000) |
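The defaults and limits in the table can be applied before a run starts. The following is a minimal sketch, not the Actor's actual code: `ScholarInput` and `normalizeScholarInput` are hypothetical names, and rejecting out-of-range values (rather than clamping them) and requiring `maxResults` to be at least 1 are assumptions.

```typescript
// Input shape taken from the parameter table above.
type SortBy = "relevance" | "date";

interface ScholarInput {
  searchQuery?: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: SortBy;
  includePatents?: boolean;
  includeCitations?: boolean;
  maxResults?: number;
}

// Same fields, with every documented default filled in.
type NormalizedScholarInput = ScholarInput &
  Required<Pick<ScholarInput, "sortBy" | "includePatents" | "includeCitations" | "maxResults">>;

// The table documents 1900-2100 as the accepted year range.
function checkYear(label: string, value: number | undefined): void {
  if (value === undefined) return;
  if (!Number.isInteger(value) || value < 1900 || value > 2100) {
    throw new RangeError(`${label} must be an integer between 1900 and 2100`);
  }
}

function normalizeScholarInput(input: ScholarInput): NormalizedScholarInput {
  checkYear("yearFrom", input.yearFrom);
  checkYear("yearTo", input.yearTo);
  if (input.yearFrom !== undefined && input.yearTo !== undefined && input.yearFrom > input.yearTo) {
    throw new RangeError("yearFrom must not be later than yearTo");
  }
  // Documented default is 100 and documented maximum is 1000; the lower bound of 1 is assumed.
  const maxResults = input.maxResults ?? 100;
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 1000) {
    throw new RangeError("maxResults must be an integer between 1 and 1000");
  }
  return {
    ...input,
    sortBy: input.sortBy ?? "relevance",
    includePatents: input.includePatents ?? true,
    includeCitations: input.includeCitations ?? true,
    maxResults,
  };
}
```

Calling `normalizeScholarInput({ searchQuery: "machine learning" })` fills in `sortBy: "relevance"`, `includePatents: true`, `includeCitations: true`, and `maxResults: 100`.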
Output Format
Each result contains:
    {
      "title": "Paper title",
      "articleUrl": "https://example.com/paper.pdf",
      "pdfUrl": "https://example.com/download.pdf",
      "authors": "John Doe, Jane Smith",
      "year": 2023,
      "journal": "Journal of Machine Learning Research",
      "abstract": "This paper presents...",
      "citationCount": 42,
      "citedByUrl": "https://scholar.google.com/scholar?cites=...",
      "relatedArticlesUrl": "https://scholar.google.com/scholar?q=related:...",
      "allVersionsCount": 3,
      "isBook": false,
      "isCitation": false,
      "isPdf": true
    }
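For downstream processing, the result shape can be captured as a type. This is a sketch based only on the example record above: `ScholarResult`, `splitAuthors`, and `topCitedWithPdf` are hypothetical names, and treating every field as always present is an assumption, since the source does not say which fields can be missing.

```typescript
// Field names and types mirror the documented example record.
interface ScholarResult {
  title: string;
  articleUrl: string;
  pdfUrl: string;
  authors: string; // comma-separated, as in the example
  year: number;
  journal: string;
  abstract: string;
  citationCount: number;
  citedByUrl: string;
  relatedArticlesUrl: string;
  allVersionsCount: number;
  isBook: boolean;
  isCitation: boolean;
  isPdf: boolean;
}

// The documented example, reused as sample data.
const sampleResult: ScholarResult = {
  title: "Paper title",
  articleUrl: "https://example.com/paper.pdf",
  pdfUrl: "https://example.com/download.pdf",
  authors: "John Doe, Jane Smith",
  year: 2023,
  journal: "Journal of Machine Learning Research",
  abstract: "This paper presents...",
  citationCount: 42,
  citedByUrl: "https://scholar.google.com/scholar?cites=...",
  relatedArticlesUrl: "https://scholar.google.com/scholar?q=related:...",
  allVersionsCount: 3,
  isBook: false,
  isCitation: false,
  isPdf: true,
};

// Split the comma-separated author string into individual names.
function splitAuthors(result: ScholarResult): string[] {
  return result.authors
    .split(",")
    .map((author) => author.trim())
    .filter((author) => author.length > 0);
}

// Keep results flagged as PDFs (and not citation-only entries), most cited first.
// filter() returns a new array, so the input array is not reordered.
function topCitedWithPdf(results: ScholarResult[], limit: number): ScholarResult[] {
  return results
    .filter((result) => result.isPdf && !result.isCitation)
    .sort((a, b) => b.citationCount - a.citationCount)
    .slice(0, limit);
}
```

A helper like `topCitedWithPdf` suits the citation-tracking use case: it narrows a dataset to downloadable papers ranked by `citationCount`.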
Usage Example
Input
    {
      "searchQuery": "deep learning natural language processing",
      "author": "Yoshua Bengio",
      "yearFrom": 2020,
      "yearTo": 2024,
      "sortBy": "date",
      "maxResults": 50
    }
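To illustrate how an input like this could translate into a Google Scholar search, here is a hedged sketch. The source does not describe how the Actor builds its requests, so the mapping is an assumption: `buildScholarSearchUrl` is a hypothetical name, the URL parameters (`q`, `as_ylo`, `as_yhi`, `scisbd`, `start`) are ones that appear in Scholar result URLs, and the author filter uses Scholar's `author:` search operator.

```typescript
interface ScholarQuery {
  searchQuery: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: "relevance" | "date";
}

// Assumed mapping from Actor input fields to Google Scholar URL parameters.
function buildScholarSearchUrl(query: ScholarQuery, start = 0): string {
  const params = new URLSearchParams();
  // Author filtering via the `author:"..."` search operator (assumption).
  const q = query.author
    ? `${query.searchQuery} author:"${query.author}"`
    : query.searchQuery;
  params.set("q", q);
  if (query.yearFrom !== undefined) params.set("as_ylo", String(query.yearFrom)); // earliest year
  if (query.yearTo !== undefined) params.set("as_yhi", String(query.yearTo)); // latest year
  if (query.sortBy === "date") params.set("scisbd", "1"); // sort by date
  if (start > 0) params.set("start", String(start)); // offset of the first result on the page
  return `https://scholar.google.com/scholar?${params.toString()}`;
}

const exampleUrl = buildScholarSearchUrl({
  searchQuery: "deep learning natural language processing",
  author: "Yoshua Bengio",
  yearFrom: 2020,
  yearTo: 2024,
  sortBy: "date",
});
console.log(exampleUrl);
```

`URLSearchParams` handles the percent-encoding, so quotes and spaces in the query do not need manual escaping.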
Run Locally
    # Install dependencies
    npm install
    # Build TypeScript
    npm run build
    # Run actor (requires input.json in root or Apify environment)
    npm start
Important Notes
Rate Limiting
Google Scholar is very strict about automated access:
- Actor uses 5-10 second delays between requests
- Realistic User-Agent rotation
- Proper HTTP headers to mimic browser behavior
- Automatic CAPTCHA detection and graceful shutdown
Recommendation:
- Keep maxResults under 100 for reliability
- Use longer delays for larger scrapes
- Consider using Google Scholar API alternatives for production use
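The effect of the 5-10 second delays on run time can be estimated with a little arithmetic. The sketch below is illustrative only: `pickDelayMs`, `pauseFor`, and `estimateWaitSeconds` are hypothetical names, and the figure of 10 results per request is an assumption, not something the source states.

```typescript
const MIN_DELAY_MS = 5000; // documented lower bound of the delay
const MAX_DELAY_MS = 10000; // documented upper bound of the delay
const RESULTS_PER_PAGE = 10; // assumption: one request returns 10 results

// Pick a random delay between minMs and maxMs; `rng` is injectable for testing.
function pickDelayMs(
  minMs: number = MIN_DELAY_MS,
  maxMs: number = MAX_DELAY_MS,
  rng: () => number = Math.random,
): number {
  return Math.round(minMs + rng() * (maxMs - minMs));
}

// Promise-based pause, usable as `await pauseFor(pickDelayMs())` between requests.
function pauseFor(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Estimate how many requests a run needs and how long it spends waiting.
function estimateWaitSeconds(maxResults: number): {
  requests: number;
  minSeconds: number;
  maxSeconds: number;
} {
  const requests = Math.ceil(maxResults / RESULTS_PER_PAGE);
  const gaps = Math.max(requests - 1, 0); // delays occur between requests
  return {
    requests,
    minSeconds: (gaps * MIN_DELAY_MS) / 1000,
    maxSeconds: (gaps * MAX_DELAY_MS) / 1000,
  };
}
```

Under these assumptions, `maxResults` of 100 means 10 requests and 45 to 90 seconds of waiting, while the maximum of 1000 means 100 requests and 495 to 990 seconds of waiting, with many more opportunities to be blocked.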
CAPTCHA/Blocking
If Google Scholar detects automation:
- Actor logs a warning and stops gracefully
- No partial results are lost (already scraped data is saved)
- You can retry with longer delays or from a different IP
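The stop-and-keep behavior described above can be sketched as a loop that checks each page before parsing it. This is not the Actor's real detection logic, which the source does not document: `looksBlocked`, `collectUntilBlocked`, and the marker strings are hypothetical, and pages are passed in as already-fetched HTML strings to keep the example self-contained.

```typescript
// Hypothetical markers for a block page. A real detector should inspect the
// page structure, because these strings could also occur in a paper title.
const BLOCK_MARKERS = ["unusual traffic", "captcha-form"];

function looksBlocked(html: string): boolean {
  const lower = html.toLowerCase();
  return BLOCK_MARKERS.some((marker) => lower.includes(marker));
}

interface ScrapeOutcome<T> {
  items: T[]; // everything parsed before the stop
  stoppedEarly: boolean; // true when a block page ended the run
}

// Parse pages in order and stop at the first one that looks like a block page.
// Items collected so far are returned, so earlier work is not lost.
function collectUntilBlocked<T>(
  pages: string[],
  parsePage: (html: string) => T[],
): ScrapeOutcome<T> {
  const items: T[] = [];
  let pageNumber = 0;
  for (const html of pages) {
    pageNumber += 1;
    if (looksBlocked(html)) {
      console.warn(`Block page detected on page ${pageNumber}; stopping with ${items.length} items.`);
      return { items, stoppedEarly: true };
    }
    items.push(...parsePage(html));
  }
  return { items, stoppedEarly: false };
}
```

Returning `stoppedEarly` alongside the items lets a caller decide whether to retry later with longer delays or a different IP.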
Legal Considerations
- Respect Google Scholar's Terms of Service
- Use for research/academic purposes
- Do not overload their servers
- Consider API alternatives for commercial use
Development
Build
    $ npm run build
Local Testing
    $ npm run dev
Docker Build
    docker build -t google-scholar-scraper .
    docker run -e APIFY_INPUT='{"searchQuery":"machine learning"}' google-scholar-scraper
Troubleshooting
No Results Found
- Check if query has typos
- Try broader search terms
- Verify year range is valid
CAPTCHA Detected
- Reduce maxResults
- Run actor less frequently
- Use different IP address
- Consider a Google Scholar API alternative
Parser Errors
- Google Scholar HTML structure may change
- Open an issue with example query
- Actor will skip unparseable results
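Skipping unparseable results, rather than failing the whole run, can be sketched as a try/catch around each entry. Everything here is an assumption for illustration: `RawScholarEntry`, `parseScholarEntry`, and `parseAllSkippingFailures` are hypothetical names, the meta line is assumed to look like "authors - venue, year - site", and treating a missing title or year as a parse failure is a choice of this sketch, not documented Actor behavior.

```typescript
// Hypothetical raw text fields pulled from one search result.
interface RawScholarEntry {
  title: string;
  metaLine: string; // assumed form: "J Doe, J Smith - Journal of X, 2023 - example.com"
  footerText: string; // assumed to contain e.g. "Cited by 42"
}

interface ParsedScholarEntry {
  title: string;
  authors: string;
  year: number;
  citationCount: number;
}

// Parse one entry; throws when the title or year cannot be found.
function parseScholarEntry(raw: RawScholarEntry): ParsedScholarEntry {
  const title = raw.title.trim();
  if (title.length === 0) throw new Error("missing title");
  const [authorsPart = ""] = raw.metaLine.split(" - ");
  const yearMatch = raw.metaLine.match(/\b(19|20)\d{2}\b/);
  if (!yearMatch) throw new Error(`no year in "${raw.metaLine}"`);
  const citedMatch = raw.footerText.match(/Cited by (\d+)/);
  return {
    title,
    authors: authorsPart.trim(),
    year: Number(yearMatch[0]),
    citationCount: citedMatch ? Number(citedMatch[1]) : 0, // no "Cited by" text means 0
  };
}

// Parse every entry, logging and counting failures instead of aborting.
function parseAllSkippingFailures(raws: RawScholarEntry[]): {
  parsed: ParsedScholarEntry[];
  skipped: number;
} {
  const parsed: ParsedScholarEntry[] = [];
  let skipped = 0;
  for (const raw of raws) {
    try {
      parsed.push(parseScholarEntry(raw));
    } catch (err) {
      skipped += 1;
      const reason = err instanceof Error ? err.message : String(err);
      console.warn(`Skipping unparseable result: ${reason}`);
    }
  }
  return { parsed, skipped };
}
```

Counting skipped entries makes a change in Google Scholar's HTML structure visible: a sudden jump in `skipped` is a signal worth including when opening an issue.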
License
Apache-2.0
Support
For issues or questions, please open a GitHub issue or contact the Apify support team.
