URL: https://apify.com/cloud9_ai/google-scholar-scraper

Google Scholar Scraper - Extract Academic Papers & Citations · Apify


Pricing

from $1.50 / 1,000 results

Google Scholar Scraper

Extract academic papers from Google Scholar: title, authors, year, journal, citation count, abstract snippet, PDF links. Search by keyword with year range filters. Stricter rate limiting for reliability. Perfect for literature review, research trend analysis, citation tracking.

Rating: 0.0 (0)

Developer: cloud9

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 13
  • Monthly active users: 2
  • Last modified: 2 months ago

Apify Actor to scrape Google Scholar search results with advanced filtering options.

Features

  • Search by keyword: Find academic papers, articles, and books
  • Author filtering: Filter results by specific authors
  • Year range: Limit results to specific publication years
  • Sort options: Sort by relevance or date
  • Citation data: Extract citation counts and related articles
  • PDF links: Automatically detect available PDF downloads
  • Rate limiting: Built-in 5-10 second delays to respect Google Scholar
  • Robust parsing: Handles various result formats (articles, books, citations)

Input Parameters

Field | Type | Required | Description
searchQuery | String | ✅ | Search query (e.g., "machine learning")
author | String | ❌ | Filter by author name
yearFrom | Number | ❌ | Publication year start (1900-2100)
yearTo | Number | ❌ | Publication year end (1900-2100)
sortBy | Select | ❌ | Sort by "relevance" or "date" (default: "relevance")
includePatents | Boolean | ❌ | Include patents in results (default: true)
includeCitations | Boolean | ❌ | Include citations in results (default: true)
maxResults | Number | ❌ | Maximum results to scrape (default: 100, max: 1000)
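
The defaults and ranges in the table can be sketched as a small normalization step. This is a minimal illustration, not the Actor's own source: the `ScholarInput` type, the `normalizeInput` function, the lower bound of 1 for `maxResults`, and the check that `yearFrom` does not exceed `yearTo` are all assumptions made for the example.

```typescript
// Hypothetical input shape mirroring the parameter table above.
interface ScholarInput {
  searchQuery: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: "relevance" | "date";
  includePatents?: boolean;
  includeCitations?: boolean;
  maxResults?: number;
}

// Same fields, with every documented default filled in.
interface NormalizedInput {
  searchQuery: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy: "relevance" | "date";
  includePatents: boolean;
  includeCitations: boolean;
  maxResults: number;
}

// Years must fall in the documented 1900-2100 range when provided.
function checkYear(name: string, value: number | undefined): void {
  if (value === undefined) return;
  if (!Number.isInteger(value) || value < 1900 || value > 2100) {
    throw new Error(`${name} must be an integer between 1900 and 2100`);
  }
}

function normalizeInput(input: ScholarInput): NormalizedInput {
  const searchQuery = input.searchQuery.trim();
  if (searchQuery === "") throw new Error("searchQuery is required");

  checkYear("yearFrom", input.yearFrom);
  checkYear("yearTo", input.yearTo);
  // Assumption: an inverted year range is treated as invalid input.
  if (
    input.yearFrom !== undefined &&
    input.yearTo !== undefined &&
    input.yearFrom > input.yearTo
  ) {
    throw new Error("yearFrom must not be later than yearTo");
  }

  // Documented default is 100 and documented maximum is 1000.
  const maxResults = input.maxResults ?? 100;
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 1000) {
    throw new Error("maxResults must be an integer between 1 and 1000");
  }

  return {
    searchQuery,
    author: input.author,
    yearFrom: input.yearFrom,
    yearTo: input.yearTo,
    sortBy: input.sortBy ?? "relevance",
    includePatents: input.includePatents ?? true,
    includeCitations: input.includeCitations ?? true,
    maxResults,
  };
}
```

Validating before the first request means a bad year range fails immediately instead of after a rate-limited request has been spent.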

Output Format

Each result contains:

{
  "title": "Paper title",
  "articleUrl": "https://example.com/paper.pdf",
  "pdfUrl": "https://example.com/download.pdf",
  "authors": "John Doe, Jane Smith",
  "year": 2023,
  "journal": "Journal of Machine Learning Research",
  "abstract": "This paper presents...",
  "citationCount": 42,
  "citedByUrl": "https://scholar.google.com/scholar?cites=...",
  "relatedArticlesUrl": "https://scholar.google.com/scholar?q=related:...",
  "allVersionsCount": 3,
  "isBook": false,
  "isCitation": false,
  "isPdf": true
}
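
For downstream processing, the record above can be given a type. The sketch below assumes every field is always present with the type shown in the sample, which the source does not confirm, and the `topCited` helper is a hypothetical name introduced only for illustration.

```typescript
// Field names and types are taken from the sample record above.
// Assumption: no field is ever null or missing.
interface ScholarResult {
  title: string;
  articleUrl: string;
  pdfUrl: string;
  authors: string;
  year: number;
  journal: string;
  abstract: string;
  citationCount: number;
  citedByUrl: string;
  relatedArticlesUrl: string;
  allVersionsCount: number;
  isBook: boolean;
  isCitation: boolean;
  isPdf: boolean;
}

// Hypothetical helper: the most-cited results, skipping citation-only entries.
function topCited(results: ScholarResult[], limit: number): ScholarResult[] {
  return results
    .filter((r) => !r.isCitation) // filter() returns a copy, so the input is not mutated
    .sort((a, b) => b.citationCount - a.citationCount)
    .slice(0, limit);
}
```

A helper like this suits citation tracking: run the same query periodically and compare the ranked lists.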

Usage Example

Input

{
  "searchQuery": "deep learning natural language processing",
  "author": "Yoshua Bengio",
  "yearFrom": 2020,
  "yearTo": 2024,
  "sortBy": "date",
  "maxResults": 50
}
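
To make the filters concrete, here is a rough sketch of how an input like the one above could map onto a Google Scholar search URL. The query parameters (`q`, `as_ylo`, `as_yhi`, `scisbd`, `start`) and the `author:"..."` operator are taken from URLs that the Scholar web interface itself produces; they are undocumented, and whether this Actor builds its requests this way is an assumption. `buildScholarUrl` and `SearchInput` are hypothetical names.

```typescript
// Subset of the Actor input that affects the search URL in this sketch.
interface SearchInput {
  searchQuery: string;
  author?: string;
  yearFrom?: number;
  yearTo?: number;
  sortBy?: "relevance" | "date";
}

// Assumption: parameter names mirror those seen in Scholar's own result URLs.
function buildScholarUrl(input: SearchInput, start: number = 0): string {
  // Author filtering via the author:"..." search operator.
  const query = input.author
    ? `${input.searchQuery} author:"${input.author}"`
    : input.searchQuery;

  const params = new URLSearchParams({ q: query });
  if (input.yearFrom !== undefined) params.set("as_ylo", String(input.yearFrom));
  if (input.yearTo !== undefined) params.set("as_yhi", String(input.yearTo));
  if (input.sortBy === "date") params.set("scisbd", "1");
  if (start > 0) params.set("start", String(start)); // result offset for pagination

  return `https://scholar.google.com/scholar?${params.toString()}`;
}

const exampleUrl = buildScholarUrl({
  searchQuery: "deep learning natural language processing",
  author: "Yoshua Bengio",
  yearFrom: 2020,
  yearTo: 2024,
  sortBy: "date",
});
console.log(exampleUrl);
```

Using `URLSearchParams` rather than string concatenation keeps quotes and spaces in the query correctly encoded.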

Run Locally

# Install dependencies
npm install
# Build TypeScript
npm run build
# Run actor (requires input.json in root or Apify environment)
npm start

Important Notes

Rate Limiting

Google Scholar is very strict about automated access. To reduce the chance of being blocked, the Actor:

  • Waits 5-10 seconds between requests
  • Rotates realistic User-Agent strings
  • Sends proper HTTP headers to mimic browser behavior
  • Detects CAPTCHAs automatically and shuts down gracefully

Recommendation:

  • Keep maxResults under 100 for reliability
  • Use longer delays for larger scrapes
  • Consider using Google Scholar API alternatives for production use
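
The randomized 5-10 second pause described above could look roughly like the following. This is a sketch under assumptions, not the Actor's implementation: `randomDelayMs`, `sleep`, and `fetchPagesPolitely` are hypothetical names, and the injectable `rng` and `wait` parameters exist only to make the logic testable.

```typescript
const MIN_DELAY_MS = 5000;
const MAX_DELAY_MS = 10000;

// Pick a delay in [min, max) so requests do not arrive at a fixed rhythm.
function randomDelayMs(
  min: number = MIN_DELAY_MS,
  max: number = MAX_DELAY_MS,
  rng: () => number = Math.random,
): number {
  return Math.floor(min + rng() * (max - min));
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Fetch pages one at a time, pausing before every request except the first.
async function fetchPagesPolitely<T>(
  pageCount: number,
  fetchPage: (page: number) => Promise<T>,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T[]> {
  const pages: T[] = [];
  for (let page = 0; page < pageCount; page++) {
    if (page > 0) await wait(randomDelayMs());
    pages.push(await fetchPage(page));
  }
  return pages;
}
```

At an average of 7.5 seconds per pause, the time spent waiting grows linearly with the number of result pages requested, which is one reason to keep maxResults small.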

CAPTCHA/Blocking

If Google Scholar detects automation:

  • The Actor logs a warning and stops gracefully
  • Results scraped before the block are already saved, so no partial data is lost
  • You can retry with longer delays or from a different IP
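
The stop-and-keep-results behavior can be sketched as a loop that checks each page for a block before parsing it. The marker strings, `looksLikeCaptcha`, and `collectUntilBlocked` are illustrative assumptions, not the Actor's actual detection logic. The loop is synchronous over already-fetched HTML to keep the example short; a real scraper would fetch each page asynchronously.

```typescript
// Assumption: illustrative markers for a block page, not an exhaustive list.
const CAPTCHA_MARKERS = ["recaptcha", "unusual traffic"];

function looksLikeCaptcha(html: string): boolean {
  const lower = html.toLowerCase();
  return CAPTCHA_MARKERS.some((marker) => lower.indexOf(marker) !== -1);
}

interface CollectOutcome<T> {
  saved: T[];
  stoppedByCaptcha: boolean;
}

// Parse pages in order; on the first block page, stop and return what was saved so far.
function collectUntilBlocked<T>(
  pages: readonly string[],
  parse: (html: string) => T[],
): CollectOutcome<T> {
  const saved: T[] = [];
  for (const html of pages) {
    if (looksLikeCaptcha(html)) {
      console.warn("CAPTCHA detected, stopping; results collected so far are kept");
      return { saved, stoppedByCaptcha: true };
    }
    // Save each page's results as soon as they are parsed.
    for (const item of parse(html)) saved.push(item);
  }
  return { saved, stoppedByCaptcha: false };
}
```

Returning a flag alongside the saved items lets the caller decide whether to retry later with longer delays or from a different IP.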

Legal Considerations

  • Respect Google Scholar's Terms of Service
  • Use for research/academic purposes
  • Do not overload their servers
  • Consider API alternatives for commercial use

Development

Build

$ npm run build

Local Testing

$ npm run dev

Docker Build

docker build -t google-scholar-scraper .
docker run -e APIFY_INPUT='{"searchQuery":"machine learning"}' google-scholar-scraper

Troubleshooting

No Results Found

  • Check if query has typos
  • Try broader search terms
  • Verify year range is valid

CAPTCHA Detected

  • Reduce maxResults
  • Run actor less frequently
  • Use different IP address
  • Consider a Google Scholar API alternative

Parser Errors

  • Google Scholar HTML structure may change
  • Open an issue with example query
  • Actor will skip unparseable results

License

Apache-2.0

Support

For issues or questions, please open a GitHub issue or contact the Apify support team.
