GitHub Repository Scraper

Pricing

$10.00/month + usage

Scrape and extract GitHub repository data, metadata, statistics, stars, forks, issues, and project information from multiple repositories at once.

Developer

VulnV

Maintained by Community

Last modified

7 months ago

GitHub Repository Scraper - Extract Repository Data at Scale

Overview

The GitHub Repository Scraper is an Apify Actor that extracts data from many GitHub repositories in a single run. It suits competitive analysis, market research, developer insights, and building repository databases, and it returns repository details, statistics, and project metadata.

✅ Bulk URL processing | ✅ Comprehensive repository data | ✅ Statistics extraction | ✅ Metadata analysis | ✅ Concurrent processing

Complete Repository Data Extraction

Basic Information — Repository name, description, owner, creation date
Statistics — Stars, forks, watchers, usage metrics
Technical Details — Programming languages, file counts, commit information
Project Metadata — Topics, license information, default branch
Enhanced Repository Data — GitHub IDs, clone URLs, file listings, branch info
Owner Information — Detailed owner profiles with avatars and organization status
Repository Structure — File counts, directory listings, README information
Access URLs — Multiple clone formats (HTTPS, SSH, GitHub CLI), download links

Key Features

Bulk Processing — Process multiple GitHub repository URLs in one run
Smart URL Parsing — Automatically extracts repository paths from full GitHub URLs
Proxy Support — Built-in Apify proxy integration for reliable scraping
Error Handling — Robust error handling with detailed status reporting
Clean JSON Output — Structured, ready-to-use data format
Concurrent Processing — Configurable concurrency for optimal performance
Format Flexibility — Accepts various URL formats and automatically normalizes them

🧾 Input Configuration

Submit an array of GitHub repository URLs via the input schema:

{
  "urls": [
    "https://github.com/microsoft/vscode",
    "https://github.com/facebook/react",
    "https://github.com/nodejs/node",
    "https://github.com/torvalds/linux"
  ],
  "maxConcurrency": 5,
  "includeNotFound": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Input Parameters

URLs (urls, required):
- Array of GitHub repository URLs to scrape
- Supported formats: https://github.com/owner/repo, github.com/owner/repo
- Invalid URLs are filtered out automatically and logged as warnings
Max Concurrency (maxConcurrency, optional):
- Number of concurrent scraping requests (1-20)
- Default: 5
- Higher values speed up processing but raise the risk of rate limiting
Include Not Found (includeNotFound, optional):
- Whether to include repositories that return 404 (not found) in the results
- Default: false
- When enabled, non-existent repositories appear in the results with error information
Proxy Configuration (proxyConfiguration, recommended):
- Configure Apify proxy settings to avoid rate limiting
- Recommended for bulk scraping operations
- Format:
```
"proxyConfiguration": {
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}
```
- Available proxy groups: RESIDENTIAL, DATACENTER, GOOGLE_SERP
- Use RESIDENTIAL for best reliability when scraping GitHub
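
The parameter rules above can be checked locally before a run is started. The following is a minimal sketch, not the Actor's own validation: `repo_path`, `build_input`, and the regular expression are hypothetical helpers, and the pattern only accepts the two URL formats listed above (the Actor itself may accept more).

```python
import re
from typing import Optional

# Hypothetical pattern: the Actor's real parsing rules are not documented here.
_REPO_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/"
    r"([A-Za-z0-9-]+)/([A-Za-z0-9._-]+?)(?:\.git)?/?$"
)


def repo_path(url: str) -> Optional[str]:
    """Return 'owner/repo' for a recognisable repository URL, else None."""
    match = _REPO_RE.match(url.strip())
    if match is None:
        return None
    owner, repo = match.groups()
    return f"{owner}/{repo}"


def build_input(urls, max_concurrency=5, include_not_found=False):
    """Assemble a run input using the documented keys and limits."""
    if not 1 <= max_concurrency <= 20:
        raise ValueError("maxConcurrency must be between 1 and 20")
    # Drop anything that does not look like a repository URL.
    valid = [f"https://github.com/{p}" for p in map(repo_path, urls) if p]
    if not valid:
        raise ValueError("no valid GitHub repository URLs supplied")
    return {
        "urls": valid,
        "maxConcurrency": max_concurrency,
        "includeNotFound": include_not_found,
    }
```

Filtering locally keeps obviously malformed entries out of the run input, so the warnings in the Actor log only cover cases this sketch does not anticipate.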

Proxy Configuration Examples

For small-scale scraping (< 100 repositories):

"proxyConfiguration": {
  "useApifyProxy": true,
  "apifyProxyGroups": ["DATACENTER"]
}

For large-scale or production scraping (recommended):

"proxyConfiguration": {
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}

No proxy (not recommended for bulk operations):

// Omit proxyConfiguration entirely - may result in rate limiting
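
The size-based guidance above could be encoded when inputs are generated by a script. This is only a sketch: `proxy_configuration` is a hypothetical helper, and the threshold of 100 repositories is simply the figure quoted in this section.

```python
def proxy_configuration(repo_count: int) -> dict:
    """Pick a proxy group from the number of repositories in the run."""
    # Fewer than 100 repositories: the small-scale example; otherwise the
    # recommended large-scale setting.
    group = "DATACENTER" if repo_count < 100 else "RESIDENTIAL"
    return {"useApifyProxy": True, "apifyProxyGroups": [group]}
```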

📤 Output Format

Each repository yields one structured item, including enhanced metadata extracted from GitHub's embedded page data. In the example below, the lines starting with // are annotations and are not part of the JSON output:

{
  "url": "https://github.com/microsoft/vscode",
  "repoPath": "microsoft/vscode",
  "success": true,
  "data": {
    "url": "https://github.com/microsoft/vscode",
    "type": "repo",
    "description": "Visual Studio Code",
    "website": "https://code.visualstudio.com",
    "forkedfrom": null,
    "tags": ["editor", "typescript", "electron", "ide"],
    "usedby": 250000,
    "watchers": 3200,
    "stars": 162000,
    "forks": 28500,
    "langs": [
      {"name": "TypeScript", "perc": "93.2%"},
      {"name": "JavaScript", "perc": "4.1%"},
      {"name": "CSS", "perc": "1.5%"}
    ],
    // Enhanced data from GitHub's embedded JSON
    "id": 41881900,
    "name": "vscode",
    "full_name": "microsoft/vscode",
    "owner": "microsoft",
    "default_branch": "main",
    "is_fork": false,
    "is_empty": false,
    "is_private": false,
    "is_org_owned": true,
    "created_at": "2015-09-03T20:23:30.000Z",
    "clone_url": "https://github.com/microsoft/vscode.git",
    "ssh_url": "git@github.com:microsoft/vscode.git",
    "api_url": "https://api.github.com/repos/microsoft/vscode",
    // Owner information
    "owner_info": {
      "login": "microsoft",
      "type": "Organization",
      "url": "https://github.com/microsoft",
      "avatar_url": "https://avatars.githubusercontent.com/u/6154722?v=4"
    },
    // File and repository structure
    "file_count": 15420,
    "files": [
      {"name": "README.md", "path": "README.md", "type": "file"},
      {"name": "package.json", "path": "package.json", "type": "file"},
      {"name": "src", "path": "src", "type": "directory"}
    ],
    // Clone and download URLs
    "clone_urls": {
      "https": "https://github.com/microsoft/vscode.git",
      "ssh": "git@github.com:microsoft/vscode.git",
      "github_cli": "gh repo clone microsoft/vscode"
    },
    "download_url": "/microsoft/vscode/archive/refs/heads/main.zip",
    // Branch and commit information
    "ref_info": {
      "name": "main",
      "type": "branch",
      "current_oid": "585acf48f88e399989d54f001029424b2b7c358a",
      "can_edit": false
    },
    "commit_count": "185,234",
    // README information
    "readme_info": {
      "displayName": "README.md",
      "repoName": "vscode",
      "refName": "main",
      "path": "README.md",
      "loaded": true
    },
    // Metadata
    "enriched_at": "2024-12-29T15:30:45.123Z",
    "data_source": "github_scraper_enhanced"
  }
}
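
Some fields in the sample arrive as display strings rather than numbers: language shares such as "93.2%" and the commit count "185,234". The sketch below shows one way to convert them when post-processing an item. `summarize` is a hypothetical helper, and it assumes every successful item follows the shape of the sample above.

```python
def summarize(item: dict) -> dict:
    """Reduce one successful dataset item to a few numeric fields."""
    data = item["data"]
    # "93.2%" -> 93.2
    langs = [
        (lang["name"], float(lang["perc"].rstrip("%")))
        for lang in data.get("langs", [])
    ]
    top_name, top_share = max(
        langs, key=lambda pair: pair[1], default=(None, 0.0)
    )
    commits = data.get("commit_count")
    return {
        "repo": item["repoPath"],
        "stars": data.get("stars"),
        "forks": data.get("forks"),
        "top_language": top_name,
        "top_language_share": top_share,
        # "185,234" -> 185234; stays None when the field is absent
        "commits": int(commits.replace(",", "")) if commits else None,
    }
```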

Error Handling

Failed repositories return structured error information:

{
  "url": "https://github.com/invalid/repo",
  "repoPath": "invalid/repo",
  "success": false,
  "error": "Repository not found or private"
}

When includeNotFound is enabled, 404 repositories return structured data:

{
  "url": "https://github.com/nonexistent/repo",
  "repoPath": "nonexistent/repo",
  "success": true,
  "data": {
    "exists": false,
    "error": "Repository not found",
    "statusCode": 404
  }
}

Common Error Cases:

Repository not found or private — Repository doesn't exist or is private
Network error — Connection issues or scraping errors
Invalid URLs are filtered out before processing with warning logs
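
Because a 404 item reports success as true when includeNotFound is enabled, downstream code has to look at data.exists as well as success. A minimal sketch of that check follows; `classify` and `partition` are hypothetical helpers, and the bucket names are arbitrary.

```python
def classify(item: dict) -> str:
    """Label a dataset item as 'ok', 'not_found', or 'failed'."""
    if not item.get("success"):
        return "failed"
    # With includeNotFound enabled, missing repositories are marked
    # success=true but carry data.exists=false.
    if item.get("data", {}).get("exists") is False:
        return "not_found"
    return "ok"


def partition(items) -> dict:
    """Group items by their classification."""
    buckets = {"ok": [], "not_found": [], "failed": []}
    for item in items:
        buckets[classify(item)].append(item)
    return buckets
```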

💼 Common Use Cases

Competitive Analysis & Market Research

Analyze competitor repositories and project activity
Track technology trends through repository statistics
Research popular libraries and frameworks in specific domains
Monitor open source project adoption rates

Developer & Technology Research

Study programming language usage patterns
Analyze repository structures and best practices
Research active open source projects in specific technologies
Track development activity and contribution patterns

Portfolio & Investment Analysis

Research technology companies and their open source contributions
Analyze developer productivity and project health metrics
Track repository growth and community engagement
Identify trending projects and technologies

Academic & Educational Research

Study software development patterns and practices
Analyze open source community dynamics
Research programming language evolution
Track educational resource repositories

📊 Output & Export Options

Dataset Storage

All extracted data stored in Apify dataset
Each repository becomes one dataset item
Status tracking for successful and failed extractions

Export Formats

JSON — Raw structured data for API integration
CSV — Spreadsheet-compatible format for analysis
Excel — Formatted spreadsheet with repository data
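
These exports can also be fetched over HTTP. As a sketch, assuming the Apify API's dataset items endpoint and its format query parameter, a download URL can be built as shown below. `export_url` is a hypothetical helper, the dataset ID is a placeholder, and private datasets additionally need an API token, which is not shown here.

```python
from urllib.parse import urlencode

# Format names as the Apify API spells them; "xlsx" is the Excel export.
FORMATS = {"json", "csv", "xlsx"}


def export_url(dataset_id: str, fmt: str = "json") -> str:
    """Build a download URL for a dataset in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```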

Data Processing

Clean, validated URLs
Structured error reporting
Comprehensive logging for troubleshooting

⚡ Quick Start Guide

Configure Input:
- Add GitHub repository URLs to the urls array
- Set desired maxConcurrency (recommended: 5-10)
- Configure proxyConfiguration with useApifyProxy: true and appropriate proxy groups for reliable scraping
Run the Actor:
- Execute through Apify Console or API
- Monitor progress through real-time logs
- Review extracted data in the dataset
Export Results:
- Download data in your preferred format
- Integrate with your existing tools and workflows
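
The same three steps can be scripted through the API. The sketch below assumes the `apify-client` Python package and its dict-style return values (as in the 1.x releases). The Actor ID is taken from this page's URL, and `run_scraper` is a hypothetical wrapper. The client is passed in as an argument so the function can be exercised without network access.

```python
ACTOR_ID = "vulnv/github-repository-scraper"  # from the Actor's Apify URL


def run_scraper(client, urls, max_concurrency=5):
    """Start a run, wait for it to finish, and return all dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run_input = {
        "urls": list(urls),
        "maxConcurrency": max_concurrency,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # call() blocks until the run finishes and returns the run record.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

With the package installed, a call would look like `run_scraper(ApifyClient("<YOUR_API_TOKEN>"), ["https://github.com/facebook/react"])`, where `ApifyClient` is imported from `apify_client` and the token is a placeholder.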

🆘 Support & Feedback

For questions, feature requests, or technical support:

Visit the Apify Community Forum
Contact us through the Apify platform
Submit issues for improvements and bug reports

🌟 Explore More Actors

✨ Need more scraping solutions? Discover additional actors on Apify for web automation and data extraction.

📧 For inquiries or custom development, reach out at apify@vulnv.com.

URL: https://apify.com/vulnv/github-repository-scraper
