Pricing
$20.00/month + usage
Github Profile Scraper
Scrapes GitHub user profiles including bio, repositories, followers, contributions, and more. Accepts a list of usernames and extracts comprehensive profile data.
Rating: 0.0 (0 reviews)
Actor stats: 1 bookmarked | 34 total users | 1 monthly active user | last modified 7 months ago
GitHub Profile Scraper: Extract Developer Profiles at Scale
Overview
The GitHub Profile Scraper is a powerful Apify Actor designed to extract comprehensive data from GitHub user profiles efficiently. Perfect for recruitment, developer research, competitive analysis, or building developer databases, this scraper provides detailed insights into GitHub users' professional profiles, repositories, and contributions.
Bulk username processing | Comprehensive profile data | Email extraction (when public) | Repository analysis | Contribution tracking
Complete Profile Data Extraction
- Basic Information: Name, username, bio, location, website
- Contact Details: Email addresses (when publicly visible)
- Professional Details: Company, Twitter/X handle
- Network Statistics: Followers and following counts
- Repository Data: Public repository count, pinned repositories with details
- Activity Metrics: Contribution counts and contribution graph data
- Social Links: Website, social media profiles
- Starred Repositories: List of starred projects (when accessible)
Key Features
- Bulk Processing: Process multiple GitHub usernames in one run
- Smart Email Detection: Extracts emails using multiple methods, including itemprop="email" elements (only for publicly visible emails)
- Proxy Support: Built-in Apify proxy integration for reliable scraping
- Error Handling: Robust error handling with detailed status reporting
- Clean JSON Output: Structured, ready-to-use data format
- Username Validation: Automatic username cleaning and validation against GitHub's format requirements
- Format Flexibility: Accepts various username formats and automatically normalizes them
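The README does not show how the itemprop="email" lookup is implemented. As a rough sketch of the idea, the parser below collects addresses from inside any element marked itemprop="email", reading either a mailto: link or text that contains "@". It uses Python's standard-library html.parser in place of BeautifulSoup so it runs without dependencies. The sample markup is a simplified assumption, not GitHub's actual profile HTML, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class EmailItempropParser(HTMLParser):
    """Collects addresses found inside elements marked itemprop="email".

    Sketch only: assumes the address appears either as a mailto: link or as
    plain text containing "@" somewhere inside that element.
    """

    def __init__(self) -> None:
        super().__init__()
        self.emails: list = []
        self._email_tag: Optional[str] = None  # tag name of the open itemprop="email" element
        self._nesting = 0  # depth of that same tag name while inside it

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self._email_tag is None and attributes.get("itemprop") == "email":
            self._email_tag = tag  # entering the email element
        if self._email_tag is None:
            return
        if tag == self._email_tag:
            self._nesting += 1
        href = attributes.get("href") or ""
        if tag == "a" and href.startswith("mailto:"):
            # Drop any "?subject=..." query part of the mailto: link.
            self._add(href[len("mailto:"):].split("?")[0])

    def handle_endtag(self, tag):
        if self._email_tag is not None and tag == self._email_tag:
            self._nesting -= 1
            if self._nesting == 0:
                self._email_tag = None  # left the email element

    def handle_data(self, data):
        if self._email_tag is not None:
            for token in data.split():
                if "@" in token:
                    self._add(token)

    def _add(self, email):
        if email and email not in self.emails:  # keep first occurrence only
            self.emails.append(email)


def extract_public_emails(html: str) -> list:
    parser = EmailItempropParser()
    parser.feed(html)
    parser.close()
    return parser.emails


sample = '<li itemprop="email"><a href="mailto:jane@example.com">jane@example.com</a></li>'
print(extract_public_emails(sample))  # ['jane@example.com']
```

When no element carries itemprop="email", the function returns an empty list. This matches the README's note that only publicly visible emails can be extracted.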
Input Configuration
Submit an array of GitHub usernames via the input schema:
{
  "usernames": [
    "johndeveloper",
    "jane-coder",
    "techexpert123",
    "@another-user",
    "https://github.com/some-developer"
  ],
  "max_threads": 5,
  "proxy_configuration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Note: The scraper automatically normalizes different username formats and validates them against GitHub's requirements. Invalid usernames will be skipped with warning messages.
Input Parameters
- Usernames (required):
  - Array of GitHub usernames to scrape
  - Supported formats: username, @username, github.com/username, https://github.com/username
  - Username requirements: must follow GitHub's username rules (alphanumeric characters and hyphens, no consecutive hyphens, cannot start or end with a hyphen, max 39 characters)
  - Invalid usernames are automatically filtered out with warnings
- Max Threads (optional):
  - Number of concurrent threads for scraping (1-20)
  - Default: 5
  - Higher values speed up processing but may increase the chance of rate limiting
- Proxy Configuration (recommended):
  - Enable Apify proxy to avoid rate limiting
  - Recommended for bulk scraping operations
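The normalization and validation rules above can be sketched as follows. The regular expressions and the normalize_username and clean_usernames helpers are hypothetical names for illustration. They encode only the input formats and username rules listed in this README, not the Actor's actual code.

```python
import re
from typing import Optional

# Encodes the rules listed above: 1-39 characters, ASCII letters, digits and
# single hyphens, not starting or ending with a hyphen.
_USERNAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")
# Strips an optional scheme and "www." before "github.com/".
_PROFILE_URL_PREFIX = re.compile(r"^(?:https?://)?(?:www\.)?github\.com/", re.IGNORECASE)


def normalize_username(raw: str) -> Optional[str]:
    """Return the bare username, or None if it breaks the format rules."""
    candidate = _PROFILE_URL_PREFIX.sub("", raw.strip())
    candidate = candidate.lstrip("@")
    # Drop anything after the username in a URL, e.g. "/repo", "?tab=...", "#...".
    candidate = re.split(r"[/?#]", candidate, maxsplit=1)[0]
    return candidate if _USERNAME_RE.fullmatch(candidate) else None


def clean_usernames(raw_usernames: list) -> tuple:
    """Split the input into (valid usernames, skipped raw entries)."""
    valid, skipped = [], []
    for raw in raw_usernames:
        name = normalize_username(raw)
        if name is None:
            skipped.append(raw)  # the Actor reports these as warnings
        else:
            valid.append(name)
    return valid, skipped


valid, skipped = clean_usernames(["@another-user", "https://github.com/some-developer", "user--name"])
print(valid, skipped)  # ['another-user', 'some-developer'] ['user--name']
```

The lookahead -(?=[A-Za-z0-9]) is what rules out both consecutive hyphens and a trailing hyphen in a single pattern.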
Output Format
Each GitHub profile returns structured data such as:
{
  "username": "johndeveloper",
  "status": "success",
  "name": "John Developer",
  "bio": "Full-stack developer passionate about open source",
  "location": "San Francisco, CA",
  "email": "john@example.com",
  "website": "https://johndeveloper.dev",
  "twitter": "john_codes",
  "followers": "1234",
  "following": "456",
  "repos_count": "42",
  "contribs": "567 contributions in the last year",
  "pinnedrepos": [
    {
      "name": "awesome-project",
      "url": "https://github.com/johndeveloper/awesome-project",
      "desc": "An innovative web application framework",
      "lang": "JavaScript",
      "stars": "2,500",
      "forks": "320"
    }
  ],
  "repos": [
    {
      "url": "https://github.com/johndeveloper/web-framework",
      "name": "web-framework",
      "desc": "Modern web development framework",
      "stars": "1850",
      "forks": "210",
      "languages": [
        {"lang": "JavaScript", "percent": "78.2%"},
        {"lang": "TypeScript", "percent": "18.5%"}
      ]
    }
  ],
  "starred_repos_list": [
    {"url": "https://github.com/example-org/popular-tool", "name": "popular-tool"}
  ],
  "contrib_matrix": [
    {"date": "2024-01-01", "count": "3", "level": "1"}
  ]
}
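Counts in the output are strings (for example "2,500" stars or "567 contributions in the last year"), so downstream analysis usually needs them as numbers. Here is a minimal sketch. It assumes counts are plain digits, optionally with comma separators, as in the sample above. Abbreviated forms such as "1.2k" are not handled. The to_int and summarize_profile helpers are hypothetical.

```python
import re
from typing import Any, Optional


def to_int(value: Any) -> Optional[int]:
    """Parse the first number in strings like "1234", "2,500" or "567 contributions ..."."""
    if value is None:
        return None
    match = re.search(r"\d[\d,]*", str(value))
    return int(match.group().replace(",", "")) if match else None


def summarize_profile(item: dict) -> dict:
    """Return numeric summary fields for a successful item; pass failures through."""
    if item.get("status") != "success":
        return {"username": item.get("username"), "status": item.get("status")}
    return {
        "username": item.get("username"),
        "followers": to_int(item.get("followers")),
        "following": to_int(item.get("following")),
        "repos_count": to_int(item.get("repos_count")),
        "contributions_last_year": to_int(item.get("contribs")),
        # Missing or unparsable counts contribute 0 to the totals.
        "pinned_stars_total": sum(to_int(r.get("stars")) or 0 for r in item.get("pinnedrepos") or []),
        "contrib_matrix_total": sum(to_int(d.get("count")) or 0 for d in item.get("contrib_matrix") or []),
    }


print(to_int("2,500"))  # 2500
```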
Error Handling
Failed profiles return structured error information:
{"username":"nonexistent-user","status":"not_found","message":"User not found"}
Common Error Cases:
- not_found: User doesn't exist or profile is private
- error: Network issues or scraping errors
- Invalid usernames are filtered out before processing with warning logs
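Because not_found and error mean different things, it can help to split a run's results by status and retry only the error items. The helper names below are hypothetical. Items without a status field are grouped under "unknown" as an assumption, and the "error" item in the usage example is made up for illustration.

```python
from collections import defaultdict


def partition_by_status(items: list) -> dict:
    """Group result items by their "status" field."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("status", "unknown")].append(item)
    return dict(groups)


def usernames_to_retry(items: list) -> list:
    # Only "error" (network or scraping failure) is worth retrying;
    # "not_found" means the profile itself is unavailable.
    return [item["username"] for item in items if item.get("status") == "error"]


results = [
    {"username": "jane-coder", "status": "success"},
    {"username": "nonexistent-user", "status": "not_found", "message": "User not found"},
    {"username": "flaky-user", "status": "error", "message": "timeout"},  # hypothetical
]
print(usernames_to_retry(results))  # ['flaky-user']
```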
Common Use Cases
Recruitment & Talent Sourcing
- Research developer profiles and technical expertise
- Analyze contribution patterns and project involvement
- Build comprehensive talent pipelines with GitHub activity data
- Assess coding skills through repository analysis
Developer Research & Analysis
- Study open source community members and contributors
- Analyze technology trends through developer profiles
- Research competitor team structures and technical expertise
- Track developer career progression and project involvement
Lead Generation & Business Development
- Extract contact information for developer outreach
- Build databases of potential customers in tech sectors
- Identify decision-makers in technology companies
- Enrich existing contact databases with GitHub profiles
Community Building & Networking
- Find developers with specific skills or interests
- Build communities around particular technologies
- Identify potential collaborators for open source projects
- Research conference speakers and industry experts
Output & Export Options
Dataset Storage
- All extracted data stored in Apify dataset
- Each profile becomes one dataset item
- Status tracking for successful and failed extractions
Export Formats
- JSON: Raw structured data for API integration
- CSV: Spreadsheet-compatible format for analysis
- Excel: Formatted spreadsheet with profile data
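If you download the JSON export and want a CSV with one row per profile, one option is to keep the scalar fields and collapse nested lists into a single column. This is a sketch using Python's csv module. The field list comes from the output examples in this README, and the pinned_repo_names column is a hypothetical choice, not the Actor's own CSV layout.

```python
import csv
import io

# Scalar fields taken from the output examples in this README.
SCALAR_FIELDS = ["username", "status", "name", "bio", "location", "email",
                 "website", "twitter", "followers", "following", "repos_count", "contribs"]


def profiles_to_csv(items: list) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SCALAR_FIELDS + ["pinned_repo_names"])
    writer.writeheader()
    for item in items:
        # Missing keys and JSON nulls become None, which csv writes as an empty cell.
        row = {field: item.get(field) for field in SCALAR_FIELDS}
        # Collapse the nested pinned repos into one ";"-separated column.
        row["pinned_repo_names"] = ";".join(r.get("name", "") for r in item.get("pinnedrepos") or [])
        writer.writerow(row)
    return buffer.getvalue()
```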
Data Processing
- Clean, validated usernames
- Structured error reporting
- Comprehensive logging for troubleshooting
Quick Start Guide
- Configure Input:
  - Add GitHub usernames to the usernames array
  - Set the desired max_threads (recommended: 5-10)
  - Enable proxy configuration for reliable scraping
- Run the Actor:
  - Execute through Apify Console or API
  - Monitor progress through real-time logs
  - Review extracted data in the dataset
- Export Results:
  - Download data in your preferred format
  - Integrate with your existing tools and workflows
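To run the Actor through the API, one approach is Apify's run-sync-get-dataset-items endpoint, which starts a run, waits for it to finish, and returns the dataset items. The sketch below builds that request with the standard library. The Actor ID and token are placeholders you must replace, and the input checks mirror the ranges listed under Input Parameters. Because the call waits for the whole run, it suits small batches. For large username lists, an asynchronous run is usually a better fit.

```python
import json
import urllib.request

APIFY_API = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, usernames: list,
                      max_threads: int = 5, use_proxy: bool = True) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    if not usernames:
        raise ValueError("usernames must not be empty")
    if not 1 <= max_threads <= 20:  # range documented under Input Parameters
        raise ValueError("max_threads must be between 1 and 20")
    run_input = {
        "usernames": usernames,
        "max_threads": max_threads,
        "proxy_configuration": {"useApifyProxy": use_proxy},
    }
    return urllib.request.Request(
        url=f"{APIFY_API}/acts/{actor_id}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(request: urllib.request.Request, timeout: float = 300) -> list:
    """Send the request and decode the JSON array of profile items."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholders: replace with the real Actor ID ("owner~actor-name") and your API token.
request = build_run_request("ACTOR_OWNER~ACTOR_NAME", "YOUR_APIFY_TOKEN", ["jane-coder"])
```

With real values filled in, run_actor(request) sends the call. It should return a list of items in the shape shown under Output Format.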
Privacy & Compliance
- Public Data Only: Extracts only publicly visible profile information
- Respects Privacy Settings: Email extraction only works for publicly visible emails
- Rate Limiting: Built-in delays and proxy support to respect GitHub's terms
- Error Handling: Graceful handling of private or restricted profiles
Technical Details
Built With
- Python & BeautifulSoup: Efficient HTML parsing and data extraction
- Apify SDK: Robust actor framework with built-in storage and proxy support
- Multi-threading: Concurrent processing for improved performance
- Request Handling: Smart retry mechanisms and error recovery
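The README mentions smart retry mechanisms without describing them. One common approach is exponential backoff with jitter, sketched below. TransientError and with_retries are hypothetical names. Which failures count as retryable (for example HTTP 429 or 5xx responses) is an assumption you would adapt to your own fetch code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures, e.g. HTTP 429 or 5xx."""


def with_retries(fetch: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying TransientError with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delays grow 1x, 2x, 4x ... base_delay, plus up to base_delay of random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")
```

The sleep parameter is injectable so the backoff schedule can be tested without actually waiting.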
Performance
- Process hundreds of profiles per run
- Configurable concurrency for optimal speed
- Proxy rotation for reliable access
- Comprehensive error logging and recovery
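Configurable concurrency of this kind is often implemented with a thread pool sized by the max_threads setting. The sketch below assumes a hypothetical scrape_one(username) function that returns one result item. Any exception it raises becomes an item with status "error", in the same shape as the error example under Error Handling. Results come back in completion order, not input order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def scrape_all(usernames: list, scrape_one: Callable[[str], dict],
               max_threads: int = 5) -> list:
    """Run scrape_one over all usernames with at most max_threads in flight."""
    if not 1 <= max_threads <= 20:  # range documented under Input Parameters
        raise ValueError("max_threads must be between 1 and 20")
    results = []
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(scrape_one, name): name for name in usernames}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # one failed profile must not stop the run
                results.append({"username": name, "status": "error", "message": str(exc)})
    return results
```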
Example Results
Successful Profile Extraction
{
  "username": "jane-coder",
  "status": "success",
  "name": "Jane Smith",
  "bio": "Frontend developer specializing in React and TypeScript. Open source enthusiast.",
  "location": "Austin, TX",
  "email": null,
  "website": "https://jane-codes.dev",
  "followers": "3456",
  "following": "234",
  "repos_count": "87",
  "pinnedrepos": [
    {
      "name": "react-toolkit",
      "desc": "Comprehensive React development toolkit",
      "stars": "8500",
      "lang": "TypeScript"
    }
  ]
}
Tips for Best Results
- Enable Proxies: Use Apify proxy configuration for reliable large-scale scraping
- Username Format: Ensure usernames follow GitHub's format rules:
  - Only alphanumeric characters and hyphens allowed
  - Cannot start or end with a hyphen
  - No consecutive hyphens (e.g., user--name is invalid)
  - Maximum 39 characters
  - Invalid usernames will be skipped with warnings
- Monitor Rate Limits: Use appropriate thread counts to avoid GitHub rate limiting
- Handle Private Profiles: Some data may not be available for users with privacy settings
- Email Availability: Email extraction only works for publicly visible emails (most users keep their emails private)
Support & Feedback
For questions, feature requests, or technical support:
- Visit the Apify Community Forum
- Contact us through the Apify platform
- Submit issues for improvements and bug reports
Explore More Actors
Need more scraping solutions? Discover additional actors on Apify for comprehensive web automation and data extraction, and browse our full range of tools under Explore More Actors on Apify.
For inquiries or custom development, reach out at apify@vulnv.com.
