GitHub Profile: Repos, Stars, Activity, CSV, No Token, Bulk
Pricing
Pay per usage
21 runs. GitHub user intel in CSV/JSON: repos, stars, followers, contribs, languages, bio, email. No API token, no rate blocks. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For recruiter outreach + talent mapping. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
GitHub Profile Scraper: Users, Repos, Stars & Language Stats
Pull GitHub user profiles plus their public repositories (bio, follower counts, top repos by stars, language distribution, license info, activity timestamps) using the official GitHub REST API. No token required for public-data lookups (60 req/h unauthenticated rate-limit applies; the actor handles that automatically).
What you get per user
The actor pushes one record per username. All fields below come from `GET /users/{username}` and `GET /users/{username}/repos`, verified against `src/main.js`.
Profile-level fields (19, including scrapedAt metadata)
| Field | Type | Source |
|---|---|---|
| username | string | user.login |
| name | string | user.name |
| bio | string | user.bio |
| company | string | user.company |
| location | string | user.location |
| email | string | user.email (when public) |
| blog | string | user.blog |
| twitterUsername | string | user.twitter_username |
| avatar | string (URL) | user.avatar_url |
| profileUrl | string (URL) | user.html_url |
| publicRepos | int | user.public_repos |
| publicGists | int | user.public_gists |
| followers | int | user.followers |
| following | int | user.following |
| createdAt | string (ISO 8601) | user.created_at |
| updatedAt | string (ISO 8601) | user.updated_at |
| hireable | bool | user.hireable |
| type | string | User or Organization |
| scrapedAt | string (ISO 8601) | run timestamp |
Repo-level fields (15 per repo, when includeRepos=true)
name, fullName, description, url, stars, forks, watchers, language, topics[], isForked, createdAt, updatedAt, pushedAt, license, openIssues
Plus aggregates on the parent profile record (computed from the EXTRACTED repos only; see Honest limitations):
- `totalStars`: sum across the extracted repos (capped by `maxReposPerUser`, NOT total stars across all of the user's repos)
- `totalForks`: sum across the extracted repos (same caveat)
- `languages`: array of `{language, repoCount}`, sorted desc by repo count. `repoCount` is "how many of the extracted repos have this as their `language` field". This is NOT GitHub's bytes-weighted language graph; for that you need the `/repos/{owner}/{repo}/languages` endpoint per repo (custom build available).
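To make the aggregate semantics concrete, here is a minimal sketch of how such totals could be recomputed from a record's `repos` array. The helper name `aggregate_repos` is hypothetical, and skipping repos whose `language` is `null` is an assumption rather than confirmed actor behavior.

```python
from collections import Counter


def aggregate_repos(repos):
    """Summarize repo records shaped like the actor's `repos` array (hypothetical helper)."""
    total_stars = sum(r.get("stars") or 0 for r in repos)
    total_forks = sum(r.get("forks") or 0 for r in repos)
    # Count repos per primary language; repos without a language are skipped (assumption).
    counts = Counter(r["language"] for r in repos if r.get("language"))
    languages = [
        {"language": lang, "repoCount": n}
        for lang, n in counts.most_common()  # descending by repo count
    ]
    return {"totalStars": total_stars, "totalForks": total_forks, "languages": languages}


sample = [
    {"name": "a", "stars": 10, "forks": 2, "language": "C"},
    {"name": "b", "stars": 5, "forks": 1, "language": "C"},
    {"name": "c", "stars": 1, "forks": 0, "language": "Shell"},
    {"name": "d", "stars": 0, "forks": 0, "language": None},
]
print(aggregate_repos(sample))
```

Because the sums only see the repos passed in, a user with more repos than `maxReposPerUser` will always get totals for the extracted subset.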
Output example

    {
      "username": "torvalds",
      "name": "Linus Torvalds",
      "bio": null,
      "company": "Linux Foundation",
      "location": "Portland, OR",
      "email": null,
      "blog": "",
      "twitterUsername": null,
      "avatar": "https://avatars.githubusercontent.com/u/1024025?v=4",
      "profileUrl": "https://github.com/torvalds",
      "publicRepos": 7,
      "publicGists": 0,
      "followers": 220000,
      "following": 0,
      "createdAt": "2011-09-03T15:26:22Z",
      "updatedAt": "2026-04-20T10:11:12Z",
      "hireable": null,
      "type": "User",
      "totalStars": 185000,
      "totalForks": 55000,
      "languages": [
        {"language": "C", "repoCount": 4},
        {"language": "Shell", "repoCount": 1}
      ],
      "repos": [
        {
          "name": "linux",
          "fullName": "torvalds/linux",
          "description": "Linux kernel source tree",
          "url": "https://github.com/torvalds/linux",
          "stars": 175000,
          "forks": 53000,
          "watchers": 175000,
          "language": "C",
          "topics": ["linux", "kernel"],
          "isForked": false,
          "createdAt": "2011-09-04T22:48:12Z",
          "updatedAt": "2026-04-29T08:00:00Z",
          "pushedAt": "2026-04-29T07:55:00Z",
          "license": "GPL-2.0",
          "openIssues": 1
        }
      ],
      "scrapedAt": "2026-04-29T12:00:00.000Z"
    }

Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| usernames | array | [] | GitHub usernames (e.g. ["torvalds", "gaearon"]) |
| includeRepos | boolean | true | If false, profile-only: skip repo + language extraction |
| maxReposPerUser | integer | 30 | Cap per user, sorted by stars desc (1-100; the GitHub API page-size ceiling is 100, the actor enforces it) |
| includeLanguageStats | boolean | true | Aggregate language distribution across the extracted repos |
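As an illustration of the input schema above, the following sketch builds a run-input dictionary and clamps `maxReposPerUser` to the documented 1-100 range before it is sent. The helper `build_run_input` and its cleanup rules (trimming whitespace, dropping a leading `@`, removing duplicates) are assumptions for this example, not part of the actor.

```python
def build_run_input(usernames, include_repos=True, max_repos=30,
                    include_language_stats=True):
    """Build an input dict matching the parameter table (hypothetical helper)."""
    cleaned = []
    for name in usernames:
        name = name.strip().lstrip("@")
        # Drop blanks and duplicates while keeping the original order.
        if name and name not in cleaned:
            cleaned.append(name)
    return {
        "usernames": cleaned,
        "includeRepos": include_repos,
        # Mirror the documented range client-side: 1 to 100.
        "maxReposPerUser": max(1, min(100, int(max_repos))),
        "includeLanguageStats": include_language_stats,
    }


print(build_run_input([" torvalds", "@gaearon", "torvalds", ""], max_repos=500))
```

Clamping on the client is optional since the actor enforces the ceiling itself; it mainly makes the effective cap visible in your own logs.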
Use cases
- Developer recruiting: evaluate candidates by their open-source footprint, language mix, and active-repo cadence (`pushedAt`).
- Competitor analysis: fingerprint the OSS strategy of a company by mapping its top contributors' top repos.
- Community research: identify influential developers in a technology ecosystem (high stars + matching `topics`).
- Portfolio benchmarking: compare repo activity, fork ratios, and license distributions across a candidate set.
- Lead generation: surface developers with specific technology expertise (filter by `languages[*].language`).
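The language filter from the lead-generation use case could be sketched as follows. The function name `matches_language` and the `min_repos` threshold are hypothetical; the record shape follows the output example, and the sample profiles are made up.

```python
def matches_language(profile, language, min_repos=1):
    """True if the profile lists `language` in at least `min_repos` extracted repos."""
    for entry in profile.get("languages") or []:
        if (entry.get("language") or "").lower() == language.lower():
            return entry.get("repoCount", 0) >= min_repos
    return False


profiles = [
    {"username": "alice", "languages": [{"language": "Rust", "repoCount": 3}]},
    {"username": "bob", "languages": [{"language": "Go", "repoCount": 5},
                                      {"language": "Rust", "repoCount": 1}]},
    {"username": "carol"},  # profile-only record without a languages key
]
leads = [p["username"] for p in profiles if matches_language(p, "rust", min_repos=2)]
print(leads)  # ['alice']
```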
How it works
1. `GET /users/{username}`: 18 fields from GitHub + `scrapedAt` (19 total).
2. (If `includeRepos`) `GET /users/{username}/repos?sort=stars&direction=desc&per_page=<maxReposPerUser>`: 15 fields per repo. Single page only; no pagination beyond GitHub's per-page ceiling (100).
3. Aggregate `totalStars`, `totalForks`, and `languages` from the extracted repo set (capped by `maxReposPerUser`).
4. On HTTP 403 with `X-RateLimit-Remaining: 0`, the actor reads `X-RateLimit-Reset`, sleeps until the window resets (+5 s buffer), then retries. No manual handling required, but unauthenticated mode caps you at 60 req/h; set up an authenticated proxy actor (custom build) if you need higher throughput.
500 ms delay between user-profile fetch and repo-list fetch; 1 s delay between users.
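The rate-limit step can be sketched as a small pure function. This is an illustration under stated assumptions, not the actor's code (which lives in `src/main.js`): the name `seconds_until_reset` is hypothetical, headers are passed as a plain dict with exact-case keys, and `X-RateLimit-Reset` is read as UTC epoch seconds, which is how GitHub documents it.

```python
import time


def seconds_until_reset(status, headers, now=None, buffer_s=5):
    """Return how long to sleep before retrying, or None if not rate-limited."""
    if status != 403 or headers.get("X-RateLimit-Remaining") != "0":
        return None
    now = time.time() if now is None else now
    reset_epoch = int(headers["X-RateLimit-Reset"])  # UTC epoch seconds
    # Never return a negative wait; always add the safety buffer.
    return max(0, reset_epoch - now) + buffer_s


limited = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}
print(seconds_until_reset(403, limited, now=1000))  # 65
print(seconds_until_reset(200, {}, now=1000))       # None
```

A caller would pass the result to `time.sleep` and then repeat the request. Capping the returned value would be one way to add the max-wait safeguard the actor currently lacks.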
Honest limitations (read before bulk runs)
- No pagination beyond `maxReposPerUser` (GitHub `per_page` ceiling = 100). For users with hundreds of repos, only the top-N by stars are returned. If you need ALL repos, request a paginated custom build.
- `totalStars` / `totalForks` are EXTRACTED-repo aggregates, NOT lifetime totals. A user with 200 repos and `maxReposPerUser=30` will show `totalStars` summed over the top 30 only, not all 200.
- `languages` is `repoCount`, NOT bytes-weighted. GitHub's UI shows percent of repo bytes per language; this actor counts how many repos list each language as their `language` field. Bytes-based language stats require per-repo `/languages` calls (custom build).
- One user's HTTP error halts the entire batch. The `for (const username of usernames)` loop is wrapped in ONE outer `try/catch`; a 502 / 503 on user #5 of 100 stops the run. Workaround: split large batches into ≤25-username runs, or request a per-user try/catch custom build.
- Rate-limit retry has no max-wait safeguard. If `X-RateLimit-Reset` shows 60 minutes ahead, the actor sleeps 60 minutes (+5 s) before retrying. For a 100-user batch with repos enabled (200 requests against the 60 req/h unauthenticated cap), total wall-clock can stretch to roughly 3 h. For high-volume runs, request an authenticated custom build (5000 req/h with PAT).
- No proxy. Direct fetch from the Apify worker IP. GitHub's unauthenticated rate-limit is per-IP, so co-tenant noise on the same Apify IP can shrink your effective quota.
- `email` is `null` for most users. GitHub only exposes `email` when the user has explicitly set a public email in profile settings.
- `license` may be `null` for repos without a declared license, or repos using a license GitHub can't classify (returns `spdx_id` only, or `null`).
- `hireable` is tri-state (`true` / `false` / `null`; GitHub-account opt-in). README example shows `null` (the most common state).
- `topics` ordering is GitHub-internal: not alphabetical, not by relevance; just the array as returned.
- Hardcoded UA `ApifyGitHubProfileScraper/1.0`. GitHub requires a User-Agent header on API requests but doesn't inspect it for capability gating.
- Empty `usernames = []` is silently accepted: the actor logs `No usernames provided.` and exits 0.
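The batch-splitting workaround for the halting limitation could look like the sketch below. `chunked` and `run_in_batches` are hypothetical helpers, and `run_batch` stands in for whatever function starts one actor run and returns its records (for example a wrapper around the Apify client call shown under Python integration).

```python
def chunked(usernames, size=25):
    """Split a username list into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [usernames[i:i + size] for i in range(0, len(usernames), size)]


def run_in_batches(usernames, run_batch, size=25):
    """Call run_batch(batch) per chunk; record failed batches instead of aborting."""
    results, failed = [], []
    for batch in chunked(usernames, size):
        try:
            results.extend(run_batch(batch))
        except Exception as exc:  # one bad batch should not stop the rest
            failed.append((batch, repr(exc)))
    return results, failed


def fake_run(batch):
    # Stand-in for a real actor run; fails when the batch contains "bad".
    if "bad" in batch:
        raise RuntimeError("HTTP 502")
    return [{"username": u} for u in batch]


names = ["a", "b", "bad", "c", "d"]
ok, failed = run_in_batches(names, fake_run, size=2)
print(len(ok), failed)
```

With this layout a single 502 costs at most one batch, and the failed batches can be retried on their own.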
Python integration

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_APIFY_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("knotless_cadence/github-profile-scraper").call(run_input={
        "usernames": ["torvalds", "gaearon", "sindresorhus"],
        "maxReposPerUser": 20,
    })

    # One dataset item per username; print a compact summary line for each.
    for p in client.dataset(run["defaultDatasetId"]).iterate_items():
        top_lang = (p.get("languages") or [{}])[0].get("language", "?")
        print(f"{p['username']:>20} {p['followers']:>7,} followers {p.get('totalStars', 0):>7,} stars top:{top_lang}")

GitHub data toolkit (related actors)
| Tool | What it does |
|---|---|
| GitHub Trending Scraper | Popular repos by language / time window |
| GitHub Profile Scraper | This actor: developer profiles + repos |
| GitHub Issues Scraper | Issues and PRs for a repo |
All free to inspect on Apify Store: 31 published actors, 78 total in portfolio.
Common questions
Q: Do I need a GitHub API token?
A: For public profiles, no. The actor uses the unauthenticated REST API (60 req/h). Each user costs two requests when `includeRepos` is on (one profile call plus one repo page), so that supports about 30 users per hour before the rate-limit kicks in. The actor handles `X-RateLimit-Reset` automatically; very large batches will simply pause and resume.
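The throughput arithmetic behind that answer can be written out as a rough estimator. It is a sketch under simplifying assumptions: two requests per user with repos enabled (one otherwise), a fixed 60-request window, no other traffic on the same IP, and the per-request delays ignored. The function name `estimate_reset_waits` is hypothetical.

```python
import math


def estimate_reset_waits(n_users, include_repos=True, limit_per_hour=60):
    """Return (requests_needed, full_reset_waits) for an unauthenticated run."""
    requests_needed = n_users * (2 if include_repos else 1)
    windows = math.ceil(requests_needed / limit_per_hour)
    # The first window is usable immediately; each further window
    # costs up to one hour of waiting for the reset.
    return requests_needed, max(0, windows - 1)


print(estimate_reset_waits(30))          # (60, 0)
print(estimate_reset_waits(100))         # (200, 3)
print(estimate_reset_waits(100, False))  # (100, 1)
```

Each reset wait is at most about an hour, so the second value is an upper bound in hours on the time spent sleeping.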
Q: Does this scrape stars given, contribution graph, or sponsors?
A: No. Those are separate REST endpoints (and the contribution graph is HTML-only). Available as a custom build (see Custom scraping below).
Q: Why is email null even on public-figure accounts?
A: GitHub only exposes `email` when the user has explicitly set a public email in profile settings. For most profiles `email` is `null`. Use the `blog` field as a fallback contact-discovery signal.
Q: Are forked repos included?
A: Yes. `isForked: true` flags them. Filter them out client-side if you want owned-only repos.
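A client-side fork filter might look like this minimal sketch. The helper `owned_only` is hypothetical. It also recomputes `totalStars` and `totalForks` over the remaining repos, which is a design choice of this example; `languages` is left untouched.

```python
def owned_only(profile):
    """Return a copy of the profile with forked repos removed and totals recomputed."""
    repos = [r for r in (profile.get("repos") or []) if not r.get("isForked")]
    return {
        **profile,
        "repos": repos,
        "totalStars": sum(r.get("stars") or 0 for r in repos),
        "totalForks": sum(r.get("forks") or 0 for r in repos),
    }


record = {
    "username": "example",
    "totalStars": 15,
    "totalForks": 4,
    "repos": [
        {"name": "mine", "stars": 10, "forks": 3, "isForked": False},
        {"name": "theirs", "stars": 5, "forks": 1, "isForked": True},
    ],
}
print(owned_only(record))
```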
Custom scraping: pilot tiers
Need authenticated batches, contribution-graph extraction, organization-wide audits, or a different schema (e.g. last-90-days commits per repo)? Three tiers:
- Pilot ($97): 1 actor, basic config, 7-day support. Good entry point, useful for a single OSS-strategy report on one company's contributors.
- Standard ($297): custom actor + Slack/email alerts on results, 30-day support. Most recruiting and competitor-OSS projects fit here.
- Premium ($797): custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly contributor refresh, multi-org rollups, technology-trend tracking).
Email: spinov001@gmail.com. Drop the username list and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio), including Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Disclaimer
Designed for recruiting research, OSS-strategy analysis, and academic use. Respect GitHub's Terms of Service, applicable data-protection law (GDPR, CCPA), and scrape publicly visible content only. Not affiliated with GitHub, Inc. or Microsoft Corporation.
Honest disclosure: 19 profile fields (18 from GitHub + `scrapedAt`) + 15 repo fields per record. Unauthenticated rate-limit is 60 req/h; the actor handles `X-RateLimit-Reset` automatically (no max-wait safeguard, so long resets = long waits). `totalStars` / `totalForks` / `languages` aggregates are computed over the EXTRACTED repos only (capped by `maxReposPerUser`, max 100), NOT lifetime totals. `languages.repoCount` is repo count, not bytes-weighted. Single API page, no pagination beyond 100 repos. One user's HTTP error halts the batch (outer try/catch). No contribution graph, no commit-activity series, no stars-given list, no sponsors data; those are different endpoints and can be built as custom additions.
