
URL: https://apify.com/ericfox/college-football-roster-scraper-apify



Pricing

Pay per usage


College Football Roster Scraper

Scrape college football roster pages into clean player datasets. Extract names, jersey numbers, positions, class year, height, weight, hometown, profile URLs, and headshots from FCS/default URLs or custom roster links. Includes adapters for multiple athletics site formats.

Rating: 0.0 (0)

Developer: Eric F

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user; last modified 8 days ago


College Football Roster Scraper for Apify

A production-oriented Apify Actor that scrapes public college football roster pages into normalized, player-only rows.

This upgraded version uses adapter-based extraction instead of a single generic selector pass. It is designed for the specific roster issues that came up during the FCS roster build: Sidearm card pages, Presto-style pages, header-table pages, outlier card layouts, duplicate mobile/desktop player cards, "View Bio" link text polluting player names, player names recoverable only from /roster/name/id URLs, and pages that should report diagnostics instead of silently returning zero rows.
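Two of those fixes, name cleanup and duplicate-card removal, can be sketched in plain JavaScript. This is a minimal illustration, not the Actor's source: the function names, the noise pattern, and the assumption that profile URLs end in /roster/<first-last>/<id> are all hypothetical.

```javascript
// Hypothetical helpers; not taken from the Actor's source.

// Link text that carries no name information ("View Bio", "Full Bio").
const NOISE_TEXT = /\b(view|full)\s+bio\b/gi;

// Remove bio-link noise and collapse repeated whitespace.
function cleanName(text) {
  return String(text || "").replace(NOISE_TEXT, "").replace(/\s+/g, " ").trim();
}

// Recover "First Last" from a profile URL ending in /roster/first-last/12345.
function nameFromProfileUrl(url) {
  const match = new URL(url).pathname.match(/\/roster\/([a-z0-9-]+)\/\d+\/?$/i);
  if (!match) return "";
  return match[1]
    .split("-")
    .filter(Boolean)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1).toLowerCase())
    .join(" ");
}

// Prefer the visible link text; fall back to the URL slug when nothing useful remains.
function resolveName(linkText, profileUrl) {
  return cleanName(linkText) || (profileUrl ? nameFromProfileUrl(profileUrl) : "");
}

// Drop mobile/desktop duplicates, keeping the first card seen for each profile URL
// (or for each name + jersey pair when a card has no profile link).
function dedupePlayers(players) {
  const seen = new Set();
  return players.filter((player) => {
    const key = player.player_profile_url || `${player.full_name}|${player.jersey_number}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(resolveName("View Bio", "https://gobison.com/sports/football/roster/example-player/12345"));
// prints "Example Player"
```

Slug recovery is a last resort: it loses punctuation such as apostrophes, so visible text wins whenever it survives cleaning.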

What it extracts

Each player row is pushed to the default Apify dataset with fields like:

{
  "scraped_at": "2026-06-22T00:00:00.000Z",
  "sport": "football",
  "season": "2025",
  "team_name": "North Dakota State",
  "roster_url": "https://gobison.com/sports/football/roster/2025",
  "source_url": "https://gobison.com/sports/football/roster/2025",
  "source_platform": "sidearm+table+generic-card",
  "player_profile_url": "https://gobison.com/sports/football/roster/example-player/12345",
  "headshot_url": "https://...jpg",
  "headshot_confidence": "high",
  "first_name": "Example",
  "last_name": "Player",
  "full_name": "Example Player",
  "jersey_number": "12",
  "position": "QB",
  "height": "6'2",
  "height_inches": 74,
  "weight": "205",
  "class_year": "JR",
  "hometown": "Lewes, Del.",
  "high_school": "Cape Henlopen",
  "previous_school": "",
  "extraction_method": "sidearm_card"
}
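The example row carries both the raw height string and a derived height_inches value. A minimal sketch of that conversion, assuming heights arrive in feet-inches forms such as 6'2, 6'2" or 6-2 (the helper name parseHeightInches is hypothetical):

```javascript
// Hypothetical helper: convert roster height strings to total inches.
// Accepts forms such as 6'2, 6'2", 6-2, 5' 11" and 6 ft 2 in; returns null when unparseable.
function parseHeightInches(height) {
  const match = String(height || "")
    .trim()
    .match(/^(\d)\s*(?:'|’|-|ft\.?)\s*(\d{1,2})?\s*(?:"|”|''|in\.?)?$/i);
  if (!match) return null;
  const feet = Number(match[1]);
  const inches = match[2] ? Number(match[2]) : 0;
  if (inches > 11) return null; // reject impossible inch values such as 6'14
  return feet * 12 + inches;
}

console.log(parseHeightInches("6'2")); // prints 74
```

Returning null rather than 0 keeps unparseable heights distinguishable from real values in CSV exports.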

Included adapters

The Actor runs these adapters in auto mode:

  1. Sidearm adapter

    • Targets .sidearm-roster-player and related roster-card classes.
    • Handles duplicate mobile/desktop card layouts and strips "View Bio"/"Full Bio" link-text noise.
    • Recovers names from profile URL slugs when the visible link text is useless.
  2. Presto-style adapter

    • Targets common Presto roster/card wrappers and falls back to the table parser.
    • Useful for smaller-school athletics sites with less consistent markup.
  3. JSON-state adapter

    • Parses JSON embedded in application/ld+json scripts, the __NEXT_DATA__ script, and other state-like script blobs, skipping anything that is not valid JSON.
    • Extracts player records only when the object has roster-like evidence such as position, jersey, class, height, or weight.
  4. Header table adapter

    • Uses header names rather than fixed cell indexes.
    • This avoids the earlier failure mode where fixed indexes (cells[0], cells[1], and so on) assigned jersey, height, class, and weight values to the wrong fields on outlier tables.
  5. Heuristic table/card fallback

    • Attempts a final extraction pass for pages with no obvious platform markers.
    • Uses profile links, position/height/weight/class patterns, and player-only filtering.
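The header table adapter's key idea, looking up columns by header text rather than by position, can be sketched as follows. The alias lists and the mapHeaders/rowToPlayer names are assumptions for illustration; real sites will need more aliases.

```javascript
// Hypothetical header aliases; extend per site. Keys match the dataset field names.
const HEADER_ALIASES = {
  jersey_number: ["#", "no", "no.", "number", "jersey"],
  full_name: ["name", "player", "full name"],
  position: ["pos", "pos.", "position"],
  height: ["ht", "ht.", "height"],
  weight: ["wt", "wt.", "weight"],
  class_year: ["cl", "cl.", "class", "yr", "yr.", "year", "academic year"],
  hometown: ["hometown"],
  high_school: ["high school", "hs"],
};

// Build { fieldName: columnIndex } from the table's header cells.
function mapHeaders(headerCells) {
  const normalized = headerCells.map((h) => h.trim().toLowerCase().replace(/\s+/g, " "));
  const mapping = {};
  for (const [field, aliases] of Object.entries(HEADER_ALIASES)) {
    const index = normalized.findIndex((h) => aliases.includes(h));
    if (index !== -1) mapping[field] = index;
  }
  return mapping;
}

// Convert one row of cell strings into a player object using the header mapping.
function rowToPlayer(cells, mapping) {
  const player = {};
  for (const [field, index] of Object.entries(mapping)) {
    player[field] = (cells[index] || "").trim();
  }
  return player;
}

// Class appears before height here, so a fixed-index parser would misread this table.
const sampleMapping = mapHeaders(["No.", "Name", "Pos.", "Cl.", "Ht.", "Wt.", "Hometown"]);
console.log(rowToPlayer(["12", "Example Player", "QB", "JR", "6'2", "205", "Lewes, Del."], sampleMapping));
```

Because each field is found by name, reordered or missing columns simply change the mapping instead of shifting every value into the wrong field.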

Player-only behavior

The extractor attempts to avoid coaches/staff by:

  • preferring /sports/football/roster/ or /roster/ profile links
  • excluding /coach/ and /coaches/ links
  • excluding cards/rows with staff terms such as coach, coordinator, assistant, trainer, operations, analyst, recruiting, or staff
  • requiring roster-like evidence such as position, height, weight, class year, jersey number, or a roster profile URL

This is a data-cleaning filter, not a legal/compliance filter.
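The player-only checks above can be expressed as a single predicate. This is a hypothetical sketch: the staff-term list mirrors the one above, while the position pattern and the isLikelyPlayer signature are assumptions.

```javascript
// Staff terms from the list above; matched as whole words, case-insensitive.
const STAFF_TERMS = /\b(coach|coordinator|assistant|trainer|operations|analyst|recruiting|staff)\b/i;
// Assumed set of common football position abbreviations (case-sensitive on purpose).
const POSITION = /\b(QB|RB|FB|WR|TE|OL|OT|OG|C|DL|DT|DE|LB|ILB|OLB|DB|CB|S|K|P|LS|ATH)\b/;

// Hypothetical predicate: true when a card/row looks like a player rather than staff.
function isLikelyPlayer({ text = "", profileUrl = "", jersey = "", height = "", weight = "", classYear = "" }) {
  // Exclude /coach/ and /coaches/ profile links outright.
  if (/\/coaches?\//i.test(profileUrl)) return false;
  // Exclude cards whose text contains a staff term.
  if (STAFF_TERMS.test(text)) return false;
  // Require at least one piece of roster-like evidence.
  const hasRosterUrl = /\/roster\//i.test(profileUrl);
  return Boolean(hasRosterUrl || jersey || height || weight || classYear || POSITION.test(text));
}
```

The exclusions run before the evidence check, so a staff card that happens to sit under a /roster/ URL is still dropped.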

Included default/demo dataset

The bundled default list lives here:

src/default-fcs-roster-urls.js

The input option useDefaultFcsUrls is true by default. To avoid accidentally crawling the full list during testing, maxRosterUrls defaults to 10. Set maxRosterUrls to 0 to crawl every bundled URL.
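To make the maxRosterUrls semantics concrete (default 10, 0 meaning no limit), here is a sketch of URL selection. It assumes, without confirmation from the source, that bundled entries are shaped like startUrls items ({ url, userData }), that custom startUrls are queued before the bundled defaults, and that duplicates are dropped by URL; selectRosterUrls is a hypothetical name.

```javascript
// Hypothetical sketch of combining input options into the list of pages to crawl.
function selectRosterUrls({ useDefaultFcsUrls = true, startUrls = [], maxRosterUrls = 10 }, defaultUrls) {
  // Assumption: custom URLs first, then bundled defaults when enabled.
  const candidates = [...startUrls, ...(useDefaultFcsUrls ? defaultUrls : [])];
  const seen = new Set();
  const unique = candidates.filter(({ url }) => {
    if (seen.has(url)) return false;
    seen.add(url);
    return true;
  });
  // 0 means "no limit"; any positive number caps the crawl.
  return maxRosterUrls > 0 ? unique.slice(0, maxRosterUrls) : unique;
}
```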

Run locally

Install the Apify CLI first if you have not already:

npm install -g apify-cli
apify login

Then run:

npm install
npm run check
npm run test:fixtures
apify run -p sample-input.json

Local dataset output will appear under:

storage/datasets/default/

Run reports are saved to the default key-value store under these keys:

RUN_SUMMARY
ZERO_PLAYER_PAGES
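For a quick look at a local run without opening each file, a small Node.js helper can load the results. It assumes the Apify CLI's usual local layout (one numbered JSON file per dataset item under storage/datasets/default/, one <KEY>.json file per key-value record under storage/key_value_stores/default/); the helper names are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Load every dataset item, in push order, from a local storage folder.
// Only numbered files are read, so any metadata files are skipped.
function readLocalDataset(storageDir = "storage") {
  const dir = path.join(storageDir, "datasets", "default");
  return fs
    .readdirSync(dir)
    .filter((name) => /^\d+\.json$/.test(name))
    .sort()
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")));
}

// Load one JSON key-value record (e.g. RUN_SUMMARY), or null if it does not exist.
function readLocalRecord(key, storageDir = "storage") {
  const file = path.join(storageDir, "key_value_stores", "default", `${key}.json`);
  return fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, "utf8")) : null;
}

// Count player rows per team to spot suspiciously small rosters.
function countByTeam(rows) {
  const counts = {};
  for (const row of rows) {
    counts[row.team_name] = (counts[row.team_name] || 0) + 1;
  }
  return counts;
}

// Usage from the project folder after a local run:
// console.log(countByTeam(readLocalDataset()));
// console.log(readLocalRecord("ZERO_PLAYER_PAGES"));
```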

Deploy to Apify

From the project folder:

apify push

Then open the Actor in Apify Console and run it with the default input.

Suggested first tests

Start with 3 to 5 pages:

{
  "useDefaultFcsUrls": true,
  "season": "2025",
  "maxRosterUrls": 5,
  "maxConcurrency": 3,
  "startUrls": []
}

Then test one custom roster URL:

{
  "useDefaultFcsUrls": false,
  "season": "2025",
  "maxRosterUrls": 0,
  "startUrls": [
    {
      "url": "https://gobison.com/sports/football/roster/2025",
      "userData": {
        "team_name": "North Dakota State"
      }
    }
  ]
}
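When testing several custom rosters, building the input programmatically avoids hand-editing JSON. A minimal sketch; buildCustomInput is a hypothetical helper that only reuses the field names from the sample input above, and it validates each URL so a typo fails before a run starts.

```javascript
// Hypothetical helper: build a custom-URL input from [url, teamName] pairs.
function buildCustomInput(rosters, season = "2025") {
  return {
    useDefaultFcsUrls: false,
    season,
    maxRosterUrls: 0,
    startUrls: rosters.map(([url, team_name]) => {
      new URL(url); // throws a TypeError on malformed URLs
      return { url, userData: { team_name } };
    }),
  };
}

console.log(
  JSON.stringify(
    buildCustomInput([["https://gobison.com/sports/football/roster/2025", "North Dakota State"]]),
    null,
    2,
  ),
);
```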

Debugging outlier pages

If a roster URL returns no players, check the key-value store record:

ZERO_PLAYER_PAGES

It includes:

  • page URL
  • team name
  • detected source platform
  • page title and h1
  • table count
  • image count
  • roster link count
  • Sidearm card count
  • per-adapter row counts/errors
  • a short body-text sample
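The diagnostics above can be turned into rough next-step hints. The record's exact property names are not documented here, so url, tableCount, sidearmCardCount, and rosterLinkCount are assumptions to be mapped onto the real ZERO_PLAYER_PAGES shape, and the hint wording is illustrative.

```javascript
// Hypothetical triage: suggest where to look first for a page that returned no players.
function triageZeroPlayerPage(page) {
  if (page.sidearmCardCount > 0) return "sidearm cards present: check card selectors and name cleanup";
  if (page.tableCount > 0) return "tables present: check header aliases";
  if (page.rosterLinkCount > 0) return "roster links only: check the heuristic card fallback";
  return "no roster markup: page may render client-side or the URL may be wrong";
}

// Attach a hint to every failing page.
function triageAll(pages) {
  return pages.map((page) => ({ url: page.url, hint: triageZeroPlayerPage(page) }));
}
```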

You can also set emitDiagnosticRows to true in the input to push a visible diagnostic row into the dataset; keep it disabled for clean production exports.

Commercial / Apify Store notes

For a public Apify Store listing, position it as a normalized public roster data extractor, not as a copyrighted media downloader. The Actor returns image URLs only; it does not download or rehost headshot images.

Recommended store copy:

Scrape public college football roster pages into clean CSV/JSON player rows, including names, jersey numbers, positions, height, weight, class year, hometown, profile URLs, and headshot URLs. Built for FCS and college athletics roster workflows.

Practical limits

College athletics sites are not perfectly standardized. This Actor now has a real adapter layer, but a handful of domains may still need school/domain-specific micro-adapters after you see live failure diagnostics. The intended workflow is:

  1. Run a small sample.
  2. Inspect ZERO_PLAYER_PAGES.
  3. Add a domain adapter only for the pages that still fail.
  4. Re-run the full default FCS list.
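Step 3 implies a dispatch layer where a domain-specific micro-adapter, if one exists, runs before the generic adapters. One hypothetical way to structure that; the registry, function names, and the rule of falling back to auto mode when a domain adapter returns no rows are all assumptions.

```javascript
// Hypothetical registry of domain-specific micro-adapters, keyed by hostname without "www.".
const domainAdapters = new Map();

function registerDomainAdapter(hostname, adapter) {
  domainAdapters.set(hostname.replace(/^www\./, ""), adapter);
}

// Run the domain adapter when one exists and it finds rows; otherwise use auto mode.
function extractPlayers(url, html, autoExtract) {
  const host = new URL(url).hostname.replace(/^www\./, "");
  const adapter = domainAdapters.get(host);
  if (adapter) {
    const rows = adapter(html);
    if (rows.length > 0) return { method: `domain:${host}`, rows };
  }
  return { method: "auto", rows: autoExtract(html) };
}
```

Keeping the auto-mode fallback means a school's site redesign degrades to the generic adapters instead of silently returning zero rows.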

You might also like

NHL Team Roster and Schedule Scraper

parseforge/nhl-roster-schedule-scraper

Pull NHL team roster and schedule data by tricode for all 32 franchises from Boston Bruins to Utah Hockey Club. Pick a mode for roster, schedule, or both. Useful for hockey fan sites, fantasy hockey tools, season recap builds, and tracking player movement across the league.

ESPN College Football Scraper

parseforge/espn-college-football-scraper

Tap ESPN sub endpoints for college football scoreboard games, teams, or news. Add an optional YYYYMMDD date to scope the scoreboard. Handy for NCAA football trackers, conference standings dashboards, fantasy tools, and editorial workflows that surface daily gridiron results.

NCAA API - College Sports

alizarin_refrigerator-owner/ncaa-api---college-sports

Fetch comprehensive NCAA college sports data including basketball rankings, football standings, team rosters, player statistics, and game schedules for all divisions. Basketball Data & Football Data Teams Rankings Schedule Scores Standings

FOOTBALL API DATA

macheta/football-super-fast-data

ALL FOOTBALL DATA SUPER FAST AND REALTIME

College Email Scraper

contacts-api/college-s-email-scraper

College email scraper to extract verified emails from colleges, universities, and educational directories 📧🎓 Perfect for outreach, partnerships, and education sector lead generation.

ESPN Football News Scraper

deloni/espn-football-news-scraper

Track football stats, updates, transfers, scores, and breaking news with the ESPN Football News Scraper. This actor is built to automate the extraction of football-related content from ESPN, including article titles, content, and images, ensuring you stay updated with the latest in football.

College Recruiter Job Scraper 🔍💼🎓- Cheap

scrapestorm/college-recruiter-job-scraper---cheap

πŸ” Easily extract College Recruiter job listings Collect structured job data from College Recruiter, including job titles, company names, locations, summaries, posting dates and more πŸŽ“πŸ’Ό Ideal for student & graduate job search analysis, lead generation, and early-career hiring insights πŸ“ŠπŸš€

4

LinkedIn Company Employees Scraper

apt_marble/linkedin-company-employees-scraper

Map the workforce of any LinkedIn company. Get a clean roster of names, titles, regions, and profile URLs, either through public Google discovery (no login) or via your LinkedIn session cookie for the complete current employee list.