URL: https://apify.com/jungle_synthesizer/cyclingnews-races-news-scraper

Cyclingnews Races & News Scraper

Under maintenance

Pricing

Pay per event

Scrapes pro-cycling news articles and race reports from Cyclingnews.com. Extracts headline, author, dates, body text, summary, and LATAM-cycling relevance flags (riders and races). For sports-analytics, LLM training, and cycling intelligence dashboards.

Rating

0.0

(0)

Developer

BowTiedRaccoon

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 2

Monthly active users: 1

Last modified: 18 days ago

Scrapes pro-cycling news articles and race reports from Cyclingnews.com, the largest English-language cycling news outlet, owned by Future plc. Returns structured article data including headline, author, publish date, full body text, and a curated LATAM-cycling relevance layer.

The site is server-rendered with rich JSON-LD structured data on every article. No browser required. The scraper pulls from the Google News sitemap and the live /news/ listing page, so each run returns the freshest content without you managing pagination or archives.
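Because the structured data sits in the server-rendered HTML, it can be read with a plain HTML parser. The following is a minimal sketch of that idea using only the Python standard library; it is not the actor's own code. The class and function names and the sample markup are hypothetical, and real pages may nest the NewsArticle object differently (for example inside an `@graph` array), which this sketch does not handle.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_news_article(html):
    """Return the first JSON-LD object whose @type is NewsArticle, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        candidates = data if isinstance(data, list) else [data]
        for obj in candidates:
            if isinstance(obj, dict) and obj.get("@type") == "NewsArticle":
                return obj
    return None


# Hypothetical sample page, not real Cyclingnews markup.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Sample stage report",
 "author": {"name": "Jane Doe"}, "datePublished": "2024-05-01T10:00:00Z"}
</script>
</head><body></body></html>
"""

article = find_news_article(SAMPLE_HTML)
print(article["headline"], "by", article["author"]["name"])
```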

What It Returns

Every record is one article. The dataset includes:

| Field | Type | Description |
| --- | --- | --- |
| article_id | String | URL-slug identifier derived from the canonical URL |
| article_url | String | Canonical URL of the article |
| article_title | String | Headline (HTML entities decoded) |
| article_author | String | Primary author name |
| article_published_at | String | ISO-8601 publish timestamp |
| article_modified_at | String | ISO-8601 last-modified timestamp |
| article_body_text | String | Plain-text article body, up to 50,000 characters |
| article_summary | String | Sub-headline or deck |
| article_section | String | Section label (e.g. Racing, Women's Cycling, Teams & Riders) |
| article_tags | Array | Open Graph article:tag values |
| latam_relevant | Boolean | True if the article mentions a curated LATAM rider or race |
| latam_riders | Array | LATAM riders mentioned (Quintana, Bernal, Carapaz, Higuita, etc.) |
| latam_races | Array | LATAM races mentioned (Tour Colombia, Vuelta San Juan, etc.) |
| source_url | String | Always https://www.cyclingnews.com |
| scraped_at | String | ISO-8601 scrape timestamp |
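For consumers who want to sanity-check records before loading them, the field list can be turned into a small type check. This is a sketch under assumptions: the mapping from String/Array/Boolean to Python types is mine, the helper name `check_record` is hypothetical, and the sample values are invented. Whether real records can contain null or missing fields is not stated on this page, so treat any reported problem as a prompt to inspect the data, not as a schema violation.

```python
# Expected Python type for each field, following the field list above.
FIELD_TYPES = {
    "article_id": str,
    "article_url": str,
    "article_title": str,
    "article_author": str,
    "article_published_at": str,
    "article_modified_at": str,
    "article_body_text": str,
    "article_summary": str,
    "article_section": str,
    "article_tags": list,
    "latam_relevant": bool,
    "latam_riders": list,
    "latam_races": list,
    "source_url": str,
    "scraped_at": str,
}


def check_record(record):
    """Return a list of problems; an empty list means every field is present and typed as expected."""
    problems = []
    for field, expected in FIELD_TYPES.items():
        if field not in record:
            problems.append(f"missing: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type: {field}")
    return problems


# Hypothetical record with invented values, for illustration only.
sample = {
    "article_id": "sample-stage-report",
    "article_url": "https://www.cyclingnews.com/news/sample-stage-report/",
    "article_title": "Sample stage report",
    "article_author": "Jane Doe",
    "article_published_at": "2024-05-01T10:00:00Z",
    "article_modified_at": "2024-05-01T12:30:00Z",
    "article_body_text": "Body text goes here.",
    "article_summary": "A short deck.",
    "article_section": "Racing",
    "article_tags": ["Racing"],
    "latam_relevant": True,
    "latam_riders": ["Bernal"],
    "latam_races": [],
    "source_url": "https://www.cyclingnews.com",
    "scraped_at": "2024-05-01T13:00:00Z",
}

print(check_record(sample))  # prints []
```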

LATAM Enrichment

The latam_relevant flag and companion arrays are the value-add. The scraper checks every article against a curated list of ~30 Colombian, Ecuadorian, and other Latin American riders (Nairo Quintana, Egan Bernal, Richard Carapaz, Sergio Higuita, Santiago Buitrago, and others) plus ~25 LATAM races, including Tour Colombia, Vuelta a Colombia, Vuelta San Juan, and Ruta de los Conquistadores. Downstream models and dashboards can filter on latam_relevant: true without re-reading the body text.
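How the actor performs this matching is not documented here. One plausible approach, sketched below, is a case-insensitive whole-word search for each curated term. The lists contain only the names mentioned on this page, not the full curated lists, and the function names are hypothetical.

```python
import re

# Illustrative subsets taken from the names on this page; the actor's
# full curated lists (~30 riders, ~25 races) are not published here.
LATAM_RIDERS = ["Quintana", "Bernal", "Carapaz", "Higuita", "Buitrago"]
LATAM_RACES = [
    "Tour Colombia",
    "Vuelta a Colombia",
    "Vuelta San Juan",
    "Ruta de los Conquistadores",
]


def find_mentions(text, terms):
    """Return the terms that appear in text as whole words, ignoring case."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


def latam_flags(text):
    """Build the three LATAM fields for one article body."""
    riders = find_mentions(text, LATAM_RIDERS)
    races = find_mentions(text, LATAM_RACES)
    return {
        "latam_relevant": bool(riders or races),
        "latam_riders": riders,
        "latam_races": races,
    }


body = "Egan Bernal attacked on the final climb of the Tour Colombia."
print(latam_flags(body))
```

Surname-only matching is simple but can produce false positives when an unrelated person shares a surname, so a production list would likely need full names or aliases.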

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | Integer | 10 | Maximum articles to scrape. The Google News sitemap refreshes every few hours with ~27 recent articles. |
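As a rough illustration of passing this input from Python, the sketch below uses the `apify-client` package (a third-party dependency, installed separately). The actor ID is taken from this page's URL. The helper names are hypothetical, the token is a placeholder you must supply, and since the actor is listed as under maintenance a run may not succeed.

```python
ACTOR_ID = "jungle_synthesizer/cyclingnews-races-news-scraper"


def build_run_input(max_items=10):
    """Build the actor input; 10 mirrors the documented default for maxItems."""
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer")
    return {"maxItems": max_items}


def run_scraper(token, max_items=10):
    """Start a run, wait for it to finish, and return the dataset items as a list."""
    # Imported here so the input helper above works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(max_items))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_run_input())  # prints {'maxItems': 10}
```

Calling `run_scraper("<YOUR_APIFY_TOKEN>", max_items=5)` would then return up to five article records, assuming the run completes.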

How It Works

Each run:

  1. Fetches sitemap-news.xml (the Google News sitemap, always publicly accessible) and collects article URLs for the past 48–72 hours.
  2. Also scrapes the live /news/ listing page for any articles not yet indexed in the sitemap.
  3. Deduplicates and caps to maxItems, then fetches each article.
  4. Parses the JSON-LD NewsArticle schema for structured metadata and the #article-body element for body text.
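Steps 1 to 3 can be sketched offline with the standard library. This is an illustration of the described flow, not the actor's code: the function names and sample URLs are hypothetical, the network fetch is omitted, and treating maxItems of 0 as "no cap" is inferred from the Coverage section.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Extract every <loc> URL from a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def merge_and_cap(sitemap_urls, listing_urls, max_items):
    """Deduplicate while keeping first-seen order, then cap to max_items.

    A max_items of 0 is treated as "no cap" (an assumption, see lead-in).
    """
    seen = set()
    merged = []
    for url in [*sitemap_urls, *listing_urls]:
        if url not in seen:
            seen.add(url)
            merged.append(url)
    return merged if max_items == 0 else merged[:max_items]


# Hypothetical sitemap content with invented article slugs.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.cyclingnews.com/news/sample-a/</loc></url>
  <url><loc>https://www.cyclingnews.com/news/sample-b/</loc></url>
</urlset>"""

# Hypothetical URLs as they might be collected from the /news/ listing page.
listing = [
    "https://www.cyclingnews.com/news/sample-b/",
    "https://www.cyclingnews.com/news/sample-c/",
]

sitemap_urls = urls_from_sitemap(SAMPLE_SITEMAP)
print(merge_and_cap(sitemap_urls, listing, max_items=2))
```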

The scraper uses impit, an HTTP client with a Chrome TLS fingerprint, which passes Fastly CDN edge checks without a browser. No proxy required.

Use Cases

  • Sports-analytics pipelines: feed article bodies into NLP models to extract race results, rider performance signals, and team news.
  • LLM training corpora: Cyclingnews is the canonical English-language source for pro-cycling narrative. The body text is editorial-quality, structured, and tagged.
  • LATAM cycling intelligence dashboards: the latam_riders and latam_races arrays make it simple to track Colombian Grand Tour coverage, contract news, and race reports without keyword scanning.
  • Journalism aggregators: combine with a scheduling trigger to catch every article within hours of publication.
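For the dashboard use case, the arrays can be aggregated directly. The sketch below counts how many articles mention each rider; the function name and the sample records are hypothetical and show only the fields the function reads.

```python
from collections import Counter


def rider_mention_counts(records):
    """Count articles per rider, looking only at records flagged latam_relevant."""
    counts = Counter()
    for record in records:
        if record.get("latam_relevant"):
            counts.update(record.get("latam_riders", []))
    return counts


# Hypothetical records trimmed to the relevant fields.
records = [
    {"article_id": "a", "latam_relevant": True, "latam_riders": ["Bernal", "Carapaz"]},
    {"article_id": "b", "latam_relevant": True, "latam_riders": ["Bernal"]},
    {"article_id": "c", "latam_relevant": False, "latam_riders": []},
]

for rider, count in rider_mention_counts(records).most_common():
    print(rider, count)
```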

Coverage

Cyclingnews publishes 50–80 articles per week across racing, women's cycling, teams & riders, tech/gear, and features. The Google News sitemap covers a rolling 48–72 hour window, so run on a daily or twice-daily schedule to maintain a complete archive. A single run with maxItems: 0 captures all available articles (~27 from the news sitemap plus the listing page).
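At 50–80 articles per week, roughly 7 to 11 appear per day, so consecutive daily runs will overlap heavily. A minimal sketch of folding each run into a local archive keyed by article_id follows; the function name and in-memory dict are assumptions, and a real pipeline would persist the archive to a file or database.

```python
def merge_into_archive(archive, new_records):
    """Add a run's records to the archive and return how many were new.

    archive is a dict keyed by article_id. A record already present is
    overwritten, so the most recent scrape of an edited article wins.
    """
    added = 0
    for record in new_records:
        key = record["article_id"]
        if key not in archive:
            added += 1
        archive[key] = record
    return added


# Two hypothetical daily runs that overlap on one article.
archive = {}
monday = [
    {"article_id": "stage-1-report", "scraped_at": "2024-05-06T08:00:00Z"},
    {"article_id": "team-news", "scraped_at": "2024-05-06T08:00:00Z"},
]
tuesday = [
    {"article_id": "team-news", "scraped_at": "2024-05-07T08:00:00Z"},
    {"article_id": "stage-2-report", "scraped_at": "2024-05-07T08:00:00Z"},
]

print(merge_into_archive(archive, monday))   # prints 2
print(merge_into_archive(archive, tuesday))  # prints 1
```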

Limitations

The Google News sitemap covers recent articles only (~48–72 hours). Historical article archives are not accessible without pagination, which Future plc gates with 403 on non-recent listing pages. For historical ingestion, supply a list of known article URLs via a custom pipeline.


Data sourced from Cyclingnews.com (Future plc). Use in accordance with applicable terms of service.
