URL: https://apify.com/crawlerbros/wikipedia-scraper

Wikipedia Article Scraper · Apify


Pricing

from $0.50 / 1,000 results

Wikipedia Article Scraper

Extract structured data from Wikipedia articles. Get summaries, categories, images, metadata, and descriptions using Wikipedia's official API. Supports 300+ languages.

Rating: 0.0 (0)

Developer

Crawler Bros

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 8
Monthly active users: 3
Last modified: 2 months ago

Extract structured data from Wikipedia articles using the official MediaWiki API. Get article summaries, categories, images, metadata, and descriptions. Supports 300+ languages.

Features

  • Extract article titles, summaries, and descriptions
  • Get categories, images, and thumbnails
  • Support for 300+ Wikipedia languages
  • Two modes: scrape by URL or search by keyword
  • Uses official Wikipedia REST + MediaWiki APIs
  • No proxy or cookies required
  • Lightweight HTTP-only (no browser)
  • Proper rate limiting and User-Agent identification

Input

Field               | Type    | Default | Description
articleUrls         | Array   | -       | Wikipedia article URLs to scrape
searchQueries       | Array   | -       | Search terms to find articles
maxArticlesPerQuery | Integer | 5       | Max articles per search query (1-50)
language            | String  | "en"    | Wikipedia language code
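
The defaults and limits in the table can be checked on the client side before a run is started. The following is a minimal sketch, not the actor's own validation code. `build_input` is a hypothetical helper, and the rule that at least one of articleUrls or searchQueries must be present is an assumption, since the listing does not say how an empty input is handled.

```python
from typing import Any, Optional, Sequence


def build_input(
    article_urls: Optional[Sequence[str]] = None,
    search_queries: Optional[Sequence[str]] = None,
    max_articles_per_query: int = 5,  # documented default
    language: str = "en",             # documented default
) -> dict[str, Any]:
    """Hypothetical helper: assemble an input dict from the documented fields."""
    if not article_urls and not search_queries:
        # Assumption: the listing does not document the empty-input behaviour.
        raise ValueError("provide articleUrls, searchQueries, or both")
    if not 1 <= max_articles_per_query <= 50:
        # The 1-50 range comes from the input table.
        raise ValueError("maxArticlesPerQuery must be between 1 and 50")

    run_input: dict[str, Any] = {"language": language}
    if article_urls:
        run_input["articleUrls"] = list(article_urls)
    if search_queries:
        run_input["searchQueries"] = list(search_queries)
        run_input["maxArticlesPerQuery"] = max_articles_per_query
    return run_input
```

Rejecting an out-of-range value locally avoids starting a run that the platform's input schema would presumably reject anyway.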

Example: Scrape by URL

{
  "articleUrls": [
    "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "https://en.wikipedia.org/wiki/Artificial_intelligence"
  ]
}
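
A URL such as the ones above carries both the language edition (the subdomain) and the article title (the path after /wiki/). The listing does not describe how the actor parses URLs, so the sketch below is only one plausible way to do it; `parse_article_url` is a hypothetical name, and mobile hosts such as en.m.wikipedia.org are deliberately not handled.

```python
from urllib.parse import unquote, urlparse


def parse_article_url(url: str) -> tuple[str, str]:
    """Return (language, title) for a URL like https://en.wikipedia.org/wiki/Title."""
    parsed = urlparse(url)
    parts = (parsed.hostname or "").split(".")
    # Expect exactly <language>.wikipedia.org.
    if len(parts) != 3 or parts[1:] != ["wikipedia", "org"]:
        raise ValueError(f"not a Wikipedia article URL: {url}")

    prefix = "/wiki/"
    if not parsed.path.startswith(prefix) or len(parsed.path) == len(prefix):
        raise ValueError(f"no article title in URL: {url}")

    # Decode percent-escapes so the title matches what Wikipedia displays.
    title = unquote(parsed.path[len(prefix):])
    return parts[0], title


language, title = parse_article_url(
    "https://en.wikipedia.org/wiki/Python_(programming_language)"
)
print(language, title)
```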

Example: Search by Keyword

{
  "searchQueries": ["machine learning", "quantum computing"],
  "maxArticlesPerQuery": 3,
  "language": "en"
}
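
An input like this can be submitted from Python with the `apify-client` package. This is a sketch under stated assumptions: the actor ID is inferred from the listing URL, `run_scraper` is a hypothetical wrapper, and a valid Apify API token has to be supplied by the caller.

```python
from typing import Any

# Assumption: actor ID inferred from https://apify.com/crawlerbros/wikipedia-scraper
ACTOR_ID = "crawlerbros/wikipedia-scraper"

SEARCH_INPUT: dict[str, Any] = {
    "searchQueries": ["machine learning", "quantum computing"],
    "maxArticlesPerQuery": 3,
    "language": "en",
}


def run_scraper(run_input: dict[str, Any], token: str) -> list[dict[str, Any]]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so this module loads even without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    # Each run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_scraper(SEARCH_INPUT, token)` should return one dict per article. With two queries and maxArticlesPerQuery set to 3, at most six items would be expected.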

Output

Field        | Type    | Description
title        | String  | Article title
url          | String  | Full Wikipedia URL
summary      | String  | Lead section extract (first few paragraphs)
description  | String  | Wikidata short description
categories   | Array   | Article categories
thumbnail    | Object  | Thumbnail image with source, width, height
images       | Array   | Image filenames from the article
lastModified | String  | Last edit timestamp
language     | String  | Language code
pageId       | Integer | Wikipedia page ID
scrapedAt    | String  | ISO timestamp when scraped
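
For downstream code it can help to map each result onto a typed record. The sketch below mirrors the documented output fields; the `Article` class and `article_from_item` function are hypothetical, and treating every field except title and url as optional is an assumption, since the listing does not say which fields may be missing.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Article:
    title: str
    url: str
    summary: str = ""
    description: str = ""
    categories: list[str] = field(default_factory=list)
    thumbnail: Optional[dict[str, Any]] = None  # source, width, height
    images: list[str] = field(default_factory=list)
    last_modified: str = ""
    language: str = "en"
    page_id: Optional[int] = None
    scraped_at: str = ""


def article_from_item(item: dict[str, Any]) -> Article:
    """Map one result item (camelCase keys) to an Article, tolerating missing keys."""
    return Article(
        title=item["title"],
        url=item["url"],
        summary=item.get("summary") or "",
        description=item.get("description") or "",
        categories=list(item.get("categories") or []),
        thumbnail=item.get("thumbnail"),
        images=list(item.get("images") or []),
        last_modified=item.get("lastModified") or "",
        language=item.get("language") or "en",
        page_id=item.get("pageId"),
        scraped_at=item.get("scrapedAt") or "",
    )
```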

Use Cases

  • Research: collect structured article data for academic or business research
  • Content enrichment: augment your database with Wikipedia descriptions and metadata
  • Knowledge graphs: build knowledge bases from Wikipedia's categorized data
  • Education: gather article summaries for educational content
  • SEO: analyze Wikipedia's coverage of topics in your niche
  • Data science: use Wikipedia data for NLP training and analysis

FAQ

Is a proxy required?

No. Wikipedia's API is freely accessible. No proxy, cookies, or authentication needed.

What languages are supported?

All 300+ Wikipedia language editions. Set the language parameter to any valid code: en, fr, de, es, ja, zh, ru, pt, it, ar, ko, nl, pl, etc.
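
Each language code corresponds to its own subdomain, so the code selects which Wikipedia edition is queried. As an illustration, the sketch below builds the URL of the public Wikimedia REST page-summary endpoint for a given language and title. Whether the actor calls this exact endpoint is an assumption (the listing only says it uses the REST and MediaWiki APIs), and `summary_url` is a hypothetical helper.

```python
from urllib.parse import quote


def summary_url(language: str, title: str) -> str:
    """Build a REST page-summary URL for one article in one language edition."""
    # Wikipedia titles use underscores in place of spaces.
    normalized = title.strip().replace(" ", "_")
    # safe="" also escapes "/", which would otherwise split the path.
    encoded = quote(normalized, safe="")
    return f"https://{language}.wikipedia.org/api/rest_v1/page/summary/{encoded}"


print(summary_url("fr", "Apprentissage automatique"))
print(summary_url("de", "Maschinelles Lernen"))
```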

Are there rate limits?

Wikipedia asks for polite access with proper User-Agent headers. The scraper includes built-in delays (0.3-0.5s between requests) to respect Wikipedia's guidelines.
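
The pacing described above can be sketched as a loop that pauses for a random 0.3 to 0.5 seconds between requests. This is not the actor's source code: `fetch_politely` is a hypothetical helper, the User-Agent string is a placeholder, and the `fetch` and `sleep` callables are injected so the loop can be exercised without network access.

```python
import random
import time
from typing import Callable, Iterable

MIN_DELAY_S = 0.3  # lower bound stated in the FAQ
MAX_DELAY_S = 0.5  # upper bound stated in the FAQ

# Placeholder value: a real client should identify itself and give a contact.
HEADERS = {"User-Agent": "example-wiki-client/0.1 (contact@example.com)"}


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    sleep: Callable[[float], None] = time.sleep,
) -> list[object]:
    """Call fetch(url) for each URL, pausing 0.3-0.5 s between consecutive calls."""
    results: list[object] = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, only between requests.
            sleep(random.uniform(MIN_DELAY_S, MAX_DELAY_S))
        results.append(fetch(url))
    return results
```

A real `fetch` would send HEADERS with every request, for example through `urllib.request.Request(url, headers=HEADERS)`.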

Can I scrape article content (full text)?

No. This scraper returns only the lead section. The summary field contains a clean plain-text extract of the opening paragraphs, which is enough for many use cases; the full article text is not included in the output.
