
URL: https://apify.com/legend006/wikipedia-scraper

Wikipedia Scraper - Articles, Search & Recent Changes · Apify


๐Ÿ‘ Wikipedia Scraper - Articles, Search & Recent Changes avatar

Wikipedia Scraper - Articles, Search & Recent Changes

Pricing

from $0.10 / 1,000 results


Scrape Wikipedia articles by title, run keyword searches, pull recent changes, or extract entire categories, across any of 300+ language editions. Returns clean text, summaries, references, links, and metadata. Built for AI/LLM training datasets, NLP research, and knowledge-graph building.


Rating: 0.0 (0)

Developer: NIJ KANANI (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 0
  • Last modified: a month ago


📚 Wikipedia Scraper

Scrape Wikipedia articles, search results, recent edits, and categories across all 300+ language editions. Returns clean plain-text content, summaries, references, and rich metadata.

🎯 Built for AI/LLM training datasets, NLP research, knowledge-graph construction, journalism, and education.

๐Ÿ‘ Sample dataset output

๐Ÿ‘ Input form

๐Ÿ‘ Run log โ€” clean success


✨ What you can do

  • 📄 Fetch articles by title: clean plain-text body, summary, sections, references
  • 🔎 Search: full-text search across an entire language edition
  • 📡 Recent changes: live feed of edits (title, user, comment, revid)
  • 📁 Pull entire categories: all members of Category:Machine_learning, etc.
  • 🌍 Any language: en, es, fr, de, ja, zh, hi, ar, etc.
  • 📦 Rich output: links (internal + external), categories, sections, last-modified

🚀 Quick start

{
  "mode": "articles",
  "language": "en",
  "titles": ["Artificial intelligence", "Large language model"],
  "includeContent": true,
  "includeReferences": false
}
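
One way to send the input above from a script is through Apify's REST API. The sketch below is an illustration, not the Actor author's code. It assumes the synchronous `run-sync-get-dataset-items` endpoint, the Actor ID `legend006~wikipedia-scraper` (derived from the Store URL, with `~` in place of `/`), and a placeholder API token. The helper names `build_run_request` and `run_actor` are hypothetical.

```python
import json
import urllib.request

# Assumption: Actor ID taken from the Store URL; API paths use "~" instead of "/".
ACTOR_ID = "legend006~wikipedia-scraper"
API_BASE = "https://api.apify.com/v2"

QUICK_START_INPUT = {
    "mode": "articles",
    "language": "en",
    "titles": ["Artificial intelligence", "Large language model"],
    "includeContent": True,
    "includeReferences": False,
}


def build_run_request(run_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(run_input: dict, token: str, timeout: float = 300.0) -> list:
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_request(run_input, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_actor(QUICK_START_INPUT, token)` with a real token should return one item per requested title. The synchronous endpoint suits short runs; longer jobs are usually started asynchronously and read from the run's dataset afterwards.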

📥 Input

| Field | Used in mode | Description |
|---|---|---|
| mode | all | articles / search / recentchanges / category |
| language | all | Wiki edition code (en, de, ja...) |
| titles | articles | Article titles |
| searchQueries | search | Keywords or phrases |
| category | category | Category name without Category: prefix |
| maxItems | all | Cap per query |
| includeContent | articles, search, category | Full plain-text body |
| includeReferences | articles, search, category | External + internal links + sections |
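
The mode-to-field mapping in the input table can be checked before a run is started. The following is a minimal sketch of such a check, not part of the Actor. The helper name `build_input` is hypothetical, and it assumes `category` is a single string and that `recentchanges` needs no mode-specific field, since the table lists none.

```python
# Field each mode needs, per the input table. None means no mode-specific field.
MODE_REQUIRED_FIELD = {
    "articles": "titles",
    "search": "searchQueries",
    "category": "category",
    "recentchanges": None,
}

CATEGORY_PREFIX = "Category:"


def build_input(mode, language="en", max_items=None, **fields):
    """Return an input dict for the Actor, raising ValueError on obvious mistakes."""
    if mode not in MODE_REQUIRED_FIELD:
        raise ValueError(f"unknown mode: {mode!r}")

    required = MODE_REQUIRED_FIELD[mode]
    if required is not None and not fields.get(required):
        raise ValueError(f"mode {mode!r} requires a non-empty {required!r}")

    run_input = {"mode": mode, "language": language, **fields}

    # The table asks for the category name without the "Category:" prefix.
    if mode == "category" and run_input["category"].startswith(CATEGORY_PREFIX):
        run_input["category"] = run_input["category"][len(CATEGORY_PREFIX):]

    if max_items is not None:
        run_input["maxItems"] = max_items
    return run_input
```

For example, `build_input("category", category="Category:Machine_learning", max_items=100)` yields an input whose `category` is `Machine_learning`.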

📤 Output (per item)

{
  "mode": "articles",
  "title": "Artificial intelligence",
  "language": "en",
  "pageId": 1164,
  "summary": "Artificial intelligence (AI) refers to...",
  "content": "Full article text...",
  "wordCount": 12873,
  "sections": ["Goals", "History", "Methods"],
  "externalLinks": ["https://..."],
  "internalLinks": ["Machine learning", "Neural network"],
  "categories": ["Artificial intelligence", "Cybernetics"],
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "lastModified": "2026-04-30T...",
  "scrapedAt": "2026-05-06T..."
}
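
For dataset building, items shaped like the output above can be filtered and flattened into JSON Lines. This is a rough sketch under assumptions: the record layout (`id`, `title`, `text`, `source`), the 200-word threshold, and the function names are illustrative choices, not anything the Actor defines.

```python
import json


def to_training_records(items, min_words=200):
    """Yield compact records for items that carry enough article text."""
    for item in items:
        text = item.get("content") or ""
        word_count = item.get("wordCount") or 0
        if not text or word_count < min_words:
            continue  # skip stubs and items scraped without includeContent
        yield {
            "id": f'{item["language"]}:{item["pageId"]}',
            "title": item["title"],
            "text": text,
            "source": item["url"],
        }


def write_jsonl(records, path):
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Keying records on `language` plus `pageId` keeps IDs unique when several language editions are scraped into the same file.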

🎯 Use cases

| Who | Why |
|---|---|
| 🤖 LLM teams | Pretraining + fine-tuning datasets across languages |
| 📚 NLP researchers | Multilingual corpora, named-entity benchmarks |
| 📰 Journalists | Topic deep-dives + fact-checking pipelines |
| 🎓 Educators | Auto-build study material from any topic |
| 🧠 Knowledge graphs | Wikipedia as an entity backbone |

โš™๏ธ Tech notes

  • Uses MediaWiki's official Action API + REST Summary API
  • No login, no key, no rate limits (within fair use)
  • Plain-text extraction via explaintext=1 โ€” already cleaned, no HTML/wikitext
  • Recent-changes uses rctype=edit|new to skip log noise
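
To make the notes above concrete, here is a sketch of the request URLs such calls would use. It only builds URLs and sends nothing. The `explaintext=1` and `rctype=edit|new` parameters come from the notes. The rest (`prop=extracts`, `rcprop`, `rclimit`, the REST summary path) are standard MediaWiki parameters, but treating them as what this Actor sends is an assumption, and the function names are hypothetical.

```python
from urllib.parse import quote, urlencode


def action_api_url(language, params):
    """Build an Action API query URL for one Wikipedia language edition."""
    query = {"action": "query", "format": "json", **params}
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(query)}"


def extract_url(language, title):
    """Plain-text article body: the TextExtracts module with explaintext=1."""
    return action_api_url(
        language, {"prop": "extracts", "explaintext": 1, "titles": title}
    )


def recent_changes_url(language, limit=50):
    """Recent edits and new pages only; rctype=edit|new leaves out log entries."""
    return action_api_url(
        language,
        {
            "list": "recentchanges",
            "rctype": "edit|new",
            "rcprop": "title|user|comment|ids|timestamp",
            "rclimit": limit,
        },
    )


def summary_url(language, title):
    """REST summary endpoint; spaces in titles become underscores."""
    slug = quote(title.replace(" ", "_"), safe="")
    return f"https://{language}.wikipedia.org/api/rest_v1/page/summary/{slug}"
```

When calling these endpoints directly, Wikimedia asks clients to send a descriptive User-Agent header.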

โ“ FAQ

Are full Wikipedia dumps better? For one-shot pre-training, yes (free at dumps.wikimedia.org). This Actor is for targeted scrapes โ€” specific topics, ongoing freshness, multi-language slices, or recent-changes monitoring.

Schedule it? Yes. Recent changes mode is perfect for hourly Apify Schedules.

Hits rate limits? Almost never. MediaWiki's anonymous limit is generous and we add automatic retries with backoff.
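
The Actor's retry logic is not published, so the following is only a generic sketch of retries with exponential backoff. The attempt count, delays, jitter, and the name `retry_with_backoff` are all assumptions chosen for illustration.

```python
import random
import time


def retry_with_backoff(
    fn,
    max_attempts=5,
    base_delay=1.0,
    max_delay=30.0,
    sleep=time.sleep,
    retry_on=(Exception,),
):
    """Call fn(); on failure wait base_delay, then 2x, 4x... before retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In practice `retry_on` would be narrowed to transient errors such as timeouts or HTTP 429 responses.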
