URL: https://apify.com/lukaskrivka/article-extractor-smart/api

Scrape and download articles and news API · Apify


Pricing

Pay per usage

Smart Article Extractor

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as an HTML table, JSON, Excel, RSS feed, and more.

Rating

4.1

(9)

Developer

Lukáš Křivka

Maintained by Apify

Actor stats

189

Bookmarked

7.6K

Total users

431

Monthly active users

18 days ago

Last modified

You might also like

News Website Crawler & Article Extractor

xtech/news-source-crawler

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

Articles Extractor

web.harvester/articles-extractor

The Article Extractor is an enterprise-grade web scraping solution designed specifically for extracting structured data from news articles, blog posts, and online publications. Our advanced HTML parsing engine delivers unmatched accuracy in content extraction across thousands of websites.

753

5.0

Article Extractor & News Scraper

web.harvester/article-extractor-news-scraper

Extract articles from any news site, blog, or webpage. Get title, full text, author, date, images & metadata using 7 extraction engines (Newspaper4k, Trafilatura, Goose3). Anti-bot bypass, proxy rotation, automatic fallback. Perfect for news monitoring, NLP datasets & content aggregation.

50

5.0

Article Extraction API

tugelbay/article-extractor

Extract clean article text and metadata from URLs as Markdown, text, or HTML for RAG, AI agents, monitoring, and research. Guide: https://konabayev.com/tools/article-extractor/?utm_source=apify_info&utm_medium=referral&utm_campaign=article-extractor

Tugelbay Konabayev

43

News & Article Extractor

automation-lab/news-article-extractor

Auto-discover news/blog articles and extract clean text plus Markdown for LLM/RAG corpora. Uses RSS, sitemaps, and Readability; outputs metadata, counts, and token estimates.

Stas Persiianenko

23

Fast News Scraper

timgreen/fast-news-scraper

Extract full article text and metadata from popular news sites like The New York Times, AP News, Reuters, CNBC, NPR, and Wired. Scrape thousands of articles in just a few minutes.

543

1.0

Google News Scraper

lhotanova/google-news-scraper

Gets featured articles from Google News with title, link, source, publication date and image.

Kristýna Lhoťanová

3.1K

4.6

Free Google News API - Search News by Keyword + Country

s-r/google-news

Free Google News scraper - get clean structured news results for any query, country, and language. Use it as a Google News API for brand monitoring, topic alerts, news clipping, and bulk article URL harvesting.

Google News Scraper (Pay Per Event)

data_xplorer/google-news-scraper-fast

Scrape Google News in real time, including images and descriptions. This tool extracts complete structured data: high-resolution visuals, full, titles, sources, dates, and direct URLs.

1.1K

4.8

Send DM for Linkedin

addeus/send-dm

Send personalized Direct Messages (DMs) to your LinkedIn connections in bulk. Supports variables like {firstName} for customization. Features randomized delays and proxy support to ensure account safety. Best for 1st-degree outreach.

You can access the Smart Article Extractor programmatically from your own applications through the Apify API, using any of the options listed below. To use the Apify API, you'll need an Apify account and your API token, found in the Integrations settings in Apify Console.

Python

JavaScript

CLI

OpenAPI

HTTP

MCP

# Set API token
API_TOKEN=<YOUR_API_TOKEN>
# Prepare Actor input
cat > input.json <<'EOF'
{
  "startUrls": [
    {
      "url": "https://www.theguardian.com"
    }
  ],
  "isUrlArticleDefinition": {
    "minDashes": 4,
    "hasDate": true,
    "linkIncludes": [
      "article",
      "storyid",
      "?p=",
      "id=",
      "/fpss/track",
      ".html",
      "/content/"
    ]
  },
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.pageTitle = $('title').text().trim();\n\n return result;\n}"
}
EOF
# Run the Actor using an HTTP API
# See the full API reference at https://docs.apify.com/api/v2
curl "https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
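
For readers working in Python rather than the shell, the same input can be assembled as a dictionary and serialized with the standard library. This is a minimal sketch: the field names and values are copied from the input.json example above, while the helper name build_input and its start_url parameter are illustrative and not part of the Apify API.

```python
import json


def build_input(start_url):
    """Return the example Actor input as a plain dict (hypothetical helper)."""
    return {
        "startUrls": [{"url": start_url}],
        "isUrlArticleDefinition": {
            "minDashes": 4,
            "hasDate": True,
            "linkIncludes": [
                "article", "storyid", "?p=", "id=",
                "/fpss/track", ".html", "/content/",
            ],
        },
        "proxyConfiguration": {"useApifyProxy": True},
        # The output-extension function is passed as a string of JavaScript,
        # exactly as in the shell example.
        "extendOutputFunction": (
            "($) => {\n const result = {};\n"
            " // Uncomment to add a title to the output\n"
            " // result.pageTitle = $('title').text().trim();\n\n"
            " return result;\n}"
        ),
    }


def to_json(actor_input):
    """Serialize the input so it can be sent as the request body."""
    return json.dumps(actor_input, indent=2)
```

Calling to_json(build_input("https://www.theguardian.com")) should yield the same content that the heredoc writes to input.json, apart from whitespace.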

Scrape and download articles and news API

Below is a list of relevant HTTP API endpoints for calling the Smart Article Extractor Actor. To use them, you'll need an Apify account. Replace <YOUR_API_TOKEN> in the URLs with your Apify API token, which you can find under Integrations in Apify Console. For details, see the API reference.

Run Actor

POST
https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/runs?token=<YOUR_API_TOKEN>

Note: By adding the method=POST query parameter, this API endpoint can be called using a GET request and thus used in third-party webhooks. Please refer to our Run Actor API documentation.
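
As a rough illustration of the two calling styles described for the Run Actor endpoint, the sketch below builds a regular POST request and, separately, the GET-friendly URL that carries method=POST as a query parameter. It only constructs the request objects and sends nothing. The function names are hypothetical; the endpoint URL and the token and method parameters come from the text above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

RUNS_URL = "https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/runs"


def build_run_request(token, actor_input):
    """Build (but do not send) the POST request that starts a run."""
    query = urlencode({"token": token})
    return Request(
        f"{RUNS_URL}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_webhook_url(token):
    """Build the URL for callers that can only issue GET requests.

    The method=POST query parameter asks the API to treat the GET as a POST.
    """
    query = urlencode({"token": token, "method": "POST"})
    return f"{RUNS_URL}?{query}"
```

To actually start a run, the request returned by build_run_request could be passed to urllib.request.urlopen. The webhook-style URL carries no request body, so it suits cases where the Actor can run without custom input.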

Run Actor synchronously and get dataset items

POST
https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/run-sync-get-dataset-items?token=<YOUR_API_TOKEN>

Note: This endpoint supports both POST and GET request methods. However, only the POST method allows you to pass input data. For more information, please refer to our Run Actor synchronously and get dataset items API documentation.
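
A minimal sketch of calling the synchronous endpoint with POST so that input can be passed along. It assumes the response body is a JSON array of dataset items, which should be checked against the API reference. The run_sync name and its opener parameter are hypothetical; opener exists only so the network call can be replaced in tests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SYNC_URL = (
    "https://api.apify.com/v2/acts/"
    "lukaskrivka~article-extractor-smart/run-sync-get-dataset-items"
)


def run_sync(token, actor_input, opener=urlopen):
    """POST the input, wait for the run to finish, and return the parsed items.

    Assumption: the endpoint answers with a JSON array of dataset items.
    """
    query = urlencode({"token": token})
    req = Request(
        f"{SYNC_URL}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the call blocks until the run finishes, this style fits short crawls better than long ones; for longer jobs, starting the run and reading its dataset later is likely the safer pattern.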

Get Actor

GET
https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart?token=<YOUR_API_TOKEN>

For more information, please refer to our Get Actor API documentation.

Actors can be used to scrape web pages, extract data, or automate browser tasks. Use the Smart Article Extractor API programmatically via the Apify API.

You can choose from:

Smart Article Extractor API in Python

Smart Article Extractor API in JavaScript

Smart Article Extractor API through CLI

Smart Article Extractor OpenAPI definition

You can start Smart Article Extractor with the Apify API by sending an HTTP POST request to the Run Actor endpoint. The Actor's input and its content type are passed as the payload of the POST request, and additional options can be specified using URL query parameters. The Smart Article Extractor is identified within the API by its ID, which combines the creator's username and the name of the Actor.
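
The ID format can be read off the endpoint URLs on this page: the username and Actor name are joined with a tilde. The sketch below shows that composition; the helper names are illustrative only.

```python
API_BASE = "https://api.apify.com/v2"


def actor_api_id(username, actor_name):
    """Join username and Actor name with a tilde, as in the endpoint URLs."""
    return f"{username}~{actor_name}"


def actor_endpoint(username, actor_name, path=""):
    """Build an Actor endpoint URL; path is e.g. '/runs' or empty for Get Actor."""
    return f"{API_BASE}/acts/{actor_api_id(username, actor_name)}{path}"
```

For example, actor_endpoint("lukaskrivka", "article-extractor-smart", "/runs") reproduces the Run Actor URL listed earlier, without the token parameter.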

When the Smart Article Extractor run finishes, you can list the data from its default dataset (storage) via the API, or preview the data directly in Apify Console.
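
One way to locate that dataset is sketched below. Two details here are assumptions that do not appear on this page and should be verified against the API reference: that the Run Actor response wraps the run object in a data field containing defaultDatasetId, and that items are listed at /v2/datasets/{datasetId}/items. The function name is hypothetical.

```python
from urllib.parse import urlencode


def dataset_items_url(run_response, token):
    """Build the URL for listing items of a run's default dataset.

    Assumptions: run_response is the parsed JSON returned when the run was
    started, shaped like {"data": {"defaultDatasetId": "..."}}, and dataset
    items live under /v2/datasets/{id}/items.
    """
    dataset_id = run_response["data"]["defaultDatasetId"]
    query = urlencode({"token": token, "format": "json"})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

A GET request to the resulting URL, made once the run has finished, would then return the extracted articles.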