
URL: https://apify.com/sync-network/in-depth-news-scraper

In Depth News Scraper · Apify


Pricing

from $2.50 / 1,000 results


In Depth News Scraper

Extract full-length articles from top news sources, streamlining the collection of the latest updates on any subject. Its key feature is retrieving complete content, not just headlines. Customise your output from concise summaries to complete articles.


Rating

0.0

(0)

Developer

πŸ‘ Alam

Alam

Maintained by Community

Actor stats

2

Bookmarked

15

Total users

2

Monthly active users

2 months ago

Last modified


In-Depth News Scraper

The In-Depth News Scraper is an Apify actor designed to revolutionise how you gather and process news data. It stands apart from conventional scrapers by delivering complete article content rather than just headlines, enabling comprehensive analysis across diverse news categories.

Key Advantages

  • Thorough content extraction, not just headlines
  • Support for major news categories and outlets
  • Flexible search and filtering capabilities
  • Structured, analysis-ready output

Features

  • Category-Based Filtering: Focus your news gathering by targeting specific categories such as World, Business, or Technology.
  • Complete Article Extraction: Access full article content directly, surpassing the limitations of basic news aggregators.
  • Customisable Content Length: Control output size by specifying word count or retrieving complete articles.
  • Intelligent Filtering: Exclude irrelevant content using customisable keyword filters.
  • Time-Range Selection: Gather current news or research historical content with flexible time frame options.
  • Structured Data Output: Receive consistently formatted data including titles, URLs, dates, and sources.
  • Optional Image Support: Choose whether to include article images based on your requirements.

Input Parameters

The actor accepts the following configuration options:

Parameter           Type     Description
newsCategory        String   Required: Category filter (e.g., "World", "Technology")
additionalKeywords  String   Optional: Refine search within selected category
numberOfItems       Number   Number of articles to retrieve (default: 10, max: 100)
filterBadKeywords   Array    Optional: Keywords to exclude from results
contentLength       String   Content extraction mode: "Full" or "Summary" (default: Full)
timeRange           String   Time period for article selection
retrieveImage       Boolean  Include image URLs in output (default: false)

Example configuration:

{
  "newsCategory": "Technology",
  "additionalKeywords": "artificial intelligence",
  "numberOfItems": 20,
  "filterBadKeywords": ["sponsored", "advertisement"],
  "contentLength": "Full",
  "timeRange": "Past week",
  "retrieveImage": false
}

Supported Categories

The actor provides coverage across these primary news categories:

  • World
  • Business
  • Technology
  • Entertainment
  • Health
  • Science
  • Sports
  • Politics
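The parameter table and category list above can be checked locally before a run is started. The following is a minimal sketch, not part of the actor: `build_run_input` is a hypothetical helper, and it assumes that only the eight listed categories are accepted, that category names are case-sensitive, and that `numberOfItems` must be at least 1. None of these three points is confirmed by the listing.

```python
from typing import Any, Dict, List, Optional

# Values copied from the parameter table and category list in this README.
SUPPORTED_CATEGORIES = {
    "World", "Business", "Technology", "Entertainment",
    "Health", "Science", "Sports", "Politics",
}
CONTENT_LENGTHS = {"Full", "Summary"}
MAX_ITEMS = 100  # documented maximum; the lower bound of 1 is an assumption


def build_run_input(
    news_category: str,
    additional_keywords: str = "",
    number_of_items: int = 10,
    filter_bad_keywords: Optional[List[str]] = None,
    content_length: str = "Full",
    time_range: Optional[str] = None,
    retrieve_image: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate options locally and return the actor input."""
    if news_category not in SUPPORTED_CATEGORIES:
        raise ValueError(f"Unsupported newsCategory: {news_category!r}")
    if not 1 <= number_of_items <= MAX_ITEMS:
        raise ValueError(f"numberOfItems must be between 1 and {MAX_ITEMS}")
    if content_length not in CONTENT_LENGTHS:
        raise ValueError(f"contentLength must be one of {sorted(CONTENT_LENGTHS)}")

    run_input: Dict[str, Any] = {
        "newsCategory": news_category,
        "numberOfItems": number_of_items,
        "contentLength": content_length,
        "retrieveImage": retrieve_image,
    }
    # Optional fields are only included when the caller supplies them.
    if additional_keywords:
        run_input["additionalKeywords"] = additional_keywords
    if filter_bad_keywords:
        run_input["filterBadKeywords"] = list(filter_bad_keywords)
    if time_range:
        run_input["timeRange"] = time_range
    return run_input
```

Calling `build_run_input("Technology", additional_keywords="artificial intelligence", number_of_items=20, filter_bad_keywords=["sponsored", "advertisement"], time_range="Past week")` reproduces the example configuration shown earlier.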

Output Structure

Each article in the dataset contains the following fields:

{
  "title": "Article headline",
  "link": "Article URL",
  "pubDate": "2025-02-05T10:00:00.000Z",
  "source": "Publishing outlet name",
  "summary": "Brief article overview",
  "content": "Full article text (length based on contentLength parameter)",
  "imageUrl": "Main image URL (if retrieveImage is true)"
}
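For downstream analysis it can help to turn each dataset item into a typed record. The sketch below is illustrative only: `Article` and `parse_article` are hypothetical names, and it assumes that `pubDate` always uses the millisecond UTC format shown in the sample and that `imageUrl` may be missing when `retrieveImage` is false.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class Article:
    title: str
    link: str
    pub_date: datetime
    source: str
    summary: str
    content: str
    image_url: Optional[str] = None


def parse_article(item: Dict[str, Any]) -> Article:
    """Hypothetical helper: convert one dataset item into a typed record."""
    # The sample pubDate is ISO 8601 with milliseconds and a trailing "Z" (UTC).
    pub_date = datetime.strptime(
        item["pubDate"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return Article(
        title=item["title"],
        link=item["link"],
        pub_date=pub_date,
        source=item["source"],
        summary=item.get("summary", ""),
        content=item.get("content", ""),
        # Assumed to be absent or empty when images were not requested.
        image_url=item.get("imageUrl") or None,
    )
```

If the actor emits dates in another format, `datetime.strptime` raises `ValueError`, which makes format drift visible instead of silently producing wrong timestamps.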

Implementation Guide

  1. Choose your target news category
  2. Add any specific keywords to refine results
  3. Set additional parameters as needed
  4. Execute the actor
  5. Access your structured dataset
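Steps 4 and 5 can also be done programmatically. The sketch below assumes the official `apify-client` Python package and takes the actor ID from this page's store URL. It further assumes that `call()` returns the run details as a dict containing `defaultDatasetId`, as in apify-client 1.x. The `fetch_articles` and `main` names and the `APIFY_TOKEN` environment variable are illustrative choices, not something the listing specifies.

```python
from typing import Any, Dict, List

# Actor ID taken from the store URL (apify.com/sync-network/in-depth-news-scraper).
ACTOR_ID = "sync-network/in-depth-news-scraper"


def fetch_articles(client: Any, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the actor and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # Step 4: execute the actor; call() waits for the run to finish.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if not run:
        raise RuntimeError("Actor run did not return run details")
    # Step 5: read the structured results from the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    """Example entry point; needs `pip install apify-client` and an API token."""
    import os

    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    items = fetch_articles(client, {"newsCategory": "Technology", "numberOfItems": 5})
    for item in items:
        print(item.get("title"), item.get("link"))
```

Passing the client in as an argument keeps `fetch_articles` easy to exercise with a stub, so no network access or token is needed when testing the surrounding code.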

Performance Considerations

Performance varies based on several factors:

  • Processing Duration: Typically 5-10 seconds per article for full extraction
  • Volume Handling: Efficiently processes up to 100 articles per run
  • Request Management: Sequential processing with appropriate intervals

For optimal results:

  • Request 50 items or fewer per run (numberOfItems) for faster completion
  • Use precise keywords to target relevant content
  • Consider setting contentLength to "Summary" unless full text is required
  • Disable image retrieval when not essential

Note: Network conditions and source website responsiveness may affect performance.
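The figures above allow a rough estimate of run time. This is a back-of-the-envelope sketch only: it multiplies the quoted 5-10 seconds per article by the item count, assuming strictly sequential processing in full-extraction mode, and it ignores actor start-up time and any extra delays between requests. `estimate_run_seconds` is a hypothetical helper name.

```python
from typing import Tuple

# Figures quoted above: roughly 5-10 seconds per article, processed sequentially.
SECONDS_PER_ARTICLE_MIN = 5
SECONDS_PER_ARTICLE_MAX = 10
MAX_ITEMS_PER_RUN = 100


def estimate_run_seconds(number_of_items: int) -> Tuple[int, int]:
    """Return a (low, high) estimate in seconds for a full-extraction run."""
    if not 1 <= number_of_items <= MAX_ITEMS_PER_RUN:
        raise ValueError(f"number_of_items must be between 1 and {MAX_ITEMS_PER_RUN}")
    return (
        number_of_items * SECONDS_PER_ARTICLE_MIN,
        number_of_items * SECONDS_PER_ARTICLE_MAX,
    )


print(estimate_run_seconds(50))   # (250, 500): about 4 to 8 minutes
print(estimate_run_seconds(100))  # (500, 1000): about 8 to 17 minutes
```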

Error Handling and Troubleshooting

The actor implements comprehensive error handling:

  • Connection Issues: Automatic retry (up to 3 attempts) for failed connections
  • Rate Management: Dynamic delays between requests to prevent rate limiting
  • Content Fallback: Defaults to article summary if full content extraction fails
  • Input Validation: Clear error messages for invalid configurations
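The retry and fallback behaviour listed above follows a common pattern, sketched here for readers who want the same safeguards in their own pipeline. This is not the actor's source code: `extract_content` is a hypothetical function, the linearly growing delay is an assumed stand-in for the "dynamic delays" mentioned, and only `ConnectionError` is treated as retryable.

```python
import time
from typing import Callable

MAX_ATTEMPTS = 3  # "up to 3 attempts" per the list above


def extract_content(
    fetch_full_text: Callable[[], str],
    summary: str,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try full extraction up to MAX_ATTEMPTS times, then fall back to the summary."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            text = fetch_full_text()
        except ConnectionError:
            # Transient failure: wait a little longer each time, then retry.
            if attempt < MAX_ATTEMPTS:
                sleep(base_delay * attempt)
            continue
        # The request worked; an empty result counts as failed extraction.
        return text or summary
    # Every attempt raised a connection error.
    return summary
```

The `sleep` parameter is injectable so the delay logic can be checked without actually waiting.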

Troubleshooting Common Issues

  • Timeout Errors: Consider reducing batch size or increasing time between requests
  • Missing Content: Check if the source website requires authentication
  • Rate Limiting: The actor will automatically pause and retry; no action needed
  • Error Logs: Available in the actor's run details for debugging

For detailed error information, consult the actor's run log in the Apify Console.

Technical Support

For implementation assistance or to report issues:

  1. Check the actor's run log for specific error messages
  2. Review the troubleshooting section above
  3. Contact support with the actor run ID for detailed investigation

The actor continuously logs its progress and any errors encountered, facilitating quick problem resolution.
