URL: https://apify.com/crawlerbros/slashdot

SlashDot Crawler · Apify


Pricing

from $1.00 / 1,000 results

Extract comprehensive data from SlashDot.org, the premier technology news aggregator. This actor scrapes detailed article content, author information, publication dates, comment counts, popularity indicators, source links, and department tags from SlashDot's main sections.

Rating

5.0 (2)

Developer

Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 2
  • Last modified: 19 days ago

SlashDot Technology News Scraper

This Apify actor scrapes technology news articles from SlashDot.org, extracting comprehensive information about articles, their content, engagement metrics, and community discussions.

Features

  • Comprehensive Article Data: Scrapes detailed information about technology news articles
  • Content Analysis: Extracts full article content, summaries, and metadata
  • Engagement Metrics: Collects comment counts, scores, views, and ratings
  • Community Features: Gathers comments, discussions, and user interactions
  • Categorization: Extracts sections, tags, and topic classifications
  • Related Content: Finds related articles and cross-references
  • Filtering Options: Supports filtering by sections and sorting methods
  • HTML Debugging: Saves HTML content for selector analysis during development

Input Parameters

Parameter      Type     Default   Description
maxArticles    Integer  100       Maximum number of articles to scrape
scrapeDetails  Boolean  true      Whether to scrape detailed article pages
sections       Array    []        List of sections to filter by
sortBy         String   "latest"  Sort method (latest, popular, most_commented)
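
The defaults listed above can be mirrored in a small helper that fills in missing fields and rejects unknown sort methods. This is a minimal sketch, not the actor's own code: `normalize_input` is a hypothetical name, and the validation rules (positive integer for `maxArticles`, a closed set of `sortBy` values) are assumptions drawn only from the parameter list.

```python
from typing import Any

# Defaults and allowed values taken from the documented input parameters.
DEFAULTS: dict[str, Any] = {
    "maxArticles": 100,
    "scrapeDetails": True,
    "sections": [],
    "sortBy": "latest",
}
SORT_METHODS = {"latest", "popular", "most_commented"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `raw` with documented defaults applied (hypothetical helper)."""
    merged = {**DEFAULTS, **raw}
    merged["sections"] = list(merged["sections"])  # never share the default list
    if merged["sortBy"] not in SORT_METHODS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_METHODS)}")
    max_articles = merged["maxArticles"]
    # Assumption: the limit must be a positive integer (bool is excluded on purpose).
    if isinstance(max_articles, bool) or not isinstance(max_articles, int) or max_articles < 1:
        raise ValueError("maxArticles must be a positive integer")
    return merged
```

Calling `normalize_input({"maxArticles": 50})` would leave `scrapeDetails`, `sections`, and `sortBy` at their documented defaults.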

Output Data

Each article record includes:

Basic Information

  • article_id: Unique article identifier
  • title: Article title
  • summary: Article summary/teaser
  • url: URL to the full article
  • image_url: Article thumbnail/preview image URL

Author and Publication

  • author: Article author name
  • published_date: When the article was published
  • section: Article section/category

Categorization

  • tags: Array of tags and labels

Engagement Metrics

  • comment_count: Number of comments
  • score: Article score/rating
  • views: Number of views

Timestamps

  • scraped_at: When the data was scraped

Detailed Information (if scrapeDetails=true)

  • full_content: Complete article content
  • paragraphs: Array of article paragraphs
  • related_articles: Array of related articles with title and URL
  • comments: Array of comments with text, author, date, and score
  • media_files: Array of media files with URL, type, and alt text
  • source_links: Array of external source links
  • metadata: Article metadata from meta tags

Metadata

  • source: Source website (slashdot.org)
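
When consuming the dataset, it can help to describe the record shape once and filter on it. The sketch below covers only a subset of the fields listed above; the value types are assumptions (the listing does not state them), and `most_discussed` is a hypothetical helper name.

```python
from typing import TypedDict


class ArticleRecord(TypedDict, total=False):
    """Subset of the documented output fields; value types are assumptions."""
    article_id: str
    title: str
    url: str
    author: str
    published_date: str
    section: str
    tags: list[str]
    comment_count: int
    score: int
    scraped_at: str
    source: str


def most_discussed(records: list[ArticleRecord], min_comments: int) -> list[ArticleRecord]:
    """Keep records with at least `min_comments` comments, busiest first."""
    kept = [r for r in records if (r.get("comment_count") or 0) >= min_comments]
    return sorted(kept, key=lambda r: r.get("comment_count") or 0, reverse=True)
```

Missing or empty `comment_count` values are treated as zero, so records scraped without that field are filtered out rather than raising an error.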

Usage Examples

Basic Usage

{
  "maxArticles": 50,
  "scrapeDetails": true
}

Filtered by Section

{
  "maxArticles": 200,
  "scrapeDetails": true,
  "sections": ["technology", "science"],
  "sortBy": "popular"
}

Most Commented Articles

{
  "maxArticles": 100,
  "scrapeDetails": true,
  "sortBy": "most_commented"
}

Quick Scraping (No Details)

{
  "maxArticles": 500,
  "scrapeDetails": false,
  "sortBy": "latest"
}
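
Inputs like the ones above can also be sent from Python. The following is a sketch that assumes the `apify-client` package is installed and that an `APIFY_TOKEN` environment variable is set; the actor ID is taken from this page's URL, and `build_run_input` and `run_actor` are hypothetical helper names.

```python
import os
from typing import Any, Optional

ACTOR_ID = "crawlerbros/slashdot"  # taken from the store URL of this actor


def build_run_input(max_articles: int, scrape_details: bool = True,
                    sections: Optional[list[str]] = None,
                    sort_by: str = "latest") -> dict[str, Any]:
    """Assemble an input object using the documented parameter names."""
    run_input: dict[str, Any] = {
        "maxArticles": max_articles,
        "scrapeDetails": scrape_details,
        "sortBy": sort_by,
    }
    if sections:
        run_input["sections"] = sections
    return run_input


def run_actor(run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the actor and return its dataset items (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the dependency.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For instance, `run_actor(build_run_input(200, sections=["technology", "science"], sort_by="popular"))` would submit the "Filtered by Section" input shown earlier.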

Development Features

HTML Debugging

During development, the scraper saves HTML content to the key-value store for selector analysis:

  • debug_slashdot_html: Contains the HTML content of the main page

Error Handling

  • Comprehensive error handling with detailed logging
  • Graceful handling of missing elements
  • Retry logic for failed requests
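
The listing does not say how retries are implemented. One common pattern is exponential backoff, sketched below under that assumption; `with_retries` and its parameters are hypothetical names, not part of the actor.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `action`, retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The `sleep` argument is injectable so the backoff schedule can be checked in tests without waiting.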

Browser Automation

  • Uses Playwright for reliable browser automation
  • Handles dynamic content loading
  • Implements proper delays and waits
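
As an illustration of waiting for dynamic content, the sketch below uses Playwright's synchronous API to load a page and return its HTML once a given element is present. It is not the actor's source: `fetch_rendered_html` is a hypothetical name, and the default `article` selector is a placeholder rather than a selector confirmed for SlashDot.

```python
def fetch_rendered_html(url: str, wait_selector: str = "article",
                        timeout_ms: int = 30_000) -> str:
    """Load `url` in headless Chromium and return the HTML once `wait_selector` appears."""
    # Imported lazily; requires `pip install playwright` and `playwright install chromium`.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            # Wait for dynamically loaded content; the selector is a placeholder.
            page.wait_for_selector(wait_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()  # always release the browser, even on timeout
```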

Installation

  1. Install dependencies:
$ pip install -r requirements.txt
  2. Install Playwright browsers:
$ playwright install chromium
  3. Run the scraper:
$ python -m src

Docker Usage

docker build -t slashdot-scraper .
docker run -e APIFY_TOKEN=your_token slashdot-scraper

Notes

  • The scraper respects rate limits and implements delays between requests
  • HTML content is saved for debugging purposes during development
  • The scraper handles various article listing layouts and structures
  • All URLs are properly resolved and normalized
  • Comment extraction includes author information and engagement metrics
  • The scraper can handle both article listings and detailed article pages
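
URL resolution and normalization, as mentioned in the notes above, could look like the following with the standard library. The exact normalization rules the scraper applies are not documented, so dropping the fragment and lowercasing the host are assumptions, and `normalize_url` is a hypothetical helper.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE_URL = "https://slashdot.org/"


def normalize_url(href: str, base: str = BASE_URL) -> str:
    """Resolve `href` against `base`, then drop the fragment and lowercase the host."""
    parts = urlsplit(urljoin(base, href))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, parts.query, ""))
```

Relative and protocol-relative links both resolve against the base, so a link such as `/story/123#comments` (an illustrative path) would come back as an absolute URL without the fragment.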
