
URL: https://apify.com/yanmiayn/consensus-web-classifier

Website Categorization API: 6-LLM URL Classifier (Bulk) · Apify



Website Categorization API: 6-LLM Consensus URL Classifier

Pricing: from $7.00 / 1,000 results


Stop hallucinated category labels. Run URLs through 6 LLMs voting in parallel (DeepSeek-v4, Llama-4, Qwen-3.5, Nemotron-3, GLM-5.1, MiniMax) for higher-confidence taxonomy classification. Lead-gen filtering, content moderation, dataset labeling. $0.007 per URL.


Rating: 0.0 (0 ratings)

Developer: yanmiayn (Maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 2 monthly active users, last modified 2 months ago


Multi-Model Consensus Web Page Classifier

Classify any list of URLs into your custom taxonomy using a 6-model consensus engine (open-weights frontier LLMs voting in parallel). This reduces single-model hallucination on edge cases, which is useful for lead-gen filtering, content moderation queues, knowledge-graph ingestion, and dataset labeling.

Why consensus?

A single LLM occasionally hallucinates labels on ambiguous pages. This actor fans out the same classification prompt to 6 independent open-weights models and returns the consensus label plus a confidence signal. When the models agree, you can trust the label; when they disagree, the row is flagged for review.

Models in the pool: DeepSeek-v4, Llama-4-maverick, Qwen-3.5, NVIDIA Nemotron-3, GLM-5.1, MiniMax-m2.7.
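To make the voting idea concrete, here is a minimal Python sketch of a majority vote over per-model labels. It assumes each model has already returned a single label string; the function name consensus_label, the strict-majority rule, and returning None to mean "flag for review" are illustrative assumptions, not the actor's published logic.

```python
from collections import Counter
from typing import Optional


def consensus_label(votes: list[str], min_agreement: float = 0.5) -> tuple[Optional[str], float]:
    """Return (label, agreement) for a list of per-model votes.

    agreement is the share of models that voted for the winning label.
    label is None when the winner does not exceed min_agreement, which
    in this sketch means the row should be flagged for human review.
    """
    if not votes:
        return None, 0.0
    # most_common(1) returns [(label, count)] for the most frequent label.
    top_label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement > min_agreement:
        return top_label, agreement
    return None, agreement


votes = ["fintech", "fintech", "fintech", "fintech", "developer_tools", "fintech"]
print(consensus_label(votes))  # ('fintech', 0.8333333333333334)
```

With min_agreement=0.5 the rule requires a strict majority, so a 3-3 split among six models returns None instead of an arbitrarily chosen winner.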

Pricing

Pay-per-event (no subscription):

  • $0.007 per URL classified (charged on each result row written)
  • $0.01 per run (one-time orchestration fee)

A 1,000-URL run therefore costs about $7.01 ($7.00 in per-URL charges plus the $0.01 run fee).
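The same arithmetic as a small Python helper. The rates come from the pricing list above; the function name is hypothetical, and Decimal is used so cent amounts are not distorted by floating-point rounding. Pass only rows that will actually be charged, since rows with status "error" are not billed.

```python
from decimal import Decimal

# Rates from the listing's pay-per-event pricing.
PER_URL_USD = Decimal("0.007")  # charged per result row written
PER_RUN_USD = Decimal("0.01")   # one-time orchestration fee per run


def estimate_cost(charged_rows: int, runs: int = 1) -> Decimal:
    """Estimated charge in USD for charged_rows billable rows spread over runs runs."""
    return charged_rows * PER_URL_USD + runs * PER_RUN_USD


print(estimate_cost(1000))           # 7.010
print(estimate_cost(1000, runs=34))  # 7.340
```

Splitting 1,000 URLs into runs of at most 30 (the batch size suggested under Limitations) takes 34 runs, which adds $0.33 in run fees compared with a single run.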

Input

```json
{
  "urls": ["https://stripe.com", "https://nytimes.com"],
  "taxonomy": ["fintech", "news_media", "developer_tools", "ecommerce", "other"],
  "consensusMode": "majority",
  "maxConcurrency": 5
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | none | Public URLs to classify. |
| taxonomy | string[] | none | 2–30 candidate categories. They should be mutually exclusive and include an "other" bucket. |
| consensusMode | "majority" or "deep" | "majority" | "majority" uses fewer models (faster); "deep" uses the full pool. |
| maxConcurrency | int | 5 | Parallel URL fetches (1–20). |
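A client-side pre-flight check can catch input mistakes before a run is paid for. This sketch mirrors the constraints in the table (2–30 categories, maxConcurrency between 1 and 20, the two consensus modes) plus the recommended "other" bucket; validate_input is a hypothetical helper, and the actor's own validation may be stricter or looser.

```python
from urllib.parse import urlparse

VALID_MODES = {"majority", "deep"}


def validate_input(run_input: dict) -> list[str]:
    """Return human-readable problems; an empty list means the input looks valid."""
    problems: list[str] = []

    urls = run_input.get("urls") or []
    if not urls:
        problems.append("urls: provide at least one URL")
    for url in urls:
        parsed = urlparse(url)
        # Only absolute http(s) URLs can be fetched as public pages.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"urls: not an absolute http(s) URL: {url!r}")

    taxonomy = run_input.get("taxonomy") or []
    if not 2 <= len(taxonomy) <= 30:
        problems.append(f"taxonomy: expected 2-30 categories, got {len(taxonomy)}")
    if len(set(taxonomy)) != len(taxonomy):
        problems.append("taxonomy: categories should be unique")
    if "other" not in taxonomy:
        problems.append('taxonomy: recommended to include an "other" bucket')

    mode = run_input.get("consensusMode", "majority")
    if mode not in VALID_MODES:
        problems.append(f"consensusMode: expected 'majority' or 'deep', got {mode!r}")

    concurrency = run_input.get("maxConcurrency", 5)
    if not isinstance(concurrency, int) or not 1 <= concurrency <= 20:
        problems.append(f"maxConcurrency: expected an integer 1-20, got {concurrency!r}")

    return problems


sample = {
    "urls": ["https://stripe.com", "https://nytimes.com"],
    "taxonomy": ["fintech", "news_media", "developer_tools", "ecommerce", "other"],
    "consensusMode": "majority",
    "maxConcurrency": 5,
}
print(validate_input(sample))  # []
```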

Output (per URL)

```json
{
  "url": "https://stripe.com",
  "title": "Stripe | Financial Infrastructure for the Internet",
  "status": "ok",
  "category": "fintech",
  "confidence": null,
  "consensusMode": "majority",
  "durationMs": 1840
}
```

The category field returns the consensus answer when the models agree, or "other" as a safe fallback when JSON parsing fails. The confidence field may be null while the post-processing extractor is being improved.
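The "other" fallback described above could look like the following Python sketch. It assumes, hypothetically, that the consensus reply is JSON of the form {"category": "..."}; the actual response shape of the public endpoint is not documented in this listing. Mapping labels outside your taxonomy to "other" is an extra assumption of this sketch.

```python
import json


def parse_category(raw: str, taxonomy: list[str], fallback: str = "other") -> str:
    """Extract a category from a JSON reply, falling back to "other".

    Assumes (hypothetically) a reply like {"category": "fintech"}.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Unparseable reply: use the safe fallback bucket.
        return fallback
    if not isinstance(data, dict):
        return fallback
    category = data.get("category")
    # Reject labels the caller did not ask for.
    return category if category in taxonomy else fallback


print(parse_category('{"category": "fintech"}', ["fintech", "other"]))  # fintech
print(parse_category("Sure! It's fintech.", ["fintech", "other"]))      # other
```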

For URLs that take too long to fetch, or where the consensus engine times out, the row is returned with status: "error" and a reason; those rows are not charged.

Use cases

  • Lead-gen filtering: bucket scraped homepages by industry before SDR outreach.
  • Content moderation triage: pre-tag URLs in user-submitted feeds.
  • Dataset labeling: bootstrap a training set with consensus labels.
  • Affiliate / partner discovery: group competitor sites by vertical.
  • Compliance pre-screening: surface pages that may belong to regulated categories.

Tips

  • Treat the actor as a first-pass classifier: high-confidence rows go straight through, ambiguous or error rows go to a human queue.
  • Categories work better when they are concrete and non-overlapping. Add "other" as the safety bucket.
  • Heavy single-page-application URLs may exceed the 120s consensus timeout; expect a small percentage of error rows on JS-heavy targets.
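The first-pass workflow from the first tip can be sketched as a triage function over dataset rows (field names as in the output example). The 0.8 threshold, the accept_null_confidence switch, and sending "other" to review (because it doubles as the parse-failure fallback) are assumptions to tune. Since confidence is often null at the moment, the default settings route most rows to review.

```python
def triage(
    rows: list[dict],
    min_confidence: float = 0.8,
    accept_null_confidence: bool = False,
) -> tuple[list[dict], list[dict]]:
    """Split rows into (auto_accepted, needs_review)."""
    accepted: list[dict] = []
    review: list[dict] = []
    for row in rows:
        confidence = row.get("confidence")
        if confidence is None:
            confident = accept_null_confidence
        else:
            confident = confidence >= min_confidence
        # Error rows and the "other" fallback always go to a human.
        usable = row.get("status") == "ok" and row.get("category") not in (None, "other")
        (accepted if usable and confident else review).append(row)
    return accepted, review


rows = [
    {"url": "https://stripe.com", "status": "ok", "category": "fintech", "confidence": 0.9},
    {"url": "https://nytimes.com", "status": "ok", "category": "news_media", "confidence": None},
    {"url": "https://spa.example", "status": "error", "category": None, "confidence": None},
]
accepted, review = triage(rows)
print([r["url"] for r in accepted])  # ['https://stripe.com']
```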

How it works

  1. Fetches each URL (10s budget, follows redirects).
  2. Extracts title + meta description + ~250 characters of body text.
  3. Sends a compact classification prompt to the public consensus endpoint (/v1/public), which fans out to the 6-model pool and returns the agreed JSON label.
  4. Parses the result and pushes one row per URL to the Apify dataset.
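Steps 2 and 3 can be approximated with Python's standard-library HTML parser. This is a sketch under assumptions: the helper names, the whitespace collapsing, skipping script/style content, and the prompt wording are all illustrative, since the actor's actual extraction code and prompt are not published.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript"}


class PageSummary(HTMLParser):
    """Collects <title>, <meta name="description">, and visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.text_parts: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside script/style/noscript

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.text_parts.append(data)


def summarize(html: str, body_chars: int = 250) -> dict:
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace before truncating to about 250 characters.
    body = " ".join(" ".join(parser.text_parts).split())
    return {
        "title": " ".join(parser.title.split()),
        "description": parser.description,
        "body": body[:body_chars],
    }


def build_prompt(page: dict, taxonomy: list[str]) -> str:
    # Hypothetical prompt wording; the actor's real prompt is not published.
    return (
        f"Classify this web page into exactly one of: {', '.join(taxonomy)}.\n"
        'Reply only with JSON like {"category": "<label>"}.\n'
        f"Title: {page['title']}\n"
        f"Description: {page['description']}\n"
        f"Text: {page['body']}"
    )
```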

No personal data is stored: only the public page text and your taxonomy are sent for classification. The consensus engine is rate-limited to 10 requests per IP per day on the free public endpoint.

Limitations (honest)

  • The actor is a fresh listing (May 2026). Accuracy claims have not been independently benchmarked yet โ€” early users help us calibrate.
  • A small fraction of buyer test runs hit the public endpoint's per-IP rate limit on bursts. Use small batches (≤30 URLs/run) for now, or contact the publisher for a private endpoint key.
  • Confidence extraction is still being tightened; for now, the confidence field is often null.
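Following the small-batch advice above, a hypothetical helper can split a URL list into runs of at most 30. It also drops exact duplicates first, which avoids paying twice for the same URL, assuming the actor does not deduplicate on its own (the listing does not say).

```python
def batches(urls: list[str], size: int = 30) -> list[list[str]]:
    """Deduplicate URLs (keeping first-seen order) and split into runs of at most size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys keeps insertion order while dropping repeats.
    unique = list(dict.fromkeys(urls))
    return [unique[i:i + size] for i in range(0, len(unique), size)]


urls = [f"https://example.com/page/{n}" for n in range(65)]
print([len(b) for b in batches(urls)])  # [30, 30, 5]
```

Each inner list can then be submitted as the urls field of a separate run.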

Source

Built and maintained by yanmiayn. Bug reports and feature requests via the actor's Issues tab on Apify.

You might also like

UUID Generator

apizy/uuid-generator

Generate UUID v1, v3, v4, v5 instantly. Perfect for test data, unique IDs, database seeding, and development workflows. Choose random v4 (most common), time-based v1, or deterministic v3/v5. Customize hyphen format. Export results via Dataset or API. Fast, no-code tool with scheduling and monitoring

Deep Research Agent (Brave + Gemini 3.1/GPT-5.1/Opus4.6)

visita/deep-research-agent

Autonomous research assistant. Uses Brave Search + AI (Gemini 3.1/GPT-5.1/Opus4.6) to search, scrape, and synthesize the web into professional, fully cited reports. 📄 Features instant HTML/Markdown export and massive context windows. Perfect for market intelligence, academic research, & briefs.


Visita Intelligence


UUID Generator

rl1987/uuid-generator

Generate bulk universally unique identifiers (UUID v1, v3, v4, v5, v7) on demand. Export as JSON, CSV, Excel or plain text.

Patreon Extractor 🎯 ⭐5.0

jupri/patreon

💫 All-in-One Patreon.com Scraper [v5.0]

UUID Generator

maximedupre/uuid-generator

Generate UUID v1, v3, v4, v5, v7, alphanumeric IDs, and sequential IDs in bulk. Validate, analyze, convert, deduplicate, and summarize UUIDs, then export clean results from Apify.


Maxime Dupré


YouTube Lead Qualifier Pro

badruddeen/youtube-lead-qualifier-pro

Instantly turn any YouTube niche into 5–30 qualified B2B leads with real business emails. AI-scores every channel 0–100 using Groq Llama 3.3 70B → delivers only the hottest ones in a ready-to-send CSV.


Badruddeen Naseem


Social Content Generator (TikTok, LinkedIn YouTube, Blog)

visita/social-content-generator

Turn global chaos into strategic content. Generate viral TikTok scripts, SEO blog outlines, and LinkedIn thought leadership from real-time intelligence using premium AI models via OpenRouter (GPT-5.1, Claude 4.6, Gemini 3.1).


Visita Intelligence


GUID Forge: Bulk UUID & ID Generator

thescrapelab/guid-forge-bulk-uuid-id-generator

Generate UUID v1/v4/v5 and custom IDs (alphanumeric or sequential, timestamped) at high speed. Outputs to dataset


Related articles

5 Apify MCP use cases you can try now
How to automate sentiment analysis (plus the best sentiment analysis tools)