Website Job Extractor (Browser)

Pricing

from $8.00 / 1,000 jobs extracted (browser)

Extract job listings from JavaScript-rendered career pages (React, Vue, Angular) using AI + Playwright. Companion to the HTTP-only Website Job Extractor. Use it for the ~28% of company sites that need a real browser. Same output format, same quality, same LLM fallback chain.

Developer: Ale (Maintained by Community)

Last modified: 2 months ago

When to use this actor

Career pages built with React, Vue, Angular, or other JS frameworks
Pages that return empty/skeleton HTML without JavaScript execution
Companies flagged by the HTTP actor's JS-rendering detection
Runs that are auto-chained from the HTTP actor via enablePlaywrightFallback

Use with AI Agents (MCP)

Connect this actor to any MCP-compatible AI client — Claude Desktop, Claude.ai, Cursor, VS Code, LangChain, LlamaIndex, or custom agents.

Apify MCP server URL:

https://mcp.apify.com?tools=santamaria-automations/website-job-extractor-browser

Example prompt once connected:

"Use website-job-extractor-browser to extract job listings from https://techcorp.ch. Return results as a table."

Clients that support dynamic tool discovery (Claude.ai, VS Code) will receive the full input schema automatically via add-actor.

How it works

Playwright renders the full page (waits for network idle + text content)
Career page discovery from homepage navigation (same as HTTP actor)
ATS detection for 19 systems (Personio, Greenhouse, Softgarden, etc.)
LLM extraction using Gemini Flash / Groq / OpenRouter
Validation with confidence scoring and deduplication
Pagination follow-up for multi-page listings

Same extraction pipeline as the HTTP actor — same output format, same quality.
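The validation and deduplication step can be approximated on the consumer side as well. The sketch below is not the actor's internal logic: the `min_confidence` threshold of 0.5 and the choice of `(company_id, title, location)` as the duplicate key are assumptions made for illustration.

```python
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def validate_and_dedupe(jobs: Iterable[dict], min_confidence: float = 0.5) -> list:
    """Drop low-confidence items and keep the best-scoring copy of each duplicate.

    Assumption: two items are duplicates when company_id, normalized title,
    and normalized location all match. The actor may use a different rule.
    """
    best = {}
    for job in jobs:
        confidence = float(job.get("confidence", 0.0))
        if confidence < min_confidence:
            continue
        key = (
            job.get("company_id"),
            normalize(job.get("title") or ""),
            normalize(job.get("location") or ""),
        )
        current = best.get(key)
        if current is None or confidence > float(current.get("confidence", 0.0)):
            best[key] = job
    return list(best.values())
```

Keeping the highest-confidence copy, rather than the first one seen, means the result does not depend on the order in which pages were crawled.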

Input

Same input format as the HTTP actor. Typically auto-chained:

{
  "companies": [
    {
      "company_id": "abc-123",
      "company_name": "TechCorp AG",
      "website_url": "https://techcorp.ch"
    }
  ],
  "llmProvider": "gemini",
  "geminiApiKey": "YOUR_KEY"
}
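For a manual (non-chained) run, the input above can be built and submitted from Python. This is a minimal sketch: `client` is assumed to be an `apify_client.ApifyClient` instance created with your API token, the helper names are hypothetical, and treating all three company fields as required is an assumption rather than something the input schema is known to enforce.

```python
from typing import Any

# Actor ID as it appears in this page's URL.
ACTOR_ID = "santamaria-automations/website-job-extractor-browser"


def build_run_input(companies: list, gemini_api_key: str) -> dict:
    """Build the actor input shown above from a list of company dicts."""
    required = ("company_id", "company_name", "website_url")
    for company in companies:
        # Assumption: all three fields are needed for every company.
        missing = [field for field in required if not company.get(field)]
        if missing:
            raise ValueError(f"company entry is missing fields: {missing}")
    return {
        "companies": companies,
        "llmProvider": "gemini",
        "geminiApiKey": gemini_api_key,
    }


def run_browser_extractor(client: Any, run_input: dict, memory_mbytes: int = 2048) -> list:
    """Start the actor, wait for it to finish, and return its dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input, memory_mbytes=memory_mbytes)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the `apify-client` package installed, `run_browser_extractor(ApifyClient("YOUR_APIFY_TOKEN"), build_run_input(companies, "YOUR_KEY"))` would return the extracted jobs as a list of dicts.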

Output

Each job is a dataset item with browser_extraction: true:

{
  "company_id": "abc-123",
  "company_name": "TechCorp AG",
  "title": "Senior Frontend Developer (m/w/d)",
  "location": "Zürich",
  "employment_type": "Vollzeit",
  "department": "Engineering",
  "application_url": "https://techcorp.ch/jobs/apply/123",
  "confidence": 0.85,
  "browser_extraction": true,
  "extracted_at": "2026-03-09T10:00:00.000Z"
}
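Downstream code often wants typed records instead of raw dicts. The sketch below parses one dataset item into a dataclass; the `JobItem` class is hypothetical, and which fields may be absent (here `location`, `employment_type`, `department`, `application_url`) is an assumption based only on the sample above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class JobItem:
    company_id: str
    company_name: str
    title: str
    confidence: float
    browser_extraction: bool
    extracted_at: datetime
    # Assumed optional: a career page may not state these.
    location: Optional[str] = None
    employment_type: Optional[str] = None
    department: Optional[str] = None
    application_url: Optional[str] = None


def parse_job(item: dict) -> JobItem:
    """Convert one dataset item into a JobItem with a timezone-aware timestamp."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    timestamp = item["extracted_at"].replace("Z", "+00:00")
    return JobItem(
        company_id=item["company_id"],
        company_name=item["company_name"],
        title=item["title"],
        confidence=float(item["confidence"]),
        browser_extraction=bool(item.get("browser_extraction", False)),
        extracted_at=datetime.fromisoformat(timestamp),
        location=item.get("location"),
        employment_type=item.get("employment_type"),
        department=item.get("department"),
        application_url=item.get("application_url"),
    )
```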

Memory requirements

Minimum: 1024 MB (Playwright + Chrome)
Recommended: 2048 MB for 5+ companies
Maximum: 4096 MB
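These limits translate into a small helper for choosing a memory setting before starting a run. The function names are hypothetical, and mapping "5+ companies" directly to 2048 MB is a literal reading of the recommendation above, not a tuned heuristic.

```python
MIN_MEMORY_MB = 1024          # Playwright + Chrome baseline
RECOMMENDED_MEMORY_MB = 2048  # recommended for 5+ companies
MAX_MEMORY_MB = 4096


def choose_memory_mb(company_count: int) -> int:
    """Pick a memory setting from the number of companies in the run."""
    if company_count < 1:
        raise ValueError("company_count must be at least 1")
    return RECOMMENDED_MEMORY_MB if company_count >= 5 else MIN_MEMORY_MB


def clamp_memory_mb(requested_mb: int) -> int:
    """Force a user-supplied value into the supported 1024-4096 MB range."""
    return max(MIN_MEMORY_MB, min(MAX_MEMORY_MB, requested_mb))
```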

Pricing

Browser-based extraction costs about twice as much as the HTTP actor because of Chrome overhead:

Event	Cost
`browser-company-enriched`	$0.02/company
`browser-job-result`	$0.008/job
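As a worked example of the two event prices, the sketch below estimates the charge for a run. It assumes one `browser-company-enriched` event per processed company and one `browser-job-result` event per extracted job, and it ignores any platform usage costs billed separately.

```python
from decimal import Decimal

PRICE_PER_COMPANY = Decimal("0.02")  # browser-company-enriched
PRICE_PER_JOB = Decimal("0.008")     # browser-job-result


def estimate_cost_usd(companies: int, jobs: int) -> Decimal:
    """Estimate event charges in USD for a run.

    Assumption: every company triggers exactly one company event and
    every extracted job triggers exactly one job event.
    """
    if companies < 0 or jobs < 0:
        raise ValueError("counts must be non-negative")
    return PRICE_PER_COMPANY * companies + PRICE_PER_JOB * jobs
```

For 10 companies yielding 120 jobs, this gives 10 × $0.02 + 120 × $0.008 = $0.20 + $0.96 = $1.16.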

Auto-chaining

The HTTP actor can automatically trigger this browser actor for JS-flagged companies:

Run the HTTP actor with enablePlaywrightFallback: true
Companies with js_rendering_suspected are collected
A browser actor run starts automatically (fire-and-forget)
The browser run ID is saved in the key-value store as BROWSER_FALLBACK_RUN_ID
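Because the browser run is fire-and-forget, the caller has to look up its ID and wait for it separately. The sketch below shows one way to do that. It assumes `client` is an `apify_client.ApifyClient`, that the ID is stored as a plain string in the HTTP run's default key-value store, and that `http_run` is the run object returned for the HTTP actor; the helper names are hypothetical.

```python
from typing import Any, Optional

FALLBACK_KEY = "BROWSER_FALLBACK_RUN_ID"


def get_browser_fallback_run_id(client: Any, http_run: dict) -> Optional[str]:
    """Read the browser run ID written by the HTTP actor, if there is one."""
    # Assumption: the record lives in the HTTP run's default key-value store.
    store = client.key_value_store(http_run["defaultKeyValueStoreId"])
    record = store.get_record(FALLBACK_KEY)
    if record is None or not record.get("value"):
        return None  # no JS-flagged companies, so no browser run was started
    return str(record["value"])


def fetch_browser_fallback_jobs(client: Any, http_run: dict, wait_secs: int = 600) -> list:
    """Wait for the chained browser run and return its dataset items."""
    run_id = get_browser_fallback_run_id(client, http_run)
    if run_id is None:
        return []
    finished = client.run(run_id).wait_for_finish(wait_secs=wait_secs)
    if finished is None or finished.get("status") != "SUCCEEDED":
        raise RuntimeError(f"browser fallback run {run_id} did not succeed")
    return list(client.dataset(finished["defaultDatasetId"]).iterate_items())
```

Returning an empty list when no record exists keeps the caller's merge step uniform: HTTP results plus zero or more browser results.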

LLM fallback chain

Like the HTTP actor, this actor supports automatic provider fallback. Just provide API keys for the providers you want to use:

{
  "geminiApiKey": "YOUR_GEMINI_KEY",
  "llmApiKey": "YOUR_GROQ_KEY",
  "openrouterApiKey": "YOUR_OPENROUTER_KEY"
}

The actor detects which providers have keys and builds a fallback chain from them (e.g. Gemini → Groq → OpenRouter). If one provider's quota runs out, the actor immediately falls back to the next one in the chain.
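The fallback behaviour can be pictured with a short sketch. This illustrates the pattern only and is not the actor's code: the `QuotaExceeded` exception and the `call_provider` callback are hypothetical, and the provider order simply follows the Gemini, Groq, OpenRouter example.

```python
from typing import Callable

# Input field that carries each provider's key, in fallback order.
PROVIDER_KEYS = [
    ("gemini", "geminiApiKey"),
    ("groq", "llmApiKey"),
    ("openrouter", "openrouterApiKey"),
]


class QuotaExceeded(Exception):
    """Hypothetical error raised when a provider's quota is used up."""


def build_chain(run_input: dict) -> list:
    """Return the providers that have a non-empty API key, in fallback order."""
    return [name for name, key in PROVIDER_KEYS if run_input.get(key)]


def extract_with_fallback(chain: list, call_provider: Callable[[str, str], str], prompt: str):
    """Try each provider in turn; return (provider, response) from the first that works."""
    for provider in chain:
        try:
            return provider, call_provider(provider, prompt)
        except QuotaExceeded:
            continue  # quota exhausted, move on to the next provider
    raise RuntimeError("all configured LLM providers are exhausted")
```

Only quota errors trigger a fallback in this sketch; other exceptions propagate so that genuine bugs are not masked by silently switching providers.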

End-to-end pipeline

This actor is part of a 5-actor enrichment suite:

Actor	Purpose	Memory
Google Maps Scraper	Find companies by location	~80 MB
Website Job Extractor	Extract jobs (HTTP)	~128 MB
Website Job Extractor (Browser) (this actor)	Extract jobs from JS pages	~1-4 GB
Website Contact Extractor	Extract contacts (HTTP)	~256 MB
Website Contact Extractor (Browser)	Extract contacts from JS pages	~1-4 GB

Limitations

Higher memory usage (at least ~1 GB vs ~128 MB for the HTTP actor)
Slower execution (page rendering + wait times)
Higher cost per result (~2x the HTTP rates)
Use the HTTP actor first and fall back to the browser actor only when needed

URL: https://apify.com/santamaria-automations/website-job-extractor-browser