URL: https://apify.com/santamaria-automations/website-job-extractor-browser

Website Job Extractor (Browser) · Apify


Pricing

from $8.00 / 1,000 jobs extracted (browser)

Website Job Extractor (Browser)

Extract job listings from JavaScript-rendered career pages (React, Vue, Angular) using AI + Playwright. Companion to the HTTP-only Website Job Extractor. Use it for the ~28% of company sites that need a real browser. Same output format, same quality, same LLM fallback chain.

Rating: 0.0 (0)

Developer: Ale (Maintained by Community)

Actor stats: 0 bookmarked, 21 total users, 2 monthly active users, last modified 2 months ago

Extract job listings from JavaScript-rendered career pages (React, Vue, Angular SPAs) using AI + Playwright.

This is the browser-based companion to the Website Job Extractor (HTTP-only). Use this actor when the HTTP version flags companies with js_rendering_suspected: true.

When to use this actor

  • Career pages built with React, Vue, Angular, or other JS frameworks
  • Pages that return empty/skeleton HTML without JavaScript execution
  • Companies flagged by the HTTP actor's JS-rendering detection
  • Auto-chained via enablePlaywrightFallback on the HTTP actor
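
For manual routing (without enablePlaywrightFallback), the flagged companies can be collected from the HTTP actor's results and passed to this actor as its companies input. The following is a minimal sketch, assuming each HTTP-actor dataset item carries company_id, company_name, website_url, and a boolean js_rendering_suspected field; the exact item shape is an assumption, and select_js_flagged is a hypothetical helper, not part of either actor.

```python
from typing import Any


def select_js_flagged(http_items: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Build the `companies` input list from HTTP-actor items flagged for JS rendering.

    Assumes each item has company_id, company_name, website_url and a boolean
    js_rendering_suspected field (the item shape is an assumption).
    """
    seen: set[str] = set()
    companies: list[dict[str, str]] = []
    for item in http_items:
        # Only companies the HTTP actor explicitly flagged need a browser run.
        if item.get("js_rendering_suspected") is not True:
            continue
        company_id = str(item["company_id"])
        if company_id in seen:  # one entry per company, even if flagged twice
            continue
        seen.add(company_id)
        companies.append(
            {
                "company_id": company_id,
                "company_name": item["company_name"],
                "website_url": item["website_url"],
            }
        )
    return companies
```

The returned list can be used as the value of "companies" in the input shown in the Input section.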

Use with AI Agents (MCP)

Connect this actor to any MCP-compatible AI client: Claude Desktop, Claude.ai, Cursor, VS Code, LangChain, LlamaIndex, or custom agents.

Apify MCP server URL:

https://mcp.apify.com?tools=santamaria-automations/website-job-extractor-browser

Example prompt once connected:

"Use website-job-extractor-browser to extract job listings from https://techcorp.ch. Return results as a table."

Clients that support dynamic tool discovery (Claude.ai, VS Code) will receive the full input schema automatically via add-actor.

How it works

  1. Playwright renders the full page (waits for network idle + text content)
  2. Career page discovery from homepage navigation (same as HTTP actor)
  3. ATS detection for 19 systems (Personio, Greenhouse, Softgarden, etc.)
  4. LLM extraction using Gemini Flash / Groq / OpenRouter
  5. Validation with confidence scoring and deduplication
  6. Pagination follow-up for multi-page listings

Same extraction pipeline as the HTTP actor: same output format, same quality.
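
Step 5 (validation with confidence scoring and deduplication) can be illustrated with a short sketch. The actor's real rules are not documented here, so the 0.5 threshold, the (company_id, title, location) duplicate key, and the dedupe_jobs name are all assumptions chosen for illustration.

```python
from typing import Any


def dedupe_jobs(jobs: list[dict[str, Any]], min_confidence: float = 0.5) -> list[dict[str, Any]]:
    """Drop low-confidence jobs, then keep the best-scoring copy of each duplicate."""
    best: dict[tuple[str, str, str], dict[str, Any]] = {}
    for job in jobs:
        confidence = float(job.get("confidence", 0.0))
        if confidence < min_confidence:
            continue
        # Hypothetical duplicate key: same company, title and location,
        # compared case-insensitively with whitespace collapsed.
        key = (
            str(job.get("company_id", "")),
            " ".join(str(job.get("title", "")).lower().split()),
            " ".join(str(job.get("location", "")).lower().split()),
        )
        current = best.get(key)
        if current is None or confidence > float(current.get("confidence", 0.0)):
            best[key] = job
    return list(best.values())
```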

Input

Same input format as the HTTP actor. Typically auto-chained:

{
  "companies": [
    {
      "company_id": "abc-123",
      "company_name": "TechCorp AG",
      "website_url": "https://techcorp.ch"
    }
  ],
  "llmProvider": "gemini",
  "geminiApiKey": "YOUR_KEY"
}

Output

Each job is a dataset item with browser_extraction: true:

{
  "company_id": "abc-123",
  "company_name": "TechCorp AG",
  "title": "Senior Frontend Developer (m/w/d)",
  "location": "Zürich",
  "employment_type": "Vollzeit",
  "department": "Engineering",
  "application_url": "https://techcorp.ch/jobs/apply/123",
  "confidence": 0.85,
  "browser_extraction": true,
  "extracted_at": "2026-03-09T10:00:00.000Z"
}
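
When browser and HTTP results are merged downstream, the browser_extraction flag and the extracted_at timestamp are the two fields most likely to need handling. A minimal sketch of parsing one item follows; the sample is trimmed from the example above, parse_item and is_browser_result are hypothetical helpers, and treating a missing flag as "not a browser result" is an assumption about the HTTP actor's output.

```python
import json
from datetime import datetime
from typing import Any

# Trimmed copy of the example item above.
SAMPLE_ITEM = """
{
  "company_id": "abc-123",
  "title": "Senior Frontend Developer (m/w/d)",
  "confidence": 0.85,
  "browser_extraction": true,
  "extracted_at": "2026-03-09T10:00:00.000Z"
}
"""


def parse_item(raw: str) -> dict[str, Any]:
    """Parse one dataset item and convert extracted_at to an aware datetime."""
    item = json.loads(raw)
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    item["extracted_at"] = datetime.fromisoformat(
        item["extracted_at"].replace("Z", "+00:00")
    )
    return item


def is_browser_result(item: dict[str, Any]) -> bool:
    """True only when the item is explicitly marked as browser-extracted."""
    return item.get("browser_extraction") is True
```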

Memory requirements

  • Minimum: 1024 MB (Playwright + Chrome)
  • Recommended: 2048 MB for 5+ companies
  • Maximum: 4096 MB
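
The sizing guidance above can be written as a small helper for scripts that start runs. This is a sketch only: pick_memory_mb and is_allowed_memory are hypothetical names, and the source gives no rule for when to choose the 4096 MB maximum, so the helper never picks it on its own.

```python
MIN_MEMORY_MB = 1024          # Playwright + Chrome baseline
RECOMMENDED_MEMORY_MB = 2048  # suggested for 5+ companies
MAX_MEMORY_MB = 4096


def pick_memory_mb(company_count: int) -> int:
    """Return the documented memory tier for a run with this many companies."""
    if company_count < 1:
        raise ValueError("company_count must be at least 1")
    return RECOMMENDED_MEMORY_MB if company_count >= 5 else MIN_MEMORY_MB


def is_allowed_memory(memory_mb: int) -> bool:
    """Check a manually chosen value against the documented min/max."""
    return MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB
```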

Pricing

Browser-based extraction costs ~2x the HTTP actor due to Chrome overhead:

Event                      Cost
browser-company-enriched   $0.02/company
browser-job-result         $0.008/job
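
As a worked example of the two event prices, the sketch below estimates the event charges for a run. It covers only the two events listed; estimate_cost_usd is a hypothetical helper, and any other charges are outside what this page documents.

```python
from decimal import Decimal

COMPANY_FEE_USD = Decimal("0.02")  # browser-company-enriched, per company
JOB_FEE_USD = Decimal("0.008")     # browser-job-result, per job


def estimate_cost_usd(companies: int, jobs: int) -> Decimal:
    """Sum the per-company and per-job event charges for one run."""
    if companies < 0 or jobs < 0:
        raise ValueError("counts must be non-negative")
    return companies * COMPANY_FEE_USD + jobs * JOB_FEE_USD
```

For 10 companies yielding 250 jobs this gives 10 × $0.02 + 250 × $0.008 = $2.20, and 1,000 jobs alone come to the $8.00 quoted at the top of the page.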

Auto-chaining

The HTTP actor can automatically trigger this browser actor for JS-flagged companies:

  1. Run the HTTP actor with enablePlaywrightFallback: true
  2. Companies with js_rendering_suspected are collected
  3. A browser actor run starts automatically (fire-and-forget)
  4. The browser run ID is saved in the key-value store as BROWSER_FALLBACK_RUN_ID
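
Because the browser run is fire-and-forget, the saved run ID is the only handle for finding its results later. The sketch below reads the record through the Apify API's key-value store record endpoint using only the standard library. It assumes the record lives in a store whose ID you already know (presumably the HTTP run's default key-value store) and that the value is stored as plain text or a JSON string; both points are assumptions, and fetch_fallback_run_id is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
FALLBACK_KEY = "BROWSER_FALLBACK_RUN_ID"


def record_url(store_id: str, key: str = FALLBACK_KEY) -> str:
    """URL of a single key-value store record in the Apify API."""
    store = urllib.parse.quote(store_id, safe="")
    record = urllib.parse.quote(key, safe="")
    return f"{API_BASE}/key-value-stores/{store}/records/{record}"


def fetch_fallback_run_id(store_id: str, token: str) -> str:
    """Fetch the saved browser run ID (performs a network request)."""
    request = urllib.request.Request(
        record_url(store_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read().decode("utf-8")
    # The stored format is not documented; accept a JSON string or raw text.
    try:
        value = json.loads(body)
    except json.JSONDecodeError:
        value = body
    return str(value).strip()
```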

LLM fallback chain

Like the HTTP actor, this actor supports automatic provider fallback. Just provide API keys for the providers you want to use:

{
  "geminiApiKey": "YOUR_GEMINI_KEY",
  "llmApiKey": "YOUR_GROQ_KEY",
  "openrouterApiKey": "YOUR_OPENROUTER_KEY"
}

The system auto-discovers available providers and builds a fallback chain (e.g. Gemini → Groq → OpenRouter). If one provider's quota runs out, it instantly falls back to the next.
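
The fallback behaviour can be pictured with a short sketch. This is not the actor's code: QuotaExceeded, build_chain, and extract_with_fallback are hypothetical names, the provider order simply follows the example chain above, and call_provider stands in for whatever function actually calls an LLM.

```python
from typing import Callable

# Input key -> provider name, in the order of the example chain above.
PROVIDER_KEYS = [
    ("geminiApiKey", "gemini"),
    ("llmApiKey", "groq"),
    ("openrouterApiKey", "openrouter"),
]


class QuotaExceeded(Exception):
    """Hypothetical error raised by a provider call when its quota is used up."""


def build_chain(run_input: dict) -> list[str]:
    """Keep only the providers for which a non-empty API key was supplied."""
    return [name for key, name in PROVIDER_KEYS if run_input.get(key)]


def extract_with_fallback(
    chain: list[str],
    call_provider: Callable[[str, str], str],
    prompt: str,
) -> tuple[str, str]:
    """Try each provider in order; return (provider, result) from the first success."""
    for provider in chain:
        try:
            return provider, call_provider(provider, prompt)
        except QuotaExceeded:
            continue  # quota exhausted: move on to the next provider
    raise RuntimeError("all configured providers are exhausted")
```

Only quota errors trigger a fallback in this sketch; any other exception propagates to the caller so that real failures are not silently masked.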

End-to-end pipeline

This actor is part of a 5-actor enrichment suite:

Actor                                           Purpose                          Memory
Google Maps Scraper                             Find companies by location       ~80MB
Website Job Extractor                           Extract jobs (HTTP)              ~128MB
Website Job Extractor (Browser) (this actor)    Extract jobs from JS pages       ~1-4GB
Website Contact Extractor                       Extract contacts (HTTP)          ~256MB
Website Contact Extractor (Browser)             Extract contacts from JS pages   ~1-4GB

Limitations

  • Higher memory usage (~1GB vs ~128MB for HTTP)
  • Slower execution (page rendering + wait times)
  • Higher cost per result (2x HTTP rates)
  • Use the HTTP actor first; only fall back to the browser actor when needed
