URL: https://apify.com/x_dabit_x/sepe-empleo-es-pro

SEPE Spain Job Scraper – Ofertas de Empleo ES [DEPRECATED] · Apify

Pricing

from $1.00 / 1,000 results

Scrapes job offers from SEPE Spain (sepe.es) with stealth Camoufox, proxy rotation, skills filtering (regex + TF-IDF ML), deduplication, change-detection alerts, and Prometheus metrics export. Supports province/CCAA filtering with Vizcaya/Bizkaia focus.

Rating: 0.0 (0)

Developer: David Cortes (maintained by Community)

Actor stats: 0 bookmarked, 4 total users, 1 monthly active user, last modified 2 months ago

SEPE Spain Job Scraper – Ofertas de Empleo ES Pro

An Apify Actor for scraping SEPE Spain job offers with stealth browsing, smart skills filtering, and K8s-ready Prometheus metrics.

  • Anti-bot max: Camoufox (Firefox stealth) + Apify residential proxies + random delays + cookie handling
  • Smart skills filter: regex + scikit-learn TF-IDF cosine similarity (catches "contenedores" → Docker, "orquestación" → Kubernetes)
  • Change-detection alerts: compares every run against the previous one → new / changed / removed offers
  • Deduplication: SHA-256 hash per offer, persisted across runs
  • K8s-ready: Prometheus metrics exported to KV store (scrapeable by any Prometheus server)
  • Province focus: Vizcaya / Bizkaia by default, all 52 Spanish provinces supported

Output Example

{
  "url": "https://www.sepe.es/HomeSepe/Personas/encontrar-empleo/...",
  "titulo_oferta": "DevOps Engineer – Kubernetes / AWS",
  "empresa": "Tecnología Vasca S.L.",
  "provincia": "Vizcaya",
  "salario": "35.000 € - 50.000 €/año",
  "skills_requeridas": ["Kubernetes", "Docker", "AWS", "Terraform", "Linux", "CI/CD"],
  "fecha_publicacion": "2026-04-15",
  "enlace_aplicar": "https://www.sepe.es/HomeSepe/Personas/encontrar-empleo/.../solicitar"
}
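
The salario field in the example above is a free-text string. A minimal sketch of turning it into numbers follows; it assumes the Spanish thousands separator (a dot) shown in the example, and parse_salary_range is a hypothetical helper, not part of the Actor.

```python
from __future__ import annotations

import re


def parse_salary_range(salario: str) -> tuple[int, int] | None:
    """Return (min, max) in euros, or None when the text contains no amount.

    Assumption: amounts use "." as the thousands separator, as in
    "35.000 € - 50.000 €/año".
    """
    raw_amounts = re.findall(r"\d{1,3}(?:\.\d{3})+|\d+", salario)
    amounts = [int(raw.replace(".", "")) for raw in raw_amounts]
    if not amounts:
        return None
    return min(amounts), max(amounts)


print(parse_salary_range("35.000 € - 50.000 €/año"))
```

A single amount yields the same value for both bounds, and text without digits yields None.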

Input Schema

| Field | Type | Default | Description |
|---|---|---|---|
| start_urls | array | SEPE national pages | Extra entry-point URLs |
| provincias | array | ["Vizcaya","Bizkaia"] | Province/CCAA filter (52 provinces supported) |
| skills | array | ["Kubernetes","Docker","Python"] | Skills to filter by (regex + ML) |
| max_pages | int | 50 | Max listing pages per entry-point |
| use_ml_skills | bool | true | Enable TF-IDF ML skill matching |
| use_proxy | bool | true | Use Apify residential proxy |
| proxy_groups | array | ["RESIDENTIAL"] | Proxy groups |
| proxy_country | string | "ES" | Proxy country (ES = Spanish IP) |
| headless | bool | true | Headless browser mode |
| min_delay | float | 2.0 | Min delay between requests (s) |
| max_delay | float | 5.0 | Max delay between requests (s) |
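
The schema above can be mirrored in code when preparing input programmatically. The sketch below is one possible way to merge user input over the documented defaults. ActorInput and load_input are hypothetical names, and the empty start_urls default is an assumption standing in for "use the built-in SEPE national pages".

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class ActorInput:
    """Hypothetical container mirroring the documented fields and defaults."""

    # Assumption: an empty list means "use the built-in SEPE national pages".
    start_urls: list[str] = field(default_factory=list)
    provincias: list[str] = field(default_factory=lambda: ["Vizcaya", "Bizkaia"])
    skills: list[str] = field(default_factory=lambda: ["Kubernetes", "Docker", "Python"])
    max_pages: int = 50
    use_ml_skills: bool = True
    use_proxy: bool = True
    proxy_groups: list[str] = field(default_factory=lambda: ["RESIDENTIAL"])
    proxy_country: str = "ES"
    headless: bool = True
    min_delay: float = 2.0
    max_delay: float = 5.0


def load_input(raw: dict[str, Any] | None) -> ActorInput:
    """Merge user input over the defaults, ignoring unknown keys."""
    known = {f.name for f in fields(ActorInput)}
    cfg = ActorInput(**{k: v for k, v in (raw or {}).items() if k in known})
    if cfg.max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    if not 0 <= cfg.min_delay <= cfg.max_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    return cfg


cfg = load_input({"skills": ["Kubernetes", "DevOps"], "max_pages": 10})
print(cfg.provincias, cfg.skills, cfg.max_pages)
```

Fields that are not supplied keep the defaults from the table, so a partial input is enough for most runs.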

Quick Start

Run locally

# 1. Clone / enter the project
cd sepe-empleo-es-pro
# 2. Create virtual env and install dependencies
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# 3. Install Camoufox browser binaries
python -m camoufox fetch
# 4. Put test input in place
cp test_input.json storage/key_value_stores/default/INPUT.json
# 5. Run
python -m my_actor

Run on Apify

# Login (one-time)
apify login
# Push and deploy
apify push
# Run on the platform with test input (apify run would execute locally instead)
apify call --input-file test_input.json

Run via API

curl -X POST \
  "https://api.apify.com/v2/acts/YOUR_USERNAME~sepe-empleo-es-pro/runs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "provincias": ["Vizcaya", "Bizkaia"],
    "skills": ["Kubernetes", "Python", "DevOps"],
    "max_pages": 50
  }'
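
The same request can be built from Python with only the standard library. This is a sketch under the same placeholders as the curl call (YOUR_USERNAME, YOUR_TOKEN); build_run_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_USERNAME~sepe-empleo-es-pro",
    "YOUR_TOKEN",
    {
        "provincias": ["Vizcaya", "Bizkaia"],
        "skills": ["Kubernetes", "Python", "DevOps"],
        "max_pages": 50,
    },
)
print(request.get_method(), request.full_url)

# To actually start the run (needs a real token and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["data"]["id"])
```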

Architecture

my_actor/
├── main.py            # Actor entry point, crawler setup, post-processing
├── routes.py          # Crawlee router: NAV / LIST / DETAIL handlers + XHR interception
├── extractors.py      # Multi-selector SEPE data extraction with JSON-LD + regex fallbacks
├── skills_matcher.py  # Regex + TF-IDF scikit-learn skills detection (30+ tech skills)
├── dedup.py           # SHA-256 offer deduplication, cross-run persistence
├── alerts.py          # Change-detection: new / changed / removed offers diff
├── metrics.py         # Prometheus metrics (counters, gauges, histograms)
└── config.py          # Province codes (52), SEPE URLs, CSS selectors, rate-limit settings
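
As an illustration of what dedup.py is described as doing, here is a minimal sketch of SHA-256 deduplication. It is not the Actor's actual code: the choice of HASH_FIELDS and the function names are assumptions, and persisting the seen set between runs (for example in the Key-Value Store) is left out.

```python
from __future__ import annotations

import hashlib
import json

# Assumption: these fields identify an offer; the Actor may hash a different set.
HASH_FIELDS = ("url", "titulo_oferta", "empresa", "provincia")


def offer_hash(offer: dict) -> str:
    """Return a stable SHA-256 hex digest over the identifying fields."""
    payload = {k: str(offer.get(k, "")).strip().lower() for k in HASH_FIELDS}
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(offers: list[dict], seen: set[str]) -> list[dict]:
    """Keep offers whose hash is not in `seen`; `seen` is updated in place."""
    fresh = []
    for offer in offers:
        digest = offer_hash(offer)
        if digest not in seen:
            seen.add(digest)
            fresh.append(offer)
    return fresh


seen_hashes: set[str] = set()  # would be loaded from the previous run
offer = {
    "url": "https://example.com/oferta/1",  # placeholder URL
    "titulo_oferta": "DevOps Engineer",
    "empresa": "Tecnología Vasca S.L.",
    "provincia": "Vizcaya",
}
print(len(deduplicate([offer, dict(offer)], seen_hashes)))
```

Normalising case and whitespace before hashing keeps cosmetic differences from producing distinct hashes.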

Anti-bot Stack

| Layer | Technology | Config |
|---|---|---|
| Browser fingerprint | Camoufox (Firefox stealth) | os=windows/macos, locale=es-ES, geoip=true |
| IP rotation | Apify Residential proxies | countryCode=ES (Spanish IPs) |
| Timing | Random delays | 2–5 s per request (configurable) |
| Detection evasion | Cookie auto-accept | Handles SEPE's cookie banner |
| CAPTCHA detection | Text/title heuristics | Auto-retry on fresh session |
| Header generation | Camoufox built-in | Realistic browser headers |
| Scrolling | JS scroll simulation | Triggers lazy-loaded content |
| XHR interception | Playwright response hook | Catches SEPE's JSON API calls |
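
The Timing layer is the simplest to picture in code. The sketch below shows one way a random pause between min_delay and max_delay could be drawn; next_delay and polite_sleep are hypothetical names, not the Actor's functions.

```python
import random
import time


def next_delay(min_delay: float = 2.0, max_delay: float = 5.0) -> float:
    """Pick a random pause, in seconds, between min_delay and max_delay."""
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    return random.uniform(min_delay, max_delay)


def polite_sleep(min_delay: float = 2.0, max_delay: float = 5.0) -> float:
    """Sleep for a random interval and return how long was slept."""
    pause = next_delay(min_delay, max_delay)
    time.sleep(pause)
    return pause


print(f"next pause: {next_delay():.2f} s")
```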

Skills Matching Pipeline

             Input text (job description)
                       │
                       ▼
┌───────────────────┐   ┌──────────────────────────┐
│ Regex matcher     │   │ TF-IDF cosine similarity │
│ (30+ skills,      │ + │ (scikit-learn, threshold │
│ 50+ aliases)      │   │ 0.25, ngram 1-2)         │
└───────────────────┘   └──────────────────────────┘
          │                          │
          └────────────┬─────────────┘
                       ▼
             Canonical skill names
        ["Kubernetes","Docker","Python"]
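
To make the two branches concrete, here is a dependency-free sketch of the idea. It is a stand-in, not the Actor's implementation: the Actor is described as using scikit-learn, whereas this version computes smoothed TF-IDF by hand, fits it on the skill descriptions together with the query (a simplification), and uses a tiny hypothetical TAXONOMY. Only the 0.25 threshold and the 1-2 n-gram range come from the diagram.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical taxonomy: canonical skill -> (alias regex, descriptive phrase).
TAXONOMY = {
    "Kubernetes": (r"\b(kubernetes|k8s)\b", "kubernetes orquestación de contenedores cluster pods"),
    "Docker": (r"\bdocker\b", "docker contenedores imágenes"),
    "Python": (r"\bpython\b", "python scripts automatización"),
}
THRESHOLD = 0.25  # cosine-similarity cut-off quoted in the diagram


def ngrams(text: str) -> list[str]:
    """Lower-cased unigrams and bigrams (n-gram range 1-2)."""
    words = re.findall(r"\w+", text.lower())
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Smoothed TF-IDF weights, one sparse vector (dict) per document."""
    counts = [Counter(ngrams(doc)) for doc in docs]
    doc_freq = Counter(term for count in counts for term in count)
    n = len(docs)
    return [
        {t: tf * (math.log((1 + n) / (1 + doc_freq[t])) + 1) for t, tf in count.items()}
        for count in counts
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def match_skills(text: str) -> list[str]:
    """Return canonical skills hit by either the regex or the TF-IDF branch."""
    names = list(TAXONOMY)
    vectors = tfidf_vectors([TAXONOMY[name][1] for name in names] + [text])
    query = vectors[-1]
    found = []
    for name, vector in zip(names, vectors):
        regex_hit = re.search(TAXONOMY[name][0], text, re.IGNORECASE)
        if regex_hit or cosine(query, vector) >= THRESHOLD:
            found.append(name)
    return found


print(match_skills("Experiencia en orquestación de contenedores y Python"))
```

In this toy example "Python" is found by the regex branch, while "Kubernetes" is never named in the text and is found only through phrase similarity. Results depend heavily on the descriptive phrases, so a real taxonomy would need tuning.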

Prometheus Metrics (K8s-ready)

Metrics are exported in standard Prometheus text format to the Actor's Key-Value Store under the key prometheus_metrics. Retrieve them via:

curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/prometheus_metrics" \
  -H "Authorization: Bearer YOUR_TOKEN"

Available metrics

| Metric | Type | Description |
|---|---|---|
| sepe_offers_scraped_total | Counter | Total offers stored |
| sepe_offers_new_total | Counter | New vs previous run |
| sepe_offers_changed_total | Counter | Changed offers |
| sepe_offers_removed_total | Counter | Removed offers |
| sepe_requests_total{status} | Counter | Requests by status (success/failed/retried) |
| sepe_pages_skipped_duplicates_total | Counter | Dedup skips |
| sepe_skills_matched_total{skill} | Counter | Matches per skill |
| sepe_offers_in_dataset | Gauge | Current dataset size |
| sepe_dedup_ratio | Gauge | Duplicate ratio (0–1) |
| sepe_pages_crawled_total | Gauge | Pages visited |
| sepe_proxy_errors_total | Gauge | Proxy/network errors |
| sepe_scrape_duration_seconds | Histogram | Total run duration |
| sepe_page_load_duration_seconds | Histogram | Per-page load time |
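
For readers unfamiliar with the Prometheus text format, the sketch below renders two of the metrics above by hand. It is illustrative only: the Actor may well use a client library instead, render_metrics is a hypothetical helper, the sample values are made up, and label-value escaping is not handled.

```python
from __future__ import annotations


def render_metrics(metrics: list[dict]) -> str:
    """Render counters and gauges in the Prometheus text exposition format."""
    lines = []
    for metric in metrics:
        name = metric["name"]
        lines.append(f"# HELP {name} {metric['help']}")
        lines.append(f"# TYPE {name} {metric['type']}")
        for labels, value in metric["samples"]:
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            suffix = "{" + label_text + "}" if label_text else ""
            lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


# Made-up sample values, for illustration only.
text = render_metrics([
    {
        "name": "sepe_offers_scraped_total",
        "help": "Total offers stored",
        "type": "counter",
        "samples": [({}, 1024)],
    },
    {
        "name": "sepe_requests_total",
        "help": "Requests by status",
        "type": "counter",
        "samples": [({"status": "success"}, 120), ({"status": "failed"}, 3)],
    },
])
print(text)
```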

Kubernetes scraping example

# prometheus-scrape-config.yaml
- job_name: sepe_scraper
  metrics_path: /v2/key-value-stores/STORE_ID/records/prometheus_metrics
  scheme: https
  bearer_token: YOUR_APIFY_TOKEN
  static_configs:
    - targets: [api.apify.com]

Change Detection Alerts

After each run an alerts_report is saved to the Key-Value Store and also pushed to the dataset as a record with _record_type: "alerts_summary":

{
  "_record_type": "alerts_summary",
  "generated_at": "2026-04-15T10:30:00Z",
  "stats": {
    "new_count": 47,
    "changed_count": 12,
    "removed_count": 3,
    "total_current": 1024,
    "total_previous": 980
  },
  "sample_new_offers": [ ... ],
  "changed_offers": [{"offer": {...}, "changed_fields": ["salario"]}],
  "removed_offers": [ ... ]
}
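
A report with the shape above could be assembled by diffing two snapshots. The following is a sketch under assumptions, not the code in alerts.py: snapshots are taken to be dicts keyed by offer URL, diff_offers is a hypothetical name, and the example offers are made up.

```python
from __future__ import annotations

from datetime import datetime, timezone


def diff_offers(previous: dict[str, dict], current: dict[str, dict]) -> dict:
    """Diff two snapshots keyed by offer URL (the key choice is an assumption)."""
    new_keys = sorted(current.keys() - previous.keys())
    removed_keys = sorted(previous.keys() - current.keys())
    changed = []
    for key in sorted(current.keys() & previous.keys()):
        old, new = previous[key], current[key]
        changed_fields = sorted(f for f in old.keys() | new.keys() if old.get(f) != new.get(f))
        if changed_fields:
            changed.append({"offer": new, "changed_fields": changed_fields})
    return {
        "_record_type": "alerts_summary",
        "generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "stats": {
            "new_count": len(new_keys),
            "changed_count": len(changed),
            "removed_count": len(removed_keys),
            "total_current": len(current),
            "total_previous": len(previous),
        },
        "sample_new_offers": [current[k] for k in new_keys],
        "changed_offers": changed,
        "removed_offers": [previous[k] for k in removed_keys],
    }


# Made-up snapshots for illustration.
previous = {
    "u1": {"url": "u1", "salario": "30.000 €"},
    "u2": {"url": "u2", "salario": "40.000 €"},
}
current = {
    "u1": {"url": "u1", "salario": "35.000 €"},
    "u3": {"url": "u3", "salario": "28.000 €"},
}
report = diff_offers(previous, current)
print(report["stats"])
```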

Integrate with Zapier, Make, or Slack by configuring an Apify webhook that triggers on run completion.


Province Codes Reference

All 52 Spanish provinces are supported. Examples:

| Input | Province | Code |
|---|---|---|
| "Vizcaya" or "Bizkaia" | Vizcaya / Bizkaia | 48 |
| "Madrid" | Madrid | 28 |
| "Barcelona" | Barcelona | 08 |
| "Guipúzcoa" or "Gipuzkoa" | Guipúzcoa | 20 |
| "Valencia" or "València" | Valencia | 46 |
| "Sevilla" | Sevilla | 41 |
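
Accepting both spellings of a province implies some normalisation before lookup. A minimal sketch, covering only the six examples above and assuming accent-insensitive, case-insensitive matching, might look like this; PROVINCE_CODES here is a hypothetical subset, not the Actor's config.py.

```python
from __future__ import annotations

import unicodedata

# Hypothetical subset of the lookup table, limited to the examples listed above.
PROVINCE_CODES = {
    "vizcaya": "48",
    "bizkaia": "48",
    "madrid": "28",
    "barcelona": "08",
    "guipuzcoa": "20",
    "gipuzkoa": "20",
    "valencia": "46",
    "sevilla": "41",
}


def normalize(name: str) -> str:
    """Lower-case, trim, and strip accents so 'València' becomes 'valencia'."""
    decomposed = unicodedata.normalize("NFKD", name.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def province_code(name: str) -> str | None:
    """Return the two-digit province code, or None for an unknown name."""
    return PROVINCE_CODES.get(normalize(name))


print(province_code("València"), province_code("Bizkaia"))
```

Codes are kept as strings so that the leading zero in "08" survives.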

Legal & Compliance

  • Only public, freely accessible data is scraped
  • Rate-limited to ≤ 1 request/second (configurable)
  • Respects SEPE's robots.txt structure
  • No login, no personal data, no GDPR-protected content
  • Data is from sepe.es which is a Spanish public institution

Troubleshooting

| Problem | Solution |
|---|---|
| 0 offers returned | SEPE may have changed page structure; check Actor logs for CSS selector misses |
| CAPTCHA detected | Enable Apify Residential proxy (use_proxy: true, proxy_groups: ["RESIDENTIAL"]) |
| Slow runs | Increase max_concurrency in main.py or reduce max_delay |
| Missing skills | Add aliases to SKILLS_TAXONOMY in config.py |
| Stale dedup | Delete previous_offers_hashes and previous_offers_snapshot from KV store |

Deploy to Apify

apify login
apify push