URL: https://apify.com/katerinahronik/junior-guru-job-scraper-demo

Junior Guru Job Scraper Demo

Demo Actor scraper for junior.guru talk.

Developer: Kateřina Hroníková

StartupJobs.cz demo scraper

An Apify Actor that collects developer job listings from StartupJobs.cz using their public API.

Built as a live demo for the junior.guru community talk "Web scraping: Nechte internet pracovat za vás" ("Web scraping: Let the internet work for you").


What does it do?

You give it a keyword (e.g. junior, python, javascript) and it returns a list of matching developer/engineer job offers including title, company, location, salary, and a direct link. Non-tech roles (sales, marketing, etc.) are filtered out automatically.

Results are stored in an Apify Dataset and can be exported to CSV, JSON, or other formats in one click.


Prerequisites

npm install -g apify-cli
apify login

Step 1: Find the API using DevTools

Before writing any code, open startupjobs.cz/nabidky in your browser and explore how it loads data.

  1. Press F12 to open DevTools
  2. Go to the Network tab
  3. Filter by Fetch/XHR
  4. Reload the page or type a keyword in the search box
  5. Look for a request to /api/offers

You'll see something like:

GET https://www.startupjobs.cz/api/offers?keyword=junior&limit=20&page=1

Open it in a new tab and you get clean JSON back. No HTML parsing needed. 🎉

{
  "resultSet": [
    {
      "name": "Junior TypeScript Developer",
      "company": "Acme s.r.o.",
      "url": "/nabidka/12345/junior-typescript-developer",
      "locations": "Praha",
      "isRemote": true,
      "seniorities": ["junior"],
      "areaSlugs": ["back-end-vyvojar", "vyvoj"],
      "salary": { "min": 40000, "max": 60000, "currency": "CZK", "measure": "monthly" }
    }
  ]
}
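
The response shape above can be written down as TypeScript types, which makes the later filtering code easier to follow. The sketch below is inferred from the single sample offer shown here, so the real API may return more fields or omit some of these. The names Offer, OffersResponse, buildOffersUrl, and fetchOffers are illustrative and are not taken from the actor's source.

```typescript
// Field names follow the sample response; optionality is an assumption.
interface Salary {
    min: number;
    max: number;
    currency: string;
    measure: string;
}

interface Offer {
    name: string;
    company: string;
    url: string;
    locations: string;
    isRemote: boolean;
    seniorities: string[];
    areaSlugs: string[];
    salary?: Salary | null; // assumption: not every offer publishes a salary
}

interface OffersResponse {
    resultSet: Offer[];
}

const API_URL = 'https://www.startupjobs.cz/api/offers';

// Build the request URL with properly encoded query parameters.
function buildOffersUrl(keyword: string, page = 1, limit = 20): string {
    const url = new URL(API_URL);
    url.searchParams.set('keyword', keyword);
    url.searchParams.set('limit', String(limit));
    url.searchParams.set('page', String(page));
    return url.toString();
}

// Fetch one page of offers; throws on a non-2xx response.
async function fetchOffers(keyword: string, page = 1): Promise<Offer[]> {
    const response = await fetch(buildOffersUrl(keyword, page));
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    const data = (await response.json()) as OffersResponse;
    return data.resultSet ?? [];
}

console.log(buildOffersUrl('junior'));
// https://www.startupjobs.cz/api/offers?keyword=junior&limit=20&page=1
```

Building the URL through URLSearchParams rather than string concatenation keeps keywords with spaces or special characters from producing a malformed request.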

Step 2: Walk through the code

The entire actor is in src/main.ts. Here's what it does:

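import { Actor } from 'apify';

interface Input { keyword?: string; seniority?: string; maxResults?: number }

const BASE_URL = 'https://www.startupjobs.cz';
const API_URL = `${BASE_URL}/api/offers`;
// Illustrative subset only; the full set is defined in src/main.ts
const DEV_AREA_SLUGS = new Set(['back-end-vyvojar', 'vyvoj']);

await Actor.init();
const { keyword = '', seniority = '', maxResults = 50 } = (await Actor.getInput<Input>()) ?? ({} as Input);
// Loop counters (not shown in the original excerpt)
let collected = 0;
let page = 1;
while (collected < maxResults) {
    // 1. Call the StartupJobs API: plain fetch(), JSON response
    const response = await fetch(`${API_URL}?keyword=${encodeURIComponent(keyword)}&page=${page}`);
    const { resultSet: offers = [] } = await response.json();
    if (offers.length === 0) break; // assumption: an empty page means there are no more results
    for (const offer of offers) {
        if (collected >= maxResults) break;
        // 2. Skip non-developer roles (sales, marketing, etc.) and wrong seniority
        const isDevRole = offer.areaSlugs.some((slug: string) => DEV_AREA_SLUGS.has(slug));
        const isSeniorityMatch = !seniority || offer.seniorities.includes(seniority);
        if (!isDevRole || !isSeniorityMatch) continue;
        // 3. Pick the fields we care about and save to Apify Dataset
        await Actor.pushData({
            title: offer.name,
            company: offer.company,
            url: `${BASE_URL}${offer.url}`,
            // ...
        });
        collected++;
    }
    page++;
}
await Actor.exit();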

Three concepts, that's it: fetch → filter → save.
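
The filter step is the only part with real logic, so it can help to look at it in isolation. The sketch below pulls the two checks into a standalone function that runs without the Apify SDK or any network access. The function name isWantedOffer and the contents of DEV_AREA_SLUGS are assumptions for illustration: the two slugs are the ones visible in the sample response, and "marketing" is a made-up non-developer slug.

```typescript
// Only the fields the filter needs; names follow the sample API response.
interface OfferLike {
    areaSlugs: string[];
    seniorities: string[];
}

// Illustrative subset; the actor's real slug list is not shown in this walkthrough.
const DEV_AREA_SLUGS = new Set<string>(['back-end-vyvojar', 'vyvoj']);

// True when the offer is a developer role and, if a seniority was requested, matches it.
function isWantedOffer(offer: OfferLike, seniority = ''): boolean {
    const isDevRole = offer.areaSlugs.some((slug) => DEV_AREA_SLUGS.has(slug));
    const isSeniorityMatch = !seniority || offer.seniorities.includes(seniority);
    return isDevRole && isSeniorityMatch;
}

const sampleOffers: OfferLike[] = [
    { areaSlugs: ['back-end-vyvojar', 'vyvoj'], seniorities: ['junior'] },
    { areaSlugs: ['marketing'], seniorities: ['junior'] }, // hypothetical non-dev slug
    { areaSlugs: ['vyvoj'], seniorities: ['senior'] },
];

const kept = sampleOffers.filter((offer) => isWantedOffer(offer, 'junior'));
console.log(kept.length); // prints 1
```

An empty seniority string disables the seniority check, which mirrors the `!seniority ||` guard in the walkthrough code.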

StartupJobs has a clean API, so we get JSON directly. If it didn't, we'd have to fetch the HTML page and extract data from it using CSS selectors. This is called parsing:

// Without an API you'd do something like this instead:
import * as cheerio from 'cheerio';

const response = await fetch('https://www.startupjobs.cz/nabidky?q=javascript');
const html = await response.text(); // raw HTML string, not JSON
const $ = cheerio.load(html); // parse the HTML
$('.offer-title').each((_, el) => { // find all elements matching a CSS selector
    const title = $(el).text().trim(); // extract the text content
    const url = $(el).attr('href'); // or an attribute
    console.log(title, url);
});

HTML structure can change whenever the site is redesigned, which breaks your selectors; APIs are usually much more stable.


Step 3 โ€” Run locally

# Install dependencies
npm install
# Run without building (great for development)
npm run dev
# Or build first, then run
npm run build
npm start

To set a custom keyword, create storage/key_value_stores/default/INPUT.json:

{
  "keyword": "javascript",
  "seniority": "junior",
  "maxResults": 20
}
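
Since INPUT.json is edited by hand, a small validation step can catch typos before the scraper starts looping. The sketch below is a hypothetical helper, not part of the actor: normalizeInput and DEFAULTS are illustrative names, and the default values are the ones from the destructuring shown in Step 2.

```typescript
interface Input {
    keyword: string;
    seniority: string;
    maxResults: number;
}

// Defaults match the destructuring defaults shown in Step 2.
const DEFAULTS: Input = { keyword: '', seniority: '', maxResults: 50 };

// Hypothetical helper: fill in missing fields and reject an unusable maxResults.
function normalizeInput(raw: Partial<Input> | null | undefined): Input {
    const keyword = (raw?.keyword ?? DEFAULTS.keyword).trim();
    const seniority = (raw?.seniority ?? DEFAULTS.seniority).trim();
    const maxResults = raw?.maxResults ?? DEFAULTS.maxResults;
    if (!Number.isInteger(maxResults) || maxResults < 1) {
        throw new Error(`maxResults must be a positive integer, got ${maxResults}`);
    }
    return { keyword, seniority, maxResults };
}

const input = normalizeInput({ keyword: ' javascript ', seniority: 'junior', maxResults: 20 });
console.log(input.keyword, input.seniority, input.maxResults);
```

In the actor itself, the value passed in would be whatever Actor.getInput() returns.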

Step 4: Deploy to Apify

apify push

Your actor is now live at console.apify.com under My Actors.


Step 5: Schedule & export

Run on a schedule, e.g. every morning at 8:00:

  1. Open your actor in Apify Console
  2. Go to Schedules → + New Schedule
  3. Set cron: 0 8 * * 1-5 (Mon–Fri at 8:00)
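
The five fields of the cron expression above are minute, hour, day of month, month, and day of week. To make that concrete, here is a minimal sketch that checks a date against such an expression. It is a teaching aid and not how Apify evaluates schedules: it only understands `*`, single numbers, and ranges like `1-5`, it ignores the rule real cron applies when both day fields are restricted, and it assumes the schedule runs in UTC. The function names are made up for this example.

```typescript
// Match one cron field ("*", "8", or "1-5") against a value.
// Lists ("1,3") and steps ("*/5") are deliberately not handled.
function fieldMatches(field: string, value: number): boolean {
    if (field === '*') return true;
    const parts = field.split('-').map(Number);
    const start = parts[0] ?? NaN;
    const end = parts[1] ?? start;
    return value >= start && value <= end;
}

// Check a date against "minute hour day-of-month month day-of-week", in UTC.
function matchesCron(expression: string, date: Date): boolean {
    const fields = expression.trim().split(/\s+/);
    if (fields.length !== 5) {
        throw new Error(`Expected 5 cron fields, got ${fields.length}`);
    }
    const values = [
        date.getUTCMinutes(),
        date.getUTCHours(),
        date.getUTCDate(),
        date.getUTCMonth() + 1, // cron months are 1-12
        date.getUTCDay(), // 0 = Sunday ... 6 = Saturday
    ];
    return fields.every((field, i) => fieldMatches(field, values[i] ?? NaN));
}

const monday = new Date(Date.UTC(2024, 0, 1, 8, 0)); // Monday 1 January 2024, 08:00 UTC
const saturday = new Date(Date.UTC(2024, 0, 6, 8, 0)); // Saturday 6 January 2024, 08:00 UTC
console.log(matchesCron('0 8 * * 1-5', monday)); // true
console.log(matchesCron('0 8 * * 1-5', saturday)); // false
```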

Export results:

  • Dataset → Export → CSV / JSON
  • Or connect directly to Gmail via Apify integrations
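
Exports can also be fetched programmatically instead of clicking through the Console. The sketch below builds a download URL for the Apify API dataset items endpoint and fetches it with a bearer token. The dataset ID 'abc123' is a placeholder, the helper names are made up for this example, and the set of formats listed is a subset chosen for illustration, so check the Apify API reference for the full parameter list before relying on it.

```typescript
type ExportFormat = 'json' | 'csv' | 'xlsx';

// Build the download URL for a dataset's items in the requested format.
function buildDatasetExportUrl(datasetId: string, format: ExportFormat = 'json'): string {
    const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
    url.searchParams.set('format', format);
    return url.toString();
}

// Download the export as text (suitable for JSON or CSV); the token is your Apify API token.
async function downloadDataset(datasetId: string, format: ExportFormat, token: string): Promise<string> {
    const response = await fetch(buildDatasetExportUrl(datasetId, format), {
        headers: { Authorization: `Bearer ${token}` },
    });
    if (!response.ok) {
        throw new Error(`Export failed with status ${response.status}`);
    }
    return response.text();
}

console.log(buildDatasetExportUrl('abc123', 'csv'));
// https://api.apify.com/v2/datasets/abc123/items?format=csv
```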

Build your own scraper

Want to scrape a different site? You can use this repo as a starting point.

  1. Pick your starting point based on what the target site looks like:

    Situation                                     Template
    Site has a JSON API (like this demo)          Clone this repo
    No API, static HTML                           ts-crawlee-cheerio
    No API, heavy JavaScript / dynamic content    ts-crawlee-playwright

    apify create my-scraper --template ts-crawlee-cheerio

  2. Find the data source: open the target site in your browser, go to DevTools → Network → Fetch/XHR, and look for an API call returning JSON. If there's no API, switch to the Elements tab and find the CSS selectors for the data you need.

  3. Edit src/main.ts: replace the fetch() URL and the fields inside Actor.pushData({...}) with whatever your target API or page returns. The structure stays the same: fetch → filter → save.

  4. Update .actor/input_schema.json to define the inputs your scraper needs (keywords, URLs, limits, etc.).

  5. Run locally with npm run dev, then deploy with apify push.

The Apify documentation and Academy are great next steps from here.


Going further

What                                  How
Compare day-over-day                  Store results with a timestamp, diff on next run
Scrape a JS-heavy site                Switch to PlaywrightCrawler from Crawlee
Browse 29 000+ ready-made scrapers    apify.com/store
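
The day-over-day comparison in the first row can be sketched as a diff between two timestamped snapshots. This is one possible approach under stated assumptions: offers are identified by their url, and the Snapshot shape, the diffSnapshots name, and the sample data are all hypothetical. Where the previous snapshot is stored (for example in a key-value store) is left out.

```typescript
interface StoredOffer {
    url: string;
    title: string;
}

interface Snapshot {
    scrapedAt: string; // ISO timestamp of the run
    offers: StoredOffer[];
}

// Compare two runs by offer URL and report what appeared and what disappeared.
function diffSnapshots(previous: Snapshot, current: Snapshot): { added: StoredOffer[]; removed: StoredOffer[] } {
    const previousUrls = new Set(previous.offers.map((offer) => offer.url));
    const currentUrls = new Set(current.offers.map((offer) => offer.url));
    return {
        added: current.offers.filter((offer) => !previousUrls.has(offer.url)),
        removed: previous.offers.filter((offer) => !currentUrls.has(offer.url)),
    };
}

// Made-up sample data for two consecutive runs.
const yesterday: Snapshot = {
    scrapedAt: '2024-01-01T08:00:00Z',
    offers: [
        { url: '/nabidka/1/a', title: 'Junior A' },
        { url: '/nabidka/2/b', title: 'Junior B' },
    ],
};
const today: Snapshot = {
    scrapedAt: '2024-01-02T08:00:00Z',
    offers: [
        { url: '/nabidka/2/b', title: 'Junior B' },
        { url: '/nabidka/3/c', title: 'Junior C' },
    ],
};

const changes = diffSnapshots(yesterday, today);
console.log(changes.added.map((offer) => offer.title)); // [ 'Junior C' ]
console.log(changes.removed.map((offer) => offer.title)); // [ 'Junior A' ]
```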

Glossary

Web scraping: Automatically collecting data from websites by sending requests and extracting the relevant parts from the response (HTML or JSON).

Server: A computer (or program) that listens for requests over the internet and sends back a response. When you open a website, your browser sends a request to a server, which replies with the page content.

API (Application Programming Interface): A formal agreement between two programs on how to exchange data: what you can ask for, how to ask it, and what format the answer comes back in. This scraper uses StartupJobs' public API, which means we get clean JSON instead of having to dig through HTML.

Parsing: Analyzing and processing structured text (HTML or JSON) to pull out specific pieces of data. When a site has no API, you parse the raw HTML to find what you need.

JS site (JavaScript-rendered site): A site that builds its content in the browser using JavaScript. A plain HTTP request returns only an empty shell; the actual data isn't in the source HTML at all. You need a headless browser to load these properly.

Headless browser: A web browser that runs without a visible window. It works exactly like a normal browser (loads pages, runs JavaScript, processes CSS), but everything happens in memory in the background. Used to scrape JS-rendered sites.

LLM (Large Language Model): A type of AI trained on massive amounts of text, capable of understanding and generating human-like language. In scraping, LLMs can help extract or structure data from unstructured text that would be hard to parse with code alone.

Proxy: An intermediary server between you and the target website. Your requests go through it, so the website sees the proxy's IP address instead of yours. Used to avoid IP bans when scraping at scale.

