URL: https://apify.com/evohaus/sahibinden-scraper-puppeteer-js

sahibinden-scraper-puppeteer-js · Apify

Under maintenance

Pricing

Pay per usage

Rating

0.0

(0)

Developer

Nail Yakupoglu

Maintained by Community

Actor stats

8

Bookmarked

416

Total users

2

Monthly active users

a year ago

Last modified

Sahibinden.com Web Scraper Documentation

Overview

This document describes the Sahibinden.com web scraper built on Apify. The scraper extracts car listings from Sahibinden.com while bypassing anti-bot measures, and stores them in BaseRow in a structured format to power an AI chatbot for used car price estimation.

Architecture

The solution consists of the following components:

  1. Apify Actor - A cloud-based web scraper that handles the extraction of car listings from Sahibinden.com
  2. Playwright with Stealth - Browser automation with anti-detection capabilities to bypass Cloudflare protection
  3. Residential Proxies - IP rotation to avoid blocking and simulate real user traffic
  4. BaseRow Integration - Data storage in a structured format for easy querying by the AI chatbot
  5. Scheduler - Automated regular scraping to keep data fresh

Technical Implementation

1. Anti-Bot Bypassing Techniques

The scraper implements several techniques to bypass Sahibinden.com's Cloudflare protection:

  • Residential Proxies: Uses Apify's RESIDENTIAL proxy group with the Turkey (TR) country code, so requests appear to come from ordinary Turkish users
  • Browser Fingerprinting: Implements realistic browser fingerprinting to avoid detection
  • Stealth Plugin: Uses playwright-stealth to modify browser behavior and evade detection
  • Human-like Behavior: Adds random delays, realistic mouse movements, and proper request headers
  • Non-headless Mode: Runs browser in non-headless mode for better Cloudflare bypass
  • Cookie Management: Sets and maintains cookies to simulate returning users
  • User-Agent Rotation: Rotates between realistic user agents for each request
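
Two of these techniques, random delays and user-agent rotation, can be sketched in plain JavaScript. This is a minimal illustration and not the Actor's actual code: the function names, the delay bounds, and the user-agent strings are all hypothetical placeholders.

```javascript
// Hypothetical pool of user agents; a real run would keep these current.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Returns a random integer delay in the inclusive range [minMs, maxMs].
// `random` is injectable so the function can be tested deterministically.
function randomDelayMs(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Picks one user agent from the pool at random.
function pickUserAgent(pool = USER_AGENTS, random = Math.random) {
  return pool[Math.floor(random() * pool.length)];
}

// Resolves after a random pause; the 2 to 6 second bounds are an assumption.
function humanPause(minMs = 2000, maxMs = 6000) {
  return new Promise((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}
```

A crawler would typically call `await humanPause()` between page navigations and pass `pickUserAgent()` when opening a new browser context.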

2. Data Extraction

The scraper extracts comprehensive data from car listings including:

  • Basic information (ID, URL, title, price, location)
  • Vehicle specifications (make, model, year, fuel type, etc.)
  • Detailed attributes (interior/exterior features, safety features, etc.)
  • Technical specifications
  • Images
  • Seller information
  • Description and condition details

The data extraction is implemented in two main functions:

  • handleCategoryPage() - Extracts listing URLs from category pages and handles pagination
  • handleDetailPage() - Extracts comprehensive data from individual car listing pages
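
Text pulled from a detail page usually needs normalizing before it fits the numeric price and separate currency fields. The sketch below shows one way to do that. It assumes prices appear as text such as "1.250.000 TL", with dots as thousands separators; `parsePrice` is a hypothetical helper and is not taken from the Actor's source.

```javascript
// Splits a price string into a number and a currency label.
// Assumption: dots are thousands separators and any decimal part
// follows a comma (it is dropped here).
function parsePrice(text) {
  const match = String(text).trim().match(/^([\d.]+)(?:,\d+)?\s*([^\d\s]+)?$/);
  const digits = match ? match[1].replace(/\./g, '') : '';
  if (!digits) return { price: null, price_currency: null };
  return { price: Number(digits), price_currency: match[2] || null };
}
```

Returning `null` values instead of throwing lets the caller store a partial record when a listing has an unexpected price format.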

3. BaseRow Integration

The BaseRow integration provides:

  • Automatic storage of scraped data in a structured format
  • Duplicate detection and handling
  • Field mapping between scraped data and BaseRow table structure
  • Batch processing for efficient data storage
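
The mapping, duplicate handling, and batching steps could look roughly like the following. This is a sketch under stated assumptions: the target keys come from the table structure in this document, while the shape of the scraped `listing` object (`id`, `url`, `title`, `price`, `images`) and all function names are hypothetical.

```javascript
// Maps a scraped listing to a row keyed by table field names.
// Arrays are serialized because `images` is a Long text field.
function toRow(listing, now = new Date()) {
  return {
    listing_id: String(listing.id),
    url: listing.url,
    title: listing.title,
    price: listing.price,
    images: JSON.stringify(listing.images || []),
    scraped_at: now.toISOString(),
  };
}

// Keeps one row per listing_id; the last occurrence wins.
function dedupe(rows) {
  return [...new Map(rows.map((row) => [row.listing_id, row])).values()];
}

// Splits rows into batches of at most `size` items.
function chunk(rows, size) {
  const batches = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}
```

This only removes duplicates within a single run. Detecting listings that already exist in the table would need a lookup by `listing_id` first.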

4. Automation

The scraper is configured for automatic scheduling with:

  • Configurable run frequency (hourly, daily, weekly)
  • Performance monitoring
  • Error handling and retry mechanisms
  • Rate limiting to avoid overloading the target site
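
A common way to combine retries with rate limiting is exponential backoff. The sketch below illustrates the idea only; the document does not say which retry strategy the Actor uses, and the names and default values here are assumptions.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `task`, retrying up to `maxRetries` times with growing pauses.
// `sleep` is injectable so tests do not have to wait.
async function withRetries(
  task,
  maxRetries = 3,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= maxRetries) throw error; // retries exhausted
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```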

Setup Instructions

Prerequisites

  • Apify account with access to residential proxies
  • BaseRow account with a table set up for car listings
  • Node.js 16+ for local development (optional)

Apify Actor Setup

  1. Create a new Actor in your Apify account
  2. Upload the code from this repository to the Actor
  3. Configure Actor settings:
    • Memory: Minimum 4 GB recommended
    • Timeout: At least 30 minutes
    • Environment variables: None required

BaseRow Setup

  1. Create a new table in BaseRow with the following structure:
| Field Name | Type | Description |
| --- | --- | --- |
| listing_id | Text | Unique identifier from Sahibinden.com |
| url | URL | Full URL of the listing |
| title | Text | Title of the car listing |
| price | Number | Numeric price value |
| price_currency | Text | Currency of the price (TL, EUR) |
| location | Text | Location information |
| description | Long text | Full description text |
| make | Text | Car make/brand |
| model | Text | Car model |
| series | Text | Car series |
| year | Text | Manufacturing year |
| fuel_type | Text | Type of fuel |
| transmission | Text | Transmission type |
| mileage | Text | Kilometer reading |
| body_type | Text | Body type |
| engine_power | Text | Engine power |
| engine_capacity | Text | Engine capacity |
| drive_type | Text | Drive type |
| doors | Text | Number of doors |
| color | Text | Car color |
| warranty | Text | Warranty information |
| damage_record | Text | Damage record status |
| plate_nationality | Text | Plate/nationality information |
| seller_type | Text | Seller type (dealer, individual) |
| trade_in | Text | Trade-in availability |
| condition | Text | Car condition |
| images | Long text | JSON string of image URLs |
| attributes | Long text | JSON string of attributes |
| technical_specs | Long text | JSON string of technical specs |
| scraped_at | Date & Time | When the data was scraped |
| last_updated | Date & Time | When the record was last updated |
  2. Get your BaseRow API credentials:
    • API Token
    • Table ID
    • Database ID
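
As a rough illustration of how the API token and table ID are used, the sketch below builds a batch row-creation request. It assumes the hosted Baserow REST API at api.baserow.io and its batch rows endpoint; verify the path and payload against the API documentation for your own instance. `buildBatchCreateRequest` is a hypothetical helper and is not part of the Actor.

```javascript
// Builds (but does not send) a request that creates several rows at once.
// Assumption: the endpoint path and the `items` payload follow the Baserow REST API.
function buildBatchCreateRequest({ apiToken, tableId, rows, baseUrl = 'https://api.baserow.io' }) {
  return {
    url: `${baseUrl}/api/database/rows/table/${tableId}/batch/?user_field_names=true`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Token ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ items: rows }),
    },
  };
}
```

The result can be passed to `fetch(url, options)`. A global `fetch` is available from Node.js 18; older versions would need an HTTP client library.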

Running the Scraper

Configuration Options

The Actor accepts the following input parameters:

{
  "startUrls": ["https://www.sahibinden.com/kategori/vasita"],
  "maxConcurrency": 1,
  "maxRequestsPerCrawl": 1000,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "countryCode": "TR"
  },
  "baseRowApiToken": "YOUR_BASEROW_API_TOKEN",
  "baseRowTableId": "YOUR_BASEROW_TABLE_ID",
  "baseRowDatabaseId": "YOUR_BASEROW_DATABASE_ID",
  "scheduleInterval": "daily",
  "startTime": "02:00"
}
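
Validating the input at startup catches configuration mistakes before any proxy credits are spent. The following sketch shows one possible check. It assumes that the example values above serve as defaults and that the three BaseRow credentials are mandatory; neither is confirmed by this document, and `normalizeInput` is a hypothetical name.

```javascript
// Assumed defaults, copied from the example input.
const DEFAULT_INPUT = {
  startUrls: ['https://www.sahibinden.com/kategori/vasita'],
  maxConcurrency: 1,
  maxRequestsPerCrawl: 1000,
  scheduleInterval: 'daily',
  startTime: '02:00',
};

// Merges user input over the defaults and throws on invalid values.
function normalizeInput(input = {}) {
  const merged = { ...DEFAULT_INPUT, ...input };
  const errors = [];
  for (const key of ['baseRowApiToken', 'baseRowTableId', 'baseRowDatabaseId']) {
    if (!merged[key]) errors.push(`Missing required input: ${key}`);
  }
  if (!['hourly', 'daily', 'weekly'].includes(merged.scheduleInterval)) {
    errors.push(`Invalid scheduleInterval: ${merged.scheduleInterval}`);
  }
  if (!/^([01]\d|2[0-3]):[0-5]\d$/.test(merged.startTime)) {
    errors.push(`Invalid startTime: ${merged.startTime}`);
  }
  if (errors.length > 0) throw new Error(errors.join('; '));
  return merged;
}
```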

Scheduling

To set up automatic scheduling:

  1. Create a new Schedule in your Apify account
  2. Set the desired frequency (recommended: daily)
  3. Configure the Actor input as shown above
  4. Enable the Schedule
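
Apify Schedules are defined with cron expressions, so the `scheduleInterval` and `startTime` values from the input need translating. One possible mapping is sketched below. `toCron` is a hypothetical helper, and running weekly jobs on Monday is an arbitrary assumption.

```javascript
// Converts an interval name and an "HH:MM" start time to a 5-field cron expression.
function toCron(interval, startTime = '00:00') {
  const [hour, minute] = startTime.split(':').map(Number);
  switch (interval) {
    case 'hourly':
      return `${minute} * * * *`; // every hour at the given minute
    case 'daily':
      return `${minute} ${hour} * * *`;
    case 'weekly':
      return `${minute} ${hour} * * 1`; // Monday, chosen arbitrarily
    default:
      throw new Error(`Unsupported interval: ${interval}`);
  }
}
```

With the example input, `toCron('daily', '02:00')` yields `0 2 * * *`.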

AI Chatbot Integration

The data stored in BaseRow can be used to power an AI chatbot for used car price estimation. The chatbot can:

  1. Take user input describing a car (e.g., "2017 Passat 3 parça boya 150bin km", roughly "2017 Passat, 3 repainted panels, 150,000 km")
  2. Query the BaseRow table for comparable listings
  3. Calculate an estimated price range based on similar vehicles
  4. Adjust the estimate based on factors like damage status, mileage, and year
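
Step 3 can be approximated with simple order statistics over the prices of comparable listings. The sketch below uses the interquartile range as the estimate. This choice is an assumption rather than the chatbot's documented method, and the adjustments in step 4 are not shown.

```javascript
// Value at quantile q (0..1) of an ascending-sorted array, with linear interpolation.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Returns a low/median/high estimate, or null when there are no comparables.
function estimatePriceRange(prices) {
  if (prices.length === 0) return null;
  const sorted = [...prices].sort((a, b) => a - b);
  return {
    low: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    high: quantile(sorted, 0.75),
    sampleSize: sorted.length,
  };
}
```

Using quartiles instead of the minimum and maximum keeps a single mispriced listing from stretching the range.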

Query Examples

To find comparable listings in BaseRow, filter on make, model, year, and a mileage range. Expressed in SQL-like form:

SELECT * FROM cars
WHERE make = 'Volkswagen'
  AND model = 'Passat'
  AND year = '2017'
  AND mileage BETWEEN 130000 AND 170000
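
The table structure stores `mileage` as Text, so a numeric range comparison needs a conversion step first. The sketch below applies the same filter to rows already fetched into memory. The function names are hypothetical, and it assumes mileage text such as "150.000 km" in which every digit belongs to the kilometer value.

```javascript
// Strips everything except digits: "150.000 km" becomes 150000.
// Returns null when the text contains no digits.
function toNumber(text) {
  const digits = String(text ?? '').replace(/\D/g, '');
  return digits ? Number(digits) : null;
}

// Returns rows matching make, model, and year whose mileage lies in [minMileage, maxMileage].
function findComparables(rows, { make, model, year, minMileage, maxMileage }) {
  return rows.filter((row) => {
    const mileage = toNumber(row.mileage);
    return row.make === make
      && row.model === model
      && String(row.year) === String(year)
      && mileage !== null
      && mileage >= minMileage
      && mileage <= maxMileage;
  });
}
```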

Troubleshooting

Common Issues

  1. 403 Forbidden Errors

    • Check that residential proxies are properly configured
    • Verify that the stealth plugin is working correctly
    • Try reducing concurrency to 1
    • Increase delays between requests
  2. Data Extraction Issues

    • Check if Sahibinden.com has changed their page structure
    • Update the selectors in the data extraction functions
    • Verify that the page is fully loaded before extraction
  3. BaseRow Integration Issues

    • Verify API credentials are correct
    • Check that the table structure matches the expected fields
    • Look for API rate limiting issues
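
A quick way to tell a page-structure change from ordinary incomplete listings is to measure how many scraped rows lack core fields. The check below is a sketch only: the list of required fields and the 50% threshold are assumptions to tune for your own data.

```javascript
// Fields assumed to be present on every valid listing.
const REQUIRED_FIELDS = ['listing_id', 'url', 'title', 'price'];

// Lists the required fields that are empty or absent in a row.
function missingFields(row, required = REQUIRED_FIELDS) {
  return required.filter((field) => row[field] === undefined || row[field] === null || row[field] === '');
}

// True when at least `threshold` of the rows miss a required field,
// which suggests broken selectors and not sparse listings.
function looksLikeSelectorBreakage(rows, threshold = 0.5) {
  if (rows.length === 0) return false;
  const broken = rows.filter((row) => missingFields(row).length > 0).length;
  return broken / rows.length >= threshold;
}
```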

Maintenance

To keep the scraper running smoothly:

  1. Regular Monitoring: Check the Actor runs for any errors or performance issues
  2. Updates: Update the code if Sahibinden.com changes their website structure
  3. Proxy Management: Ensure you have sufficient residential proxy credits
  4. Data Cleaning: Periodically clean old or irrelevant listings from BaseRow

Limitations

  • The scraper is designed specifically for Sahibinden.com and may not work on other websites
  • Cloudflare protection methods may change, requiring updates to the bypassing techniques
  • Very high volume scraping may still trigger rate limiting despite all precautions
  • Some listings may have incomplete data if the seller didn't provide all information
