URL: https://apify.com/saadithya/group-trip-data-extractor

Group Trip Data Extractor · Apify


Pricing

from $0.05 / result

Group Trip Data Extractor

Extract structured trip information from multiple group trip URLs and enrich data using AI

Rating

0.0

(0)

Developer

πŸ‘ Aadhithya

Aadhithya

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 1
  • Last modified: 2 months ago

A production-ready Apify Actor that extracts structured trip information from multiple group trip URLs and enriches data using AI.

Project Structure

AI Group trip extractor/
├── .actor/
│   ├── actor.json            # Actor configuration
│   ├── input_schema.json     # Input parameter definitions
│   ├── output_schema.json    # Output schema for API/Console
│   └── dataset_schema.json   # Dataset field definitions
├── src/
│   ├── main.js               # Core actor logic
│   ├── scraper.js            # Web scraping (Cheerio/Playwright)
│   ├── ai-enricher.js        # OpenAI integration
│   └── schema.js             # Schema validation utilities
├── Dockerfile                # Docker build configuration
├── package.json              # Dependencies & scripts
└── INPUT.json                # Sample input for testing

Features

  • Multi-URL Processing: Process multiple trip URLs in a single run
  • Intelligent Scraping: Uses Cheerio (fast) or Playwright (JavaScript-heavy pages)
  • AI Enrichment: Uses OpenAI to infer missing data (coordinates, country, city, trip type)
  • Strict Schema: Fixed 17-field output schema for Excel compatibility
  • Error Handling: Never breaks schema - returns empty structured rows on errors
  • Test Mode: Quick testing with mock data

Output Schema

Every output item contains exactly these 17 fields:

Field        Description
title        Trip/tour name
destination  Main destination name
country      Country name (AI-inferred if missing)
state        State/province/region (AI-inferred if missing)
city         City name (AI-inferred if missing)
latitude     Latitude coordinate (AI-inferred from destination)
longitude    Longitude coordinate (AI-inferred from destination)
provider     Travel company/organizer name
price        Numeric price value
currency     Currency code (INR, USD, EUR, etc.)
start_date   Start date (YYYY-MM-DD format)
end_date     End date (YYYY-MM-DD format)
trip_type    Trip category (trek, backpacking, weekend, etc.)
description  Brief trip description
images       Comma-separated image URLs
inclusions   What's included (comma-separated)
booking_url  URL to book the trip
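
As an illustration of the fixed schema, the sketch below coerces any partial object into a row with exactly these 17 fields. The helper name `toSchemaRow` and the choice to join arrays with ", " are assumptions for this example; the actor's own `schema.js` is not shown here and may work differently.

```javascript
// The 17 output fields, in the order listed in the table above.
const OUTPUT_FIELDS = [
  'title', 'destination', 'country', 'state', 'city',
  'latitude', 'longitude', 'provider', 'price', 'currency',
  'start_date', 'end_date', 'trip_type', 'description',
  'images', 'inclusions', 'booking_url',
];

// Hypothetical helper: build a full 17-field row from a partial object.
// Unknown keys are dropped, missing values become empty strings,
// and arrays (e.g. images) are flattened to comma-separated text.
function toSchemaRow(partial = {}) {
  const row = {};
  for (const field of OUTPUT_FIELDS) {
    const value = partial[field];
    if (value === undefined || value === null) {
      row[field] = '';
    } else if (Array.isArray(value)) {
      row[field] = value.join(', ');
    } else {
      row[field] = value;
    }
  }
  return row;
}
```

Because every row has the same keys in the same order, the dataset exports to a spreadsheet with stable columns.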

Input Configuration

{
  "tripUrls": [
    "https://example-travel.com/trip/himalayan-trek",
    "https://example-travel.com/trip/goa-weekend"
  ],
  "openaiApiKey": "sk-...",
  "model": "gpt-4o-mini",
  "testMode": false,
  "maxConcurrency": 5,
  "requestTimeout": 60000,
  "usePlaywright": false
}

Input Parameters

Parameter       Type     Required  Default      Description
tripUrls        array    Yes       -            List of trip URLs to extract
openaiApiKey    string   Yes       -            OpenAI API key for enrichment
model           string   No        gpt-4o-mini  OpenAI model to use
testMode        boolean  No        false        Return mock data without scraping
maxConcurrency  integer  No        5            Max concurrent page loads
requestTimeout  integer  No        60000        Page load timeout (ms)
usePlaywright   boolean  No        false        Use Playwright for JS-heavy pages
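
The defaults and required parameters in the table can be applied with a small function like the one below. This is a sketch, not the actor's code: `resolveInput` and `INPUT_DEFAULTS` are hypothetical names, and in a real actor the raw object would typically come from the Apify SDK's `Actor.getInput()`.

```javascript
// Defaults copied from the parameter table above.
const INPUT_DEFAULTS = {
  model: 'gpt-4o-mini',
  testMode: false,
  maxConcurrency: 5,
  requestTimeout: 60000,
  usePlaywright: false,
};

// Hypothetical helper: fill in defaults and reject input that lacks
// the two required parameters.
function resolveInput(raw) {
  const input = { ...raw }; // spreading null/undefined yields {}
  for (const [key, fallback] of Object.entries(INPUT_DEFAULTS)) {
    if (input[key] === undefined || input[key] === null) input[key] = fallback;
  }
  if (!Array.isArray(input.tripUrls) || input.tripUrls.length === 0) {
    throw new Error('tripUrls must be a non-empty array of URLs');
  }
  if (typeof input.openaiApiKey !== 'string' || input.openaiApiKey === '') {
    throw new Error('openaiApiKey is required');
  }
  return input;
}
```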

How It Works

Step 1: Fetch Data

For each URL, the actor loads the page and extracts:

  • Title, description, destination
  • Price, dates, duration
  • Itinerary, inclusions
  • Provider name, images, booking link

Step 2: Clean Data

  • Remove HTML tags
  • Normalize whitespace
  • Keep meaningful content only
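
A minimal sketch of this cleaning step, using regular expressions only. Dropping `<script>` and `<style>` bodies is an assumed reading of "meaningful content", and `cleanText` is a hypothetical name; the actor may instead rely on Cheerio's text extraction.

```javascript
// Hypothetical helper: reduce an HTML fragment to plain, single-spaced text.
function cleanText(html) {
  return String(html ?? '')
    // Drop script/style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // Remove all remaining tags.
    .replace(/<[^>]+>/g, ' ')
    // Decode the two entities most likely to affect spacing and names.
    .replace(/&nbsp;/gi, ' ')
    .replace(/&amp;/gi, '&')
    // Normalize whitespace.
    .replace(/\s+/g, ' ')
    .trim();
}
```

Regex-based tag stripping is adequate for trimming text before it is sent to a model, but it is not a general HTML parser.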

Step 3: AI Enrichment

Send extracted content to OpenAI to:

  • Map data to required schema
  • Infer missing fields (country, state, city, trip_type)
  • Generate coordinates from destination
  • Normalize currency and dates
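
The enrichment step can be pictured as two small functions: one that builds the chat messages, and one that merges the model's JSON reply into the scraped data. Both are hypothetical sketches. The actor's real prompt is not shown in this README, and the rule that scraped values take precedence over inferred ones is an assumption.

```javascript
// Hypothetical prompt builder using the chat message shape ({ role, content }).
function buildMessages(scraped, pageText) {
  return [
    {
      role: 'system',
      content: 'Return one JSON object describing the trip. Use "" for unknown '
        + 'values, 3-letter currency codes, and YYYY-MM-DD dates.',
    },
    { role: 'user', content: JSON.stringify({ scraped, pageText }) },
  ];
}

// Merge the model's reply into the scraped row.
// Assumption: scraped values win; the AI only fills gaps.
function mergeEnrichment(scraped, aiReplyText) {
  let inferred = {};
  try {
    const parsed = JSON.parse(aiReplyText);
    if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
      inferred = parsed;
    }
  } catch {
    // Unparseable reply: keep the scraped data as-is.
  }
  const merged = { ...scraped };
  for (const [field, value] of Object.entries(inferred)) {
    const current = merged[field];
    const isMissing = current === undefined || current === null || current === '';
    if (isMissing && value !== undefined && value !== null) merged[field] = value;
  }
  return merged;
}
```

With this design an invalid reply degrades to the scraped values alone, which matches the "partially mapped data" behavior described under Error Handling.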

Step 4: Output

Each dataset item contains all 17 fields with missing values as empty strings.

Error Handling

  • Scraping fails: Returns empty structured row with booking_url
  • AI fails: Returns partially mapped data
  • Invalid URL: Returns empty structured row
  • Schema never breaks: All outputs have exactly 17 fields
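
The guarantees above can be sketched as a wrapper around per-URL processing. Here `safeProcess` and `emptyRow` are hypothetical names, and `processUrl` stands in for whatever scrape-and-enrich function the actor actually uses.

```javascript
const FIELDS = [
  'title', 'destination', 'country', 'state', 'city', 'latitude',
  'longitude', 'provider', 'price', 'currency', 'start_date', 'end_date',
  'trip_type', 'description', 'images', 'inclusions', 'booking_url',
];

// A row with all 17 fields empty, optionally carrying the booking URL.
function emptyRow(bookingUrl = '') {
  const row = Object.fromEntries(FIELDS.map((field) => [field, '']));
  row.booking_url = bookingUrl;
  return row;
}

// Hypothetical wrapper: always resolves to a 17-field row, never rejects.
async function safeProcess(url, processUrl) {
  try {
    new URL(url); // throws on an invalid URL
  } catch {
    return emptyRow(); // invalid URL: empty structured row
  }
  try {
    const result = await processUrl(url);
    const row = emptyRow(url);
    for (const field of FIELDS) {
      if (result?.[field] !== undefined && result[field] !== null) {
        row[field] = result[field];
      }
    }
    return row;
  } catch {
    return emptyRow(url); // scraping failed: empty row with booking_url
  }
}
```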

Performance

  • Handles multiple URLs efficiently
  • Configurable concurrency
  • Each URL completes or times out in under 5 minutes (requestTimeout sets the page-load limit, 60 seconds by default)
  • Test mode for quick validation
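
Configurable concurrency and per-URL timeouts can be illustrated with a small worker pool. This is a standalone sketch with hypothetical names (`mapWithConcurrency`, `withTimeout`); the README does not say whether the actor uses its own pool or a crawler library's built-in concurrency setting.

```javascript
// Run worker(item) for every item, with at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    // Each runner pulls the next unclaimed index until none remain.
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A run could then combine the two, for example `mapWithConcurrency(tripUrls, maxConcurrency, (url) => withTimeout(processUrl(url), requestTimeout))`, where `processUrl` is again a placeholder. Note that `withTimeout` stops waiting for the work but does not cancel it.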

Local Development

# Install dependencies
npm install
# Run locally
npm start
# Or with Apify CLI
apify run

Deployment to Apify

# Login to Apify
apify login
# Push to Apify
apify push

Cost Estimation

  • Scraping: ~$0.01 per URL (Apify compute)
  • AI Enrichment: ~$0.001-0.01 per URL (depends on model)
    • gpt-4o-mini: Most cost-effective
    • gpt-4o: Higher quality, higher cost
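
To turn these per-URL figures into a rough run estimate, the arithmetic is simply URL count times (scraping cost plus AI cost). The sketch below uses only the approximate numbers listed above; `estimateRunCost` is a hypothetical helper and actual charges depend on the pages and model used.

```javascript
// Approximate per-URL costs in USD, taken from the list above.
const SCRAPE_COST_PER_URL = 0.01;
const AI_COST_PER_URL = { low: 0.001, high: 0.01 };

// Hypothetical helper: low/high estimate for a run, rounded to 3 decimals.
function estimateRunCost(urlCount) {
  const round = (amount) => Math.round(amount * 1000) / 1000;
  return {
    low: round(urlCount * (SCRAPE_COST_PER_URL + AI_COST_PER_URL.low)),
    high: round(urlCount * (SCRAPE_COST_PER_URL + AI_COST_PER_URL.high)),
  };
}
```

For 100 URLs this gives roughly $1.10 to $2.00.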

License

ISC
