Pricing
from $0.05 / result
Group Trip Data Extractor
Extract structured trip information from multiple group trip URLs and enrich data using AI
A production-ready Apify Actor that extracts structured trip information from multiple group trip URLs and enriches data using AI.
Project Structure
AI Group trip extractor/
├── .actor/
│   ├── actor.json            # Actor configuration
│   ├── input_schema.json     # Input parameter definitions
│   ├── output_schema.json    # Output schema for API/Console
│   └── dataset_schema.json   # Dataset field definitions
├── src/
│   ├── main.js               # Core actor logic
│   ├── scraper.js            # Web scraping (Cheerio/Playwright)
│   ├── ai-enricher.js        # OpenAI integration
│   └── schema.js             # Schema validation utilities
├── Dockerfile                # Docker build configuration
├── package.json              # Dependencies & scripts
└── INPUT.json                # Sample input for testing
Features
- Multi-URL Processing: Process multiple trip URLs in a single run
- Intelligent Scraping: Uses Cheerio (fast) or Playwright (JavaScript-heavy pages)
- AI Enrichment: Uses OpenAI to infer missing data (coordinates, country, city, trip type)
- Strict Schema: Fixed 17-field output schema for Excel compatibility
- Error Handling: Never breaks schema - returns empty structured rows on errors
- Test Mode: Quick testing with mock data
Output Schema
Every output item contains exactly these 17 fields:
| Field | Description |
|---|---|
| title | Trip/tour name |
| destination | Main destination name |
| country | Country name (AI-inferred if missing) |
| state | State/province/region (AI-inferred if missing) |
| city | City name (AI-inferred if missing) |
| latitude | Latitude coordinate (AI-inferred from destination) |
| longitude | Longitude coordinate (AI-inferred from destination) |
| provider | Travel company/organizer name |
| price | Numeric price value |
| currency | Currency code (INR, USD, EUR, etc.) |
| start_date | Start date (YYYY-MM-DD format) |
| end_date | End date (YYYY-MM-DD format) |
| trip_type | Trip category (trek, backpacking, weekend, etc.) |
| description | Brief trip description |
| images | Comma-separated image URLs |
| inclusions | What's included (comma-separated) |
| booking_url | URL to book the trip |
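As an illustration of the fixed schema, the following minimal sketch coerces any partial object into a row with exactly these 17 fields. The names `OUTPUT_FIELDS` and `toSchemaRow` are hypothetical and are not taken from the actor's `schema.js`; the comma-and-space separator for lists is also an assumption.

```javascript
// Hypothetical helper: field names and order mirror the table above.
const OUTPUT_FIELDS = [
  'title', 'destination', 'country', 'state', 'city',
  'latitude', 'longitude', 'provider', 'price', 'currency',
  'start_date', 'end_date', 'trip_type', 'description',
  'images', 'inclusions', 'booking_url',
];

// Build a row with exactly the 17 schema fields.
// Unknown keys are dropped; missing or null values become empty strings.
function toSchemaRow(partial = {}) {
  const row = {};
  for (const field of OUTPUT_FIELDS) {
    const value = partial[field];
    if (value === undefined || value === null) {
      row[field] = '';
    } else if (Array.isArray(value)) {
      // Lists such as images or inclusions are flattened to one string
      // (separator chosen for this sketch).
      row[field] = value.join(', ');
    } else {
      row[field] = value;
    }
  }
  return row;
}
```

Calling `toSchemaRow({ title: 'Goa Weekend', extra: 1 })` would yield a 17-key object in which `extra` is dropped and every other field is an empty string, which keeps spreadsheet columns stable across rows.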
Input Configuration
{
  "tripUrls": [
    "https://example-travel.com/trip/himalayan-trek",
    "https://example-travel.com/trip/goa-weekend"
  ],
  "openaiApiKey": "sk-...",
  "model": "gpt-4o-mini",
  "testMode": false,
  "maxConcurrency": 5,
  "requestTimeout": 60000,
  "usePlaywright": false
}
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| tripUrls | array | Yes | - | List of trip URLs to extract |
| openaiApiKey | string | Yes | - | OpenAI API key for enrichment |
| model | string | No | gpt-4o-mini | OpenAI model to use |
| testMode | boolean | No | false | Return mock data without scraping |
| maxConcurrency | integer | No | 5 | Max concurrent page loads |
| requestTimeout | integer | No | 60000 | Page load timeout (ms) |
| usePlaywright | boolean | No | false | Use Playwright for JS-heavy pages |
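The defaults and required flags in the table can be applied with a small merge-and-check step. The sketch below is an assumption about how that might look; `INPUT_DEFAULTS` and `resolveInput` are hypothetical names, and on the Apify platform the input schema would normally enforce these rules before the code runs.

```javascript
// Defaults copied from the parameter table above.
const INPUT_DEFAULTS = {
  model: 'gpt-4o-mini',
  testMode: false,
  maxConcurrency: 5,
  requestTimeout: 60000,
  usePlaywright: false,
};

// Hypothetical validator: merges defaults, then checks the two required fields.
function resolveInput(input = {}) {
  const resolved = { ...INPUT_DEFAULTS, ...input };
  if (!Array.isArray(resolved.tripUrls) || resolved.tripUrls.length === 0) {
    throw new Error('tripUrls must be a non-empty array');
  }
  if (typeof resolved.openaiApiKey !== 'string' || resolved.openaiApiKey === '') {
    throw new Error('openaiApiKey is required');
  }
  return resolved;
}
```

Values supplied by the caller win over the defaults, so passing `maxConcurrency: 2` overrides the default of 5 while `requestTimeout` stays at 60000.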
How It Works
Step 1: Fetch Data
For each URL, the actor loads the page and extracts:
- Title, description, destination
- Price, dates, duration
- Itinerary, inclusions
- Provider name, images, booking link
Step 2: Clean Data
- Remove HTML tags
- Normalize whitespace
- Keep meaningful content only
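A rough sketch of the cleaning step is shown below. The function name `cleanText` is hypothetical, and the regular-expression approach is a simplification chosen for this example; the actor itself may well rely on Cheerio's text extraction instead.

```javascript
// Hypothetical cleaning helper. Regular expressions are used only to keep
// the sketch dependency-free; an HTML parser is more robust on real pages.
function cleanText(html) {
  return String(html)
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]+>/g, ' ')  // remove remaining tags
    .replace(/&nbsp;/gi, ' ')  // common entity that hides whitespace
    .replace(/\s+/g, ' ')      // collapse whitespace runs
    .trim();
}
```

Stripping markup before the AI step also reduces the amount of text sent to the model, which tends to lower the per-URL enrichment cost.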
Step 3: AI Enrichment
Send extracted content to OpenAI to:
- Map data to required schema
- Infer missing fields (country, state, city, trip_type)
- Generate coordinates from destination
- Normalize currency and dates
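Model output is not guaranteed to be well-formed, so a defensive parsing step is a natural companion to this kind of enrichment. The sketch below assumes the model replies with a JSON string; `parseEnrichment` and `toCoordinate` are hypothetical names and do not describe the actor's actual `ai-enricher.js`.

```javascript
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Accept a number or numeric string within [-limit, limit]; otherwise null.
function toCoordinate(value, limit) {
  if (typeof value !== 'number' && typeof value !== 'string') return null;
  if (typeof value === 'string' && value.trim() === '') return null;
  const n = Number(value);
  return Number.isFinite(n) && Math.abs(n) <= limit ? n : null;
}

// Hypothetical post-processing of the model's reply (assumed to be JSON text).
// Only fields that pass basic checks are returned.
function parseEnrichment(reply) {
  let data;
  try {
    data = JSON.parse(reply);
  } catch {
    return {}; // unparseable reply contributes nothing
  }
  if (data === null || typeof data !== 'object') return {};

  const out = {};
  for (const key of ['country', 'state', 'city', 'trip_type']) {
    if (typeof data[key] === 'string') out[key] = data[key].trim();
  }
  // Currency codes are normalized to three upper-case letters (INR, USD, ...).
  if (typeof data.currency === 'string' && /^[A-Za-z]{3}$/.test(data.currency.trim())) {
    out.currency = data.currency.trim().toUpperCase();
  }
  // Dates are kept only if they already match YYYY-MM-DD.
  for (const key of ['start_date', 'end_date']) {
    if (typeof data[key] === 'string' && ISO_DATE.test(data[key])) out[key] = data[key];
  }
  // Coordinates are accepted only as a valid pair.
  const lat = toCoordinate(data.latitude, 90);
  const lon = toCoordinate(data.longitude, 180);
  if (lat !== null && lon !== null) {
    out.latitude = lat;
    out.longitude = lon;
  }
  return out;
}
```

Fields that fail a check are simply omitted, so a later merge step can leave them as empty strings rather than writing doubtful values into the dataset.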
Step 4: Output
Each dataset item contains all 17 fields with missing values as empty strings.
Error Handling
- Scraping fails: Returns empty structured row with booking_url
- AI fails: Returns partially mapped data
- Invalid URL: Returns empty structured row
- Schema never breaks: All outputs have exactly 17 fields
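The rules above can be sketched as a per-URL wrapper that always returns a complete row. Everything here is illustrative: `processUrl`, `emptyRow`, and the injected `scrape` and `enrich` functions are hypothetical stand-ins for the actor's real modules, and treating an invalid URL as a row with a blank `booking_url` is an assumption.

```javascript
const FIELDS = [
  'title', 'destination', 'country', 'state', 'city',
  'latitude', 'longitude', 'provider', 'price', 'currency',
  'start_date', 'end_date', 'trip_type', 'description',
  'images', 'inclusions', 'booking_url',
];

// All 17 fields as empty strings, optionally carrying the booking URL.
function emptyRow(bookingUrl = '') {
  const row = {};
  for (const field of FIELDS) row[field] = '';
  row.booking_url = bookingUrl;
  return row;
}

function isValidUrl(url) {
  try {
    new URL(url); // throws TypeError on malformed input
    return true;
  } catch {
    return false;
  }
}

// scrape and enrich are injected so the sketch stays library-independent.
async function processUrl(url, { scrape, enrich }) {
  if (!isValidUrl(url)) return emptyRow(); // invalid URL: empty row
  const row = emptyRow(url);

  let scraped;
  try {
    scraped = (await scrape(url)) || {};
  } catch {
    return row; // scraping failed: empty row with booking_url
  }
  for (const field of FIELDS) {
    if (scraped[field] != null) row[field] = scraped[field];
  }

  try {
    const enriched = (await enrich(scraped)) || {};
    // AI output only fills fields the scraper left blank.
    for (const field of FIELDS) {
      if (row[field] === '' && enriched[field] != null) row[field] = enriched[field];
    }
  } catch {
    // AI failed: keep the partially mapped data
  }
  return row;
}
```

Because every exit path starts from `emptyRow`, no failure mode can produce a row with more or fewer than 17 fields.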
Performance
- Handles multiple URLs efficiently
- Configurable concurrency
- Per-URL processing is bounded to under 5 minutes
- Test mode for quick validation
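For readers curious how bounded concurrency and per-task timeouts fit together, here is a generic sketch. The helpers `mapWithConcurrency` and `withTimeout` are hypothetical; the actor most likely delegates this to its crawler library's own concurrency settings rather than hand-rolled code.

```javascript
// Run worker(item, index) over items with at most `limit` tasks in flight.
// Results keep the order of the input array.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next++; // claim the next item synchronously
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A worker passed to `mapWithConcurrency` could wrap its page load in `withTimeout(..., requestTimeout)` so that one slow site cannot stall the whole run. Note that `withTimeout` only stops waiting; it does not cancel the underlying work.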
Local Development
# Install dependencies
npm install

# Run locally
npm start

# Or with Apify CLI
apify run
Deployment to Apify
# Login to Apify
apify login

# Push to Apify
apify push
Cost Estimation
- Scraping: ~$0.01 per URL (Apify compute)
- AI Enrichment: ~$0.001-0.01 per URL (depends on model)
  - gpt-4o-mini: Most cost-effective
  - gpt-4o: Higher quality, higher cost
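The per-URL figures above can be combined into a rough range for a whole run. This is a back-of-the-envelope sketch only: `estimateRunCost` is a hypothetical helper, the constants are the approximate values quoted in this section, and actual charges depend on page weight and token usage.

```javascript
// Approximate per-URL costs in USD, copied from the estimates above.
const SCRAPE_COST = 0.01;
const AI_COST_RANGE = [0.001, 0.01];

// Round to whole cents to hide floating-point noise.
const toCents = (usd) => Math.round(usd * 100) / 100;

// Estimated low/high cost in USD for a run over `urlCount` URLs.
function estimateRunCost(urlCount) {
  const [aiLow, aiHigh] = AI_COST_RANGE;
  return {
    low: toCents(urlCount * (SCRAPE_COST + aiLow)),
    high: toCents(urlCount * (SCRAPE_COST + aiHigh)),
  };
}
```

For a run of 100 URLs this gives a range of roughly $1.10 to $2.00, with the lower end corresponding to the cheaper model.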
License
ISC
