URL: https://apify.com/jungle_synthesizer/synthetic-ecommerce-data-generator

Synthetic E-Commerce Data Generator

Pricing: Pay per event

Generate realistic e-commerce test data with interconnected products, customers, orders, and reviews. Features referential integrity, realistic distributions, temporal coherence, industry presets, and deterministic seed mode.

Rating: 0.0 (0)

Developer: BowTiedRaccoon (maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 1 monthly active user, last modified 11 days ago

Synthetic E-Commerce Data Generator: Fake Product, Customer, and Order Data

Generate realistic e-commerce test datasets and mock sample data with four interconnected entity types: fake products, customers, orders, and reviews. All entities maintain referential integrity: orders reference real product and customer IDs, and reviews reference real products and customers. Timestamps maintain temporal coherence: orders are placed after customer registration, shipments follow order placement, and deliveries follow shipment.
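
One way to picture the temporal coherence described above is to derive each timestamp from the previous one by adding a strictly positive delay. The sketch below is illustrative only: the function name `coherent_timestamps` and the delay ranges are assumptions, not the actor's actual implementation.

```python
import random
from datetime import datetime, timedelta, timezone

def coherent_timestamps(rng, registered_at):
    """Return (ordered_at, shipped_at, delivered_at) in increasing order."""
    # Every event is offset from the previous one by a positive delay, so
    # registered < ordered < shipped < delivered holds by construction.
    # The delay ranges below are hypothetical.
    ordered_at = registered_at + timedelta(hours=rng.randint(1, 24 * 365))
    shipped_at = ordered_at + timedelta(days=rng.randint(1, 5))
    delivered_at = shipped_at + timedelta(days=rng.randint(1, 7))
    return ordered_at, shipped_at, delivered_at

rng = random.Random(7)
registered = datetime(2024, 3, 28, 15, 21, tzinfo=timezone.utc)
ordered, shipped, delivered = coherent_timestamps(rng, registered)
print(ordered.isoformat(), shipped.isoformat(), delivered.isoformat())
```

Chaining offsets this way avoids having to sort or repair timestamps after the fact.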

Features

  • Four entity types with cross-references: products, customers, orders, reviews
  • Referential integrity: every order and review links to real product and customer IDs generated in the same run
  • Realistic statistical distributions: log-normal product prices, review ratings skewed toward high scores (average ~4.2), weighted order statuses (70% delivered)
  • Temporal coherence: shipped_at always follows ordered_at, delivered_at follows shipped_at, and orders are placed after customer registration
  • Five industry presets with tailored categories, brand names, and price ranges
  • Deterministic seed mode for reproducible datasets
  • Five locale options for names, addresses, and phone numbers
  • No network calls, no proxy needed: pure CPU data generation
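
The distributions named in the list above can be approximated with Python's standard library. This is a minimal sketch under stated assumptions: only the 70% delivered share and the ~4.2 average rating come from the description, while the other status names and weights, the rating weights, and the log-normal parameters `mu` and `sigma` are made up for illustration.

```python
import random

# Only the 70% "delivered" share and the ~4.2 mean rating come from the text.
# Every other weight and parameter here is an illustrative assumption.
ORDER_STATUSES = ["delivered", "shipped", "processing", "cancelled"]
STATUS_WEIGHTS = [0.70, 0.15, 0.10, 0.05]
RATINGS = [1, 2, 3, 4, 5]
RATING_WEIGHTS = [0.04, 0.05, 0.10, 0.29, 0.52]  # expected value is 4.2

def sample_price(rng, low=5.0, high=500.0, mu=3.5, sigma=0.9):
    """Draw a log-normal price and clamp it to a preset's price range."""
    price = rng.lognormvariate(mu, sigma)
    return round(min(max(price, low), high), 2)

def sample_status(rng):
    """Pick an order status according to the fixed weights."""
    return rng.choices(ORDER_STATUSES, weights=STATUS_WEIGHTS, k=1)[0]

def sample_rating(rng):
    """Pick a 1-5 star rating, with most mass on 4 and 5 stars."""
    return rng.choices(RATINGS, weights=RATING_WEIGHTS, k=1)[0]

rng = random.Random(42)
prices = [sample_price(rng) for _ in range(1000)]
statuses = [sample_status(rng) for _ in range(10_000)]
ratings = [sample_rating(rng) for _ in range(10_000)]
print(min(prices), max(prices))
print(statuses.count("delivered") / len(statuses))
print(sum(ratings) / len(ratings))
```

A log-normal draw produces many inexpensive items and a long tail of expensive ones, which is the shape most real catalogs have.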

Who Uses This Fake E-Commerce Test Data Generator and Why

  • E-commerce developers: populate Shopify, WooCommerce, or Magento staging environments with realistic test data before launch
  • Data engineers: validate ETL pipelines with known-schema e-commerce records that include edge cases (zero-order customers, cancelled orders, one-star reviews)
  • Analytics teams: build and demo dashboards with realistic order volumes, customer segments, and product catalogs without exposing production data
  • QA engineers: stress-test order processing systems with thousands of orders referencing real product inventories and customer accounts
  • Bootcamp instructors: provide students with clean, well-structured datasets for SQL exercises, pandas workshops, and data visualization projects

How It Works

  1. You configure how many products, customers, orders, and reviews to generate, pick an industry preset and locale, and optionally set a random seed.
  2. The generator creates products first (with industry-specific categories, brands, and log-normal price distributions), then customers (with segment-weighted lifetime values), then orders (referencing real products and customers, with calculated totals and temporal timestamps), then reviews (with rating-appropriate text templates and referential links).
  3. In unified mode, all entities go to one dataset with an entityType field. In separate mode, products go to the dataset and other entities are saved as JSON in the key-value store.
  4. The maxItems cap is applied after generation to limit total output size.
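
The ordering in the steps above (parents first, then records that reference them, then the cap) can be sketched in a few lines. This is a simplified illustration, not the actor's code: the function `generate` is hypothetical, reviews are omitted for brevity, and treating the cap as a plain truncation of the combined list is an assumption, since the description does not say how records are chosen when the cap applies.

```python
import random

def generate(num_products, num_customers, num_orders, max_items=0, seed=None):
    """Build products and customers first so orders can reference real IDs."""
    rng = random.Random(seed)
    products = [{"entityType": "product", "product_id": f"PROD-{i:05d}"}
                for i in range(1, num_products + 1)]
    customers = [{"entityType": "customer", "customer_id": f"CUST-{i:05d}"}
                 for i in range(1, num_customers + 1)]
    orders = []
    for i in range(1, num_orders + 1):
        # Pick 1-3 distinct products that already exist in this run.
        picked = rng.sample(products, k=rng.randint(1, min(3, len(products))))
        orders.append({
            "entityType": "order",
            "order_id": f"ORD-{i:05d}",
            "order_customer_id": rng.choice(customers)["customer_id"],
            "product_ids": ", ".join(p["product_id"] for p in picked),
        })
    records = products + customers + orders
    # Assumed cap behavior: truncate after generation; 0 means no limit.
    return records[:max_items] if max_items > 0 else records

records = generate(20, 30, 50, max_items=60, seed=1)
print(len(records), records[-1]["order_id"])
```

Because referenced entities are generated before the records that point at them, every foreign key can be drawn from an existing list rather than invented.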

Input

Default run: 100 mixed records

{
  "numProducts": 20,
  "numCustomers": 30,
  "numOrders": 50,
  "numReviews": 40,
  "maxItems": 100,
  "industry": "general",
  "locale": "en",
  "outputFormat": "unified"
}

Electronics dataset with deterministic seed

{
  "numProducts": 50,
  "numCustomers": 100,
  "numOrders": 200,
  "numReviews": 150,
  "maxItems": 0,
  "industry": "electronics",
  "seed": 42,
  "outputFormat": "unified"
}
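
The effect of a fixed seed can be illustrated with a seeded pseudo-random generator: the same seed yields the same sequence of values, so the whole dataset can be rebuilt identically. The helper `sample_skus` and its alphabet are hypothetical and only mimic the shape of the SKU values; they are not taken from the actor.

```python
import random

def sample_skus(seed=None, n=3):
    """Generate n SKU-like strings; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789"  # assumed alphabet
    return ["SKU-" + "".join(rng.choices(alphabet, k=8)) for _ in range(n)]

first = sample_skus(seed=42)
second = sample_skus(seed=42)
print(first == second)  # same seed, same values
```

Using a dedicated `random.Random` instance instead of the module-level functions keeps the sequence isolated from any other code that draws random numbers.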

Fashion products only (separate mode)

{
  "numProducts": 100,
  "numCustomers": 50,
  "numOrders": 80,
  "numReviews": 60,
  "maxItems": 0,
  "industry": "fashion",
  "outputFormat": "separate"
}

Input Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| numProducts | integer | 20 | Number of product records to generate (1–10,000) |
| numCustomers | integer | 30 | Number of customer records to generate (1–50,000) |
| numOrders | integer | 50 | Number of order records to generate (0–100,000) |
| numReviews | integer | 40 | Number of review records to generate (0–100,000) |
| maxItems | integer | 100 | Maximum total records across all entity types. Set to 0 for no limit |
| industry | string | general | Industry preset: general, fashion, electronics, grocery, home_goods |
| locale | string | en | Locale for names and addresses: en, de, fr, ja, es |
| seed | integer | null | Random seed for deterministic output. Omit for random data each run |
| outputFormat | string | unified | unified puts all entities in one dataset. separate puts only products in the dataset and saves the rest to the key-value store |
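
Before starting a run, it can help to check an input object against the ranges and allowed values listed in this reference. The sketch below is a client-side convenience only: `build_run_input` and the constant names are hypothetical, and the actor presumably performs its own validation.

```python
# Ranges, choices, and defaults copied from the input reference above.
LIMITS = {
    "numProducts": (1, 10_000),
    "numCustomers": (1, 50_000),
    "numOrders": (0, 100_000),
    "numReviews": (0, 100_000),
}
CHOICES = {
    "industry": {"general", "fashion", "electronics", "grocery", "home_goods"},
    "locale": {"en", "de", "fr", "ja", "es"},
    "outputFormat": {"unified", "separate"},
}
DEFAULTS = {
    "numProducts": 20, "numCustomers": 30, "numOrders": 50, "numReviews": 40,
    "maxItems": 100, "industry": "general", "locale": "en",
    "outputFormat": "unified",
}

def build_run_input(**overrides):
    """Merge overrides into the defaults and reject out-of-range values."""
    run_input = {**DEFAULTS, **overrides}
    if run_input.get("seed") is None:
        run_input.pop("seed", None)  # omit seed to get random data each run
    elif not isinstance(run_input["seed"], int):
        raise ValueError("seed must be an integer")
    for field, (low, high) in LIMITS.items():
        value = run_input[field]
        if not isinstance(value, int) or not (low <= value <= high):
            raise ValueError(f"{field} must be an integer in [{low}, {high}]")
    if not isinstance(run_input["maxItems"], int) or run_input["maxItems"] < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = no limit)")
    for field, allowed in CHOICES.items():
        if run_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    return run_input

print(build_run_input(industry="electronics", seed=42, maxItems=0))
```

The returned dictionary has the same shape as the JSON input examples, so it can be serialized and supplied as the run input.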

Output

Product record

{
  "entityType": "product",
  "product_id": "PROD-00001",
  "product_name": "Premium Laptops X7K",
  "sku": "SKU-RSJ7NHY5",
  "brand": "NovaTech",
  "category": "Electronics",
  "subcategory": "Laptops",
  "price": 54.56,
  "cost": 34.63,
  "weight_kg": 5.68,
  "rating_avg": 4.2,
  "review_count": 140,
  "in_stock": true,
  "created_at": "2024-03-23T03:33:06.557Z"
}

Customer record

{
  "entityType": "customer",
  "customer_id": "CUST-00001",
  "first_name": "Bonita",
  "last_name": "Tremblay",
  "email": "bonita.tremblay@hotmail.com",
  "phone": "(983) 829-9005",
  "address": "5836 E Main Street",
  "city": "Flagstaff",
  "state": "VT",
  "zip": "75793-8196",
  "country": "US",
  "customer_created_at": "2024-03-28T15:21:19.313Z",
  "lifetime_value": 2450.75,
  "order_count": 12,
  "segment": "returning"
}

Order record

{
  "entityType": "order",
  "order_id": "ORD-00001",
  "order_customer_id": "CUST-00017",
  "product_ids": "PROD-00007, PROD-00013, PROD-00002",
  "quantities": "1, 2, 1",
  "subtotal": 326.41,
  "tax": 28.97,
  "shipping": 0,
  "total": 355.38,
  "order_status": "delivered",
  "ordered_at": "2025-06-22T10:21:51.493Z",
  "shipped_at": "2025-06-26T10:21:51.493Z",
  "delivered_at": "2025-06-28T10:21:51.493Z"
}
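
In the order record, product_ids and quantities are comma-separated strings rather than arrays, so consumers need to split them before use. The sketch below pairs each product ID with its quantity and checks the total. The helper names are hypothetical, and the formula total = subtotal + tax + shipping is an inference from the sample values, not a documented guarantee.

```python
order = {
    "order_id": "ORD-00001",
    "product_ids": "PROD-00007, PROD-00013, PROD-00002",
    "quantities": "1, 2, 1",
    "subtotal": 326.41,
    "tax": 28.97,
    "shipping": 0,
    "total": 355.38,
}

def parse_line_items(order):
    """Pair each product ID with its quantity from the flattened strings."""
    ids = [part.strip() for part in order["product_ids"].split(",")]
    quantities = [int(part) for part in order["quantities"].split(",")]
    if len(ids) != len(quantities):
        raise ValueError(f"{order['order_id']}: ID and quantity counts differ")
    return list(zip(ids, quantities))

def total_is_consistent(order, tolerance=0.005):
    """Check the assumed relation total = subtotal + tax + shipping."""
    expected = order["subtotal"] + order["tax"] + order["shipping"]
    return abs(expected - order["total"]) <= tolerance

print(parse_line_items(order))
print(total_is_consistent(order))
```

The small tolerance absorbs floating-point rounding when the amounts are added.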

Review record

{
  "entityType": "review",
  "review_id": "REV-00001",
  "review_product_id": "PROD-00003",
  "review_customer_id": "CUST-00012",
  "review_rating": 5,
  "review_title": "Love it!",
  "review_body": "Absolutely love this Premium Tablets A3M! The build quality is outstanding. Would definitely buy again.",
  "helpful_count": 7,
  "verified_purchase": true,
  "reviewed_at": "2025-08-15T14:30:22.100Z"
}
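
With unified output, all four record shapes arrive in one list, so a consumer typically groups them by entityType first. The sketch below does that and then looks for references that point at IDs missing from the same list. The function names are hypothetical. If a maxItems cap shortened the output, a check like this may report missing targets, depending on how the cap selects records, which the description does not specify.

```python
from collections import defaultdict

def group_by_entity(records):
    """Split a unified dataset into lists keyed by entityType."""
    groups = defaultdict(list)
    for record in records:
        groups[record["entityType"]].append(record)
    return groups

def dangling_references(records):
    """Return (record_id, missing_id) pairs for unresolved references."""
    groups = group_by_entity(records)
    product_ids = {p["product_id"] for p in groups["product"]}
    customer_ids = {c["customer_id"] for c in groups["customer"]}
    problems = []
    for order in groups["order"]:
        if order["order_customer_id"] not in customer_ids:
            problems.append((order["order_id"], order["order_customer_id"]))
        for pid in order["product_ids"].split(","):
            if pid.strip() not in product_ids:
                problems.append((order["order_id"], pid.strip()))
    for review in groups["review"]:
        if review["review_product_id"] not in product_ids:
            problems.append((review["review_id"], review["review_product_id"]))
        if review["review_customer_id"] not in customer_ids:
            problems.append((review["review_id"], review["review_customer_id"]))
    return problems

sample = [
    {"entityType": "product", "product_id": "PROD-00001"},
    {"entityType": "customer", "customer_id": "CUST-00001"},
    {"entityType": "order", "order_id": "ORD-00001",
     "order_customer_id": "CUST-00001", "product_ids": "PROD-00001"},
    {"entityType": "review", "review_id": "REV-00001",
     "review_product_id": "PROD-00001", "review_customer_id": "CUST-00001"},
]
print(dangling_references(sample))
```

An empty result means every order and review in the list resolves to a product and customer in the same list.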

Industry Presets

| Preset | Categories | Price Range | Example Brands |
| --- | --- | --- | --- |
| general | Electronics, Clothing, Home & Kitchen, Sports, Books | $5–$500 | Apex, NovaTech, Zenith |
| fashion | Women's Clothing, Men's Clothing, Shoes, Accessories, Sportswear | $15–$800 | Luxe & Co, Urban Thread, Maison Noir |
| electronics | Computers, Mobile, Audio, Smart Home, Gaming | $10–$2,500 | TechVault, PixelForge, Quantum |
| grocery | Fresh Produce, Dairy & Eggs, Bakery, Beverages, Pantry | $1–$50 | Green Valley, Harvest Moon, Farm Fresh |
| home_goods | Furniture, Decor, Kitchen, Bedding, Garden | $8–$1,200 | HomeStead, Craftwell, Willow & Oak |

Performance

This actor generates data in-memory with no network calls. Approximate run times:

  • 100 records: < 1 second
  • 1,000 records: 1–2 seconds
  • 10,000 records: 5–10 seconds
  • 100,000 records: 30–60 seconds

Memory usage stays under 256MB for datasets up to 100,000 records.

FAQ

How do I generate fake e-commerce data for testing?

Set the record counts (numProducts, numCustomers, numOrders, numReviews), pick an industry preset and locale, and run. The output is a dataset of mock products, customers, orders, and reviews with referential integrity, downloadable as JSON, CSV, or Excel.

Can I create reproducible mock datasets with the same seed?

Yes. Set the seed field to any integer and the generator produces identical output across runs, which is useful for deterministic test fixtures and CI snapshots. Leave seed empty for fresh random data each run.

Does this generate dummy customer and order data with realistic distributions?

Yes. Product prices follow a log-normal distribution, review ratings are skewed toward high scores (average around 4.2), and order statuses are weighted (about 70% delivered). Every order and review references real product and customer IDs from the same run.

Need More Features?

If you need additional entity types (inventory, shipping carriers, promotions), custom field mappings, or integration with specific e-commerce platforms, file an issue or get in touch. We are always open to extending the generator to suit your needs.
