URL: https://apify.com/zuzka/output-dataset-schema-creator

Output & Dataset Schema Creator · Apify


Pricing

Pay per usage

Output & Dataset Schema Creator

Generate JSON schemas for your Actor's output and dataset using AI. Perfect for testing new actors.

Rating

0.0

(0)

Developer

πŸ‘ Zuzka PelechovΓ‘

Zuzka PelechovΓ‘

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: 4 months ago

Dataset Schema External Actor

Automatically generate JSON schemas for any Apify actor's dataset output using AI. Perfect for actors without production data or when testing new actors.

What it does

This actor:

  1. Generates test inputs using AI (Claude Sonnet 4) based on the target actor's INPUT_SCHEMA
  2. Runs the target actor with multiple input variants (minimal, normal, maximal, edge cases)
  3. Analyzes the output datasets to generate a comprehensive JSON Schema
  4. Enhances the schema with AI-generated descriptions and examples
  5. Creates both schemas:
    • Dataset Schema: Validates the structure of items in your dataset (fields, types, required properties)
    • Output Schema: Defines what your actor returns (dataset, key-value store, etc.) and how it's displayed in Apify Console

When to use this actor

  • Testing new actors before they have production data
  • External actors you don't own but whose output you want to understand
  • Rapid prototyping when you need schemas quickly
  • Actors without production runs or insufficient data

Input

{
  "actorTechnicalName": "api-ninja/tripadvisor-reviews-scraper",
  "generateInputs": true,
  "enhanceSchema": true,
  "generateViews": false
}

Parameters

  • actorTechnicalName (required): The actor to analyze (e.g., username/actor-name)
  • generateInputs (optional, default: true): Generate test inputs with AI
  • existingMinimalInput (optional): Provide your own minimal test input (JSON string)
  • existingNormalInput (optional): Provide your own normal test input (JSON string)
  • existingMaximalInput (optional): Provide your own maximal test input (JSON string)
  • existingEdgeInput (optional): Provide your own edge case input (JSON string)
  • enhanceSchema (optional, default: true): Enhance schema with AI descriptions
  • existingEnhancedSchema (optional): Skip generation and use existing schema (JSON string)
  • generateViews (optional, default: false): Generate dataset views

Output

The primary output is the Schemas Bundle (shown first in results), which contains:

{
  "schemas": {
    "dataset": {
      "title": "Dataset Schema",
      "description": "Validates the structure of items in your dataset (fields, types, required properties)",
      "schema": { /* Complete JSON Schema */ }
    },
    "output": {
      "title": "Output Schema",
      "description": "Defines what your actor returns (dataset/KV store) and how it displays in Apify Console",
      "schema": { /* OUTPUT_SCHEMA.json format */ }
    }
  },
  "metadata": {
    "actorName": "tripadvisor-reviews-scraper",
    "generatedAt": "2026-02-06T...",
    "enhancementUsed": true,
    "inputsUsed": "generated"
  },
  "usage": {
    "datasetSchemaPath": ".actor/dataset_schema.json",
    "outputSchemaPath": ".actor/output_schema.json",
    "instructions": "Copy the schemas to your actor repository"
  }
}

How to use the output

  1. View the Schemas Bundle (default output in Apify Console)
  2. Copy the dataset schema from schemas.dataset.schema
  3. Copy the output schema from schemas.output.schema
  4. Save to your actor:
    • Dataset schema → .actor/dataset_schema.json
    • Output schema → .actor/output_schema.json
  5. Update actor.json to reference these schemas
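
The copy steps above can be scripted. The following is a minimal sketch, assuming the Schemas Bundle has already been downloaded and parsed into a Python dict with the shape shown in the Output section. The helper name save_schemas and the repo_root argument are hypothetical and not part of this actor.

```python
import json
from pathlib import Path


def save_schemas(bundle: dict, repo_root: str) -> list[Path]:
    """Write both schemas from a Schemas Bundle into an actor repository.

    Hypothetical helper: the target paths are read from the bundle's own
    "usage" section rather than hard-coded.
    """
    root = Path(repo_root)
    targets = [
        (bundle["usage"]["datasetSchemaPath"], bundle["schemas"]["dataset"]["schema"]),
        (bundle["usage"]["outputSchemaPath"], bundle["schemas"]["output"]["schema"]),
    ]
    written = []
    for relative_path, schema in targets:
        path = root / relative_path
        # Create the .actor/ directory if the repository does not have it yet.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(schema, indent=2) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

This sketch only writes the two files; referencing them from actor.json remains a manual step.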

How it works

Step 1: Smart Input Generation

  • Fetches the target actor's INPUT_SCHEMA from the Apify API
  • Extracts prefill/default values as a base
  • Uses Claude Sonnet 4 to generate 4 input variants:
    • Minimal: Essential fields only, 3 items max
    • Normal: Common use case with realistic data
    • Maximal: All available fields and options
    • Edge: Invalid/nonexistent data to test error handling
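
The "prefill/default values as a base" step could look roughly like the sketch below. It assumes the input schema is a dict with a top-level "properties" object whose fields may carry "prefill" or "default" keys. The function name and the choice to prefer prefill over default are assumptions, not confirmed behavior of this actor.

```python
def base_input_from_schema(input_schema: dict) -> dict:
    """Collect a starting input from the prefill/default values in a schema.

    Assumption: prefill wins when a field declares both prefill and default.
    Fields that declare neither are left out of the base input.
    """
    base = {}
    for name, spec in input_schema.get("properties", {}).items():
        if "prefill" in spec:
            base[name] = spec["prefill"]
        elif "default" in spec:
            base[name] = spec["default"]
    return base
```

The AI-generated variants would then presumably add to or override this base, but that part is not shown here.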

Step 2: Dataset Collection

  • Runs the target actor 4 times in parallel with different inputs
  • Collects output datasets from successful runs
  • Validates that at least 1 run succeeds (data is needed to generate a schema)
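
A minimal sketch of this collection step follows. The run_actor callable is a hypothetical stand-in for whatever starts the target actor and returns its dataset items; it is assumed to raise an exception when a run fails. No real Apify client call is shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def collect_datasets(
    run_actor: Callable[[dict], list],
    inputs: dict[str, dict],
) -> dict[str, list]:
    """Run every input variant in parallel and keep items from successful runs."""

    def attempt(variant):
        name, payload = variant
        try:
            return name, run_actor(payload)
        except Exception:
            # A failed run contributes no data but does not stop the others.
            return name, None

    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        results = dict(pool.map(attempt, inputs.items()))

    successful = {name: items for name, items in results.items() if items is not None}
    if not successful:
        # Without at least one successful run there is nothing to analyze.
        raise RuntimeError("All runs failed; no data to generate a schema from")
    return successful
```

Tolerating individual failures matters here because the edge-case variant is designed to feed the target actor invalid data and may well fail.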

Step 3: Schema Generation

  • Analyzes all dataset items to determine field types
  • Calculates presence percentages for each field
  • Determines required vs optional fields
  • Identifies array items, nested objects, enums
  • Generates complete JSON Schema (draft-07)
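
The analysis above can be illustrated with a simplified sketch that handles top-level fields only. It ignores nested objects, array item types, and enums, and it assumes a field is required only when it appears in every item; the actual threshold this actor uses is not documented here.

```python
from collections import Counter, defaultdict

# Mapping from Python types (as produced by json.loads) to JSON Schema types.
JSON_TYPES = {
    str: "string", bool: "boolean", int: "integer", float: "number",
    list: "array", dict: "object", type(None): "null",
}


def infer_schema(items: list[dict], required_threshold: float = 100.0):
    """Infer a flat draft-07 schema and per-field presence percentages."""
    seen = Counter()
    types = defaultdict(set)
    for item in items:
        for key, value in item.items():
            seen[key] += 1
            # type() rather than isinstance() keeps True/False out of "integer".
            types[key].add(JSON_TYPES[type(value)])

    properties, required, presence = {}, [], {}
    for key in seen:
        presence[key] = 100.0 * seen[key] / len(items)
        kinds = sorted(types[key])
        properties[key] = {"type": kinds[0] if len(kinds) == 1 else kinds}
        if presence[key] >= required_threshold:
            required.append(key)

    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": sorted(required),
    }
    return schema, presence
```

Lowering required_threshold would mark fields as required even when a few items lack them, which is one reason to review the generated schema before relying on it.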

Step 4: AI Enhancement

  • Uses Claude Sonnet 4 to add:
    • Human-readable descriptions for each field
    • Example values based on actual data
    • Pattern validation rules
    • Better field titles and documentation

Step 5: Output Schema Creation

  • Generates Apify OUTPUT_SCHEMA.json format
  • Defines what the actor returns (dataset, key-value store)
  • Specifies how results display in Apify Console
  • Ready to use in actor.json

Example Use Cases

Test a scraper you're building

{
  "actorTechnicalName": "my-username/my-new-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}

Understand an external actor's output

{
  "actorTechnicalName": "apify/google-search-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}

Use custom test inputs

{
  "actorTechnicalName": "username/actor-name",
  "generateInputs": false,
  "existingMinimalInput": "{\"query\": \"test\"}",
  "existingNormalInput": "{\"query\": \"test\", \"maxResults\": 100}"
}
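
Because the existing*Input parameters are JSON strings rather than nested objects, the escaped values above are easiest to produce programmatically. A small sketch, in which build_actor_input is a hypothetical helper name:

```python
import json


def build_actor_input(actor: str, minimal: dict, normal: dict) -> dict:
    """Build an input for this actor with custom test inputs as JSON strings."""
    return {
        "actorTechnicalName": actor,
        "generateInputs": False,
        # Each custom input is serialized separately so it becomes a string field.
        "existingMinimalInput": json.dumps(minimal),
        "existingNormalInput": json.dumps(normal),
    }


payload = build_actor_input(
    "username/actor-name",
    {"query": "test"},
    {"query": "test", "maxResults": 100},
)
```

Serializing payload with json.dumps a second time yields the backslash-escaped form shown in the example.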

Limitations

  • Requires the target actor to have a valid INPUT_SCHEMA
  • Generated inputs may not always be perfect (depends on schema quality)
  • Costs: Runs the target actor 4 times (uses their compute units)
  • AI enhancement requires OpenRouter credits (Apify provides them via APIFY_TOKEN)

Tips for best results

  1. Check the target actor's INPUT_SCHEMA first to ensure it exists and is complete
  2. Provide custom inputs if AI generation fails (some actors have complex requirements)
  3. Use generateViews: false unless you specifically need dataset views
  4. Review the generated schema before using it in production
  5. Run multiple times if you want schemas from different data samples

Understanding Dataset Schema vs Output Schema

Dataset Schema (dataset_schema.json)

Validates the structure of individual items in your dataset:

  • What fields exist (e.g., title, price, url)
  • Field types (string, number, boolean, array, object)
  • Which fields are required vs optional
  • Validation rules (patterns, min/max values)

Example use: Ensure all scraped items have required fields before processing
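
As an illustration of that use, the sketch below checks only the "required" list of a plain JSON Schema dict. It is not a full validator (types and patterns are not checked), the function names are hypothetical, and it assumes the schema is the bare JSON Schema rather than any wrapper format.

```python
def missing_required(item: dict, dataset_schema: dict) -> list[str]:
    """Return the required field names that are absent from one item."""
    return [name for name in dataset_schema.get("required", []) if name not in item]


def split_by_required(items: list[dict], dataset_schema: dict):
    """Separate items that have every required field from those that do not."""
    complete, incomplete = [], []
    for item in items:
        if missing_required(item, dataset_schema):
            incomplete.append(item)
        else:
            complete.append(item)
    return complete, incomplete
```

For full validation, a dedicated JSON Schema validator library would be the more robust choice.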

Output Schema (output_schema.json)

Defines what your actor returns and how it displays:

  • Return type: dataset, key-value store, or both
  • Display format in Apify Console
  • Links to view results
  • Metadata structure

Example use: Show users where to find their scraped data in the UI

Support

For questions or feature requests, contact Apify support through the Apify Console.


Powered by Claude Sonnet 4 • Built by Apify
