Pricing
Pay per usage
Dataset(s) To Schema
Takes one or more dataset IDs and writes a JSON Schema describing the dataset contents to the key-value store.
Dataset to Schema
Generates a JSON Schema from one or more datasets on Apify. The actor scans dataset items, detects data types for each field (including merging multiple types), and outputs the resulting schema:
- Saves it to the Key-Value Store under the key `SCHEMA` (as `application/json`).
- Also pushes the same schema as an item to the run's output dataset for convenient viewing or sharing.
Use cases: validating scraper outputs, generating OpenAPI definitions or validators, or quickly checking data consistency across multiple datasets.
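As an illustration of the validation use case, here is a minimal sketch that checks an item's top-level fields against the `type` entries of a generated schema. The function name `type_errors` is hypothetical and not part of the actor. The sketch ignores nested objects and array items, so a full JSON Schema validator library would be the better choice for real validation.

```python
def type_errors(item, schema):
    """Return the top-level keys of `item` whose value does not match
    any type that the schema allows for that key."""
    checks = {
        "null": lambda v: v is None,
        "boolean": lambda v: isinstance(v, bool),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "string": lambda v: isinstance(v, str),
        "array": lambda v: isinstance(v, list),
        "object": lambda v: isinstance(v, dict),
    }
    errors = []
    for key, rule in schema.get("properties", {}).items():
        if key not in item:
            continue  # missing fields are not treated as errors here
        allowed = rule.get("type", [])
        if isinstance(allowed, str):
            allowed = [allowed]  # normalise a single type to a list
        if allowed and not any(checks[t](item[key]) for t in allowed):
            errors.append(key)
    return errors
```

Called with a schema whose `inStock` property has type `boolean` and an item where `inStock` is the string `"yes"`, the function reports `inStock` as a mismatch.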
Input (input schema)
{"title":"Generate schema from datasets","type":"object","schemaVersion":1,"properties":{"datasetIds":{"title":"Dataset IDs","type":"array","description":"IDs of the datasets for which to generate a schema","editor":"stringList"}},"required":["datasetIds"]}
Fields
- `datasetIds` (array): list of Apify dataset IDs to include in schema generation. You can provide one or multiple IDs; the actor iterates through them and merges the schemas together.
Output
The actor produces the same schema in two places:
- Key-Value Store: key `SCHEMA`, the complete JSON Schema file (e.g., `schema.json`).
- Output dataset: a single item containing the full schema (for quick preview in the console).
Example output schema (truncated)
{"$schema":"http://json-schema.org/draft-07/schema#","type":"object","properties":{"title":{"type":["string","null"]},"price":{"type":["number","string"]},"inStock":{"type":"boolean"},"images":{"type":"array","items":{"type":"string"}}},"additionalProperties":true}
Note: The actor merges multiple observed types into union types (e.g., `"type": ["number", "string"]`) when data varies.
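The union behaviour described in the note can be sketched as a small helper that combines two JSON Schema `type` values. This is an illustrative assumption about how such merging might work, not the actor's actual code. The name `merge_types` is hypothetical, and the alphabetical ordering of the result is a choice made here for determinism.

```python
def merge_types(existing, observed):
    """Combine two JSON Schema "type" values (a string, a list, or None)
    into a single value: a plain string when only one type remains,
    otherwise a sorted list of distinct type names."""
    def as_list(value):
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    merged = sorted(set(as_list(existing)) | set(as_list(observed)))
    return merged[0] if len(merged) == 1 else merged
```

For example, merging `"number"` with `"string"` gives `["number", "string"]`, while merging `"boolean"` with `"boolean"` stays `"boolean"`.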
How It Works
- Reads `datasetIds` from the input.
- Iterates through each dataset and detects field types: `number`, `string`, `boolean`, `object`, `array` (unifying differing values into union types if needed).
- Merges all detected fields into a single schema covering all datasets.
- Saves the final schema to the KV Store (`SCHEMA`) and pushes it to the output dataset.
- If a dataset exceeds internal iteration limits (approximately 1 M items), logs a warning that the schema may be incomplete but still completes the run.
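The steps above can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions, not the actor's implementation: datasets are plain in-memory lists of dicts rather than Apify datasets, only top-level fields are inspected, and the names `json_type`, `infer_schema`, and `MAX_ITEMS` are hypothetical.

```python
import logging

MAX_ITEMS = 1_000_000  # assumed cap, mirroring the ~1 M item limit above


def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int/float
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def infer_schema(datasets, max_items=MAX_ITEMS):
    """Scan every item of every dataset and build one merged schema."""
    seen = {}  # field name -> set of observed type names
    for dataset in datasets:
        for index, item in enumerate(dataset):
            if index >= max_items:
                # Stop reading this dataset but keep going with the run.
                logging.warning("Schema might not be perfect.")
                break
            for key, value in item.items():
                seen.setdefault(key, set()).add(json_type(value))

    properties = {}
    for key, types in seen.items():
        ordered = sorted(types)
        properties[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "additionalProperties": True,
    }
```

Passing two small datasets in which `price` is a number in one and a string in the other yields a `price` property with the union type `["number", "string"]`.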
Quick Start on Apify
- Create a run of the actor in the Apify Console.
- Provide input: {"datasetIds":["abc123","def456"]}
- Run it. After completion, open Storage → Key-Value Store and download `SCHEMA`. Alternatively, open the output dataset to view the schema item.
Limitations & Edge Cases
- Large datasets (> ~1 M items): the actor logs a warning ("Schema might not be perfect.") and continues. For higher accuracy, generate a schema from a smaller sample or pre-aggregate data.
- Heterogeneous data: if fields vary widely, expect broader union types. This is intentional, so that the schema reflects observed variability.
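One way to build the smaller sample suggested for large datasets is reservoir sampling, which draws a fixed-size uniform sample in a single pass without loading everything into memory. The sketch below is an assumption about how a sample might be prepared before running the actor, not something the actor does itself. The function name `reservoir_sample` is hypothetical.

```python
import random


def reservoir_sample(items, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable
    of unknown length, reading it only once."""
    rng = random.Random(seed)
    sample = []
    for index, item in enumerate(items):
        if index < k:
            sample.append(item)  # fill the reservoir first
        else:
            # Replace an existing entry with probability k / (index + 1).
            j = rng.randint(0, index)  # both bounds inclusive
            if j < k:
                sample[j] = item
    return sample
```

Any sample can miss types that appear only in rare items, so a schema built from it may be narrower than one built from the full dataset.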
