
URL: https://apify.com/jaroslavhejlek/validate-dataset-with-json-schema


Validate Dataset(s) with JSON Schema

Pricing

Pay per usage


This Actor validates items in one or more datasets against a provided JSON Schema. Use it if you are planning to add a dataset validation schema to your Actor and want to test it first.


Rating: 0.0 (0)

Developer: Jaroslav Hejlek (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 5
  • Monthly active users: 2
  • Last modified: a year ago


This Apify Actor validates items in one or more datasets against a provided JSON Schema. It helps identify invalid items and provides detailed validation errors for each item that doesn't match the schema.

Features

  • Validate multiple datasets in a single run
  • Support for both Dataset IDs and Run IDs
  • Detailed validation error reporting
  • Uses JSON Schema Draft-07

Input

The actor accepts the following input parameters:

{
  "datasetIds": ["datasetId1", "datasetId2"], // Array of Dataset IDs or Run IDs
  "schema": { // JSON Schema to validate against
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
      // Your schema properties here
    },
    "required": []
  }
}

Input Parameters Details

  • datasetIds (required, array of strings)
    • List of Dataset IDs or Run IDs to validate
    • You can use either a Dataset ID (e.g., "1234567890") or a Run ID (e.g., "yourRunId")
  • schema (required, object)
    • JSON Schema definition that describes the expected structure of items
    • Must be a valid JSON Schema (Draft-07)
    • Provided as an object in the input
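
For callers who build the input in code, the two parameters above can be written as a type with a small pre-flight check. This is only a minimal sketch: the names `ActorInput` and `buildInput` are hypothetical and not part of the Actor, and rejecting an empty ID list is an assumption rather than documented behavior.

```typescript
// Hypothetical type mirroring the two documented input fields.
interface ActorInput {
  datasetIds: string[];            // Dataset IDs or Run IDs
  schema: Record<string, unknown>; // JSON Schema (Draft-07) as an object
}

// Hypothetical helper: rejects clearly malformed input before a run is started.
function buildInput(datasetIds: string[], schema: unknown): ActorInput {
  // Assumption: an empty list or blank ID is never what the caller intended.
  if (datasetIds.length === 0 || datasetIds.some((id) => id.trim() === "")) {
    throw new Error("datasetIds must contain at least one non-empty ID");
  }
  // The schema must be an object, not a JSON string or an array.
  if (typeof schema !== "object" || schema === null || Array.isArray(schema)) {
    throw new Error("schema must be a JSON object");
  }
  return { datasetIds, schema: schema as Record<string, unknown> };
}

const exampleInput = buildInput(["abc123xyz789"], {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: { url: { type: "string" } },
  required: ["url"],
});
```

The check does not verify that the schema itself is valid Draft-07; that is left to the Actor.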

Output

The actor stores validation results in its default dataset. Each record in the output dataset has the following structure:

{
  "datasetId": "string", // ID of the dataset being validated
  "itemPosition": "number", // Position of the invalid item in the dataset (0-based)
  "validationErrors": [ // Validation errors from AJV validator
    // Detailed error information for each error
  ]
}

Only invalid items (those that don't match the schema) are included in the output.
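
Because each output record describes exactly one invalid item, a quick way to read a run's results is to count the records per dataset. The sketch below assumes the records have already been downloaded into an array. The names `ValidationRecord` and `countInvalidByDataset` are hypothetical, the sample data is made up, and `validationErrors` is left untyped because its contents come from Ajv.

```typescript
// Shape of one output record, following the structure documented above.
interface ValidationRecord {
  datasetId: string;
  itemPosition: number;        // 0-based position of the invalid item
  validationErrors: unknown[]; // error objects produced by Ajv
}

// Hypothetical helper: counts invalid items per dataset.
function countInvalidByDataset(records: ValidationRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    counts.set(record.datasetId, (counts.get(record.datasetId) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample records, for illustration only.
const sampleRecords: ValidationRecord[] = [
  { datasetId: "datasetA", itemPosition: 0, validationErrors: [{}] },
  { datasetId: "datasetA", itemPosition: 7, validationErrors: [{}, {}] },
  { datasetId: "datasetB", itemPosition: 3, validationErrors: [{}] },
];
const invalidCounts = countInvalidByDataset(sampleRecords);
```

A dataset that does not appear in the resulting map had no invalid items, since valid items are never written to the output.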

Usage Example

{
  "datasetIds": ["abc123xyz789"],
  "schema": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
      "url": { "type": "string", "format": "uri" },
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["url", "title"]
  }
}

Limitations

  • Can be slow for very large datasets, since items are validated sequentially, one at a time
  • At most 1000 validation errors are held in memory before being pushed to the output dataset
  • The actor validates against JSON Schema Draft-07
  • Input schema must be a valid JSON schema
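
The first two limitations describe a sequential loop with a bounded buffer. A minimal sketch of that pattern follows; it is not the Actor's source code. The names `validateSequentially`, `Validate`, and `Flush` are hypothetical, the validator and the output sink are passed in as plain functions so the example runs without Ajv or the Apify SDK, and it assumes the limit counts buffered records (one per invalid item).

```typescript
interface ErrorRecord {
  datasetId: string;
  itemPosition: number;
  validationErrors: unknown[];
}

// Returns the errors for one item; an empty list means the item is valid.
type Validate = (item: unknown) => unknown[];
// Receives a batch of records to persist (stand-in for pushing to a dataset).
type Flush = (batch: ErrorRecord[]) => void;

function validateSequentially(
  datasetId: string,
  items: unknown[],
  validate: Validate,
  flush: Flush,
  maxBuffered = 1000, // assumption: the limit applies to buffered records
): number {
  let buffer: ErrorRecord[] = [];
  let invalidCount = 0;
  // Items are checked one at a time, in dataset order.
  for (let itemPosition = 0; itemPosition < items.length; itemPosition++) {
    const validationErrors = validate(items[itemPosition]);
    if (validationErrors.length === 0) continue; // valid items are not reported
    invalidCount++;
    buffer.push({ datasetId, itemPosition, validationErrors });
    // Flush once the buffer is full so memory use stays bounded.
    if (buffer.length >= maxBuffered) {
      flush(buffer);
      buffer = [];
    }
  }
  if (buffer.length > 0) flush(buffer); // push whatever is left
  return invalidCount;
}
```

The sink is synchronous here to keep the sketch short; a real push to a dataset would normally be asynchronous and awaited before the buffer is reset.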

Dependencies

  • Node.js 20+
  • Ajv for JSON Schema validation
  • Apify SDK for Apify platform integration
