URL: https://apify.com/keratogenous_surgeon/dataset-ai-cleaner

AI Data Cleaner and Classifier for JSON, CSV, and Datasets · Apify


Pricing

from $0.01 / 1,000 results

ai-data-cleaner-classifier

Clean, normalize, deduplicate, and classify JSON, CSV, or Apify datasets using rules or OpenAI models. Built for automation pipelines, data preparation, and AI workflows. Supports dataset chaining, cost controls, and safe fallbacks.

Rating

0.0

(0)

Developer

๐Ÿ‘ King Shepherd

King Shepherd

Maintained by Community

Actor stats

0

Bookmarked

3

Total users

0

Monthly active users

6 months ago

Last modified

AI Data Cleaner & Classifier

Clean, normalize, deduplicate, and classify structured data using rules or AI.

This Actor helps you turn messy JSON, CSV files, or Apify datasets into clean, structured, and usable data for automation pipelines, analytics, CRMs, and AI workflows.


✅ What this Actor does

This Actor can process structured records and:

  • Normalize common fields (email, phone, name, company, URL)
  • Deduplicate records safely (hash-based)
  • Classify records using:
    • Rule-based logic
    • OpenAI models (optional)
  • Enrich data with:
    • Suggested tags
    • Industry (when detectable)
    • Confidence scores
  • Output clean, structured JSON to an Apify dataset

It is designed for automation and repeat usage, not one-off demos.
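
To make the normalization and hash-based deduplication steps concrete, here is a minimal Python sketch. It is not the Actor's actual implementation: the function names (`normalize_record`, `record_hash`, `dedupe`), the specific normalization rules, and the choice of SHA-256 over the sorted JSON form of each record are all assumptions made for illustration.

```python
import hashlib
import json
import re


def normalize_record(record: dict) -> dict:
    """Return a copy with common fields normalized (illustrative rules only)."""
    out = dict(record)
    if isinstance(out.get("email"), str):
        # Assumption: emails are compared case-insensitively.
        out["email"] = out["email"].strip().lower()
    if isinstance(out.get("phone"), str):
        # Keep a leading "+" and the digits; drop spaces, dashes, parentheses.
        digits = re.sub(r"\D", "", out["phone"])
        prefix = "+" if out["phone"].strip().startswith("+") else ""
        out["phone"] = prefix + digits
    for key in ("name", "company"):
        if isinstance(out.get(key), str):
            # Trim and collapse internal runs of whitespace.
            out[key] = " ".join(out[key].split())
    return out


def record_hash(record: dict) -> str:
    """Hash a record by its key-sorted JSON form so key order does not matter."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(records: list[dict]) -> list[dict]:
    """Normalize each record, then keep the first record seen for each hash."""
    seen = set()
    unique = []
    for rec in map(normalize_record, records):
        digest = record_hash(rec)
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique
```

Normalizing before hashing is the design choice that matters here: two records that differ only in casing or stray whitespace hash to the same value and are treated as duplicates.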


📥 Supported input sources (exactly one required)

You must provide one and only one of the following input sources:

1๏ธโƒฃ Inline JSON data

{
  "data": [
    { "email": "test@example.com", "company": "Acme Inc" }
  ]
}
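
The "one and only one" rule can be checked before a run is started. The Python sketch below is an assumption-laden illustration, not the Actor's own validation: only the `data` key appears on this page, while `datasetId` and `csvUrl` are hypothetical placeholder names standing in for the other input sources, whose real field names are not shown here.

```python
# Only "data" is confirmed by this page; "datasetId" and "csvUrl" are
# hypothetical placeholders, not the Actor's documented field names.
SOURCE_KEYS = ("data", "datasetId", "csvUrl")


def pick_input_source(run_input: dict, source_keys: tuple = SOURCE_KEYS) -> str:
    """Return the single input-source key that is set, or raise ValueError."""
    # Treat missing, None, and empty values as "not provided".
    provided = [
        key for key in source_keys
        if run_input.get(key) not in (None, "", [], {})
    ]
    if len(provided) != 1:
        raise ValueError(
            f"Expected exactly one input source, got {len(provided)}: {provided}"
        )
    return provided[0]


run_input = {"data": [{"email": "test@example.com", "company": "Acme Inc"}]}
source = pick_input_source(run_input)
```

With the inline JSON input shown above, `source` is `"data"`; an input that sets none of the keys, or more than one, raises `ValueError`.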
