URL: https://apify.com/parsebird/data-cleaner

Data Cleaner: Clean, Normalize & Format Scraped Data · Apify


Pricing

from $1.49 / 1,000 items cleaned

Clean messy data: remove nulls, normalize case, trim whitespace, format phone numbers and emails, extract domains, convert types, and more. Works with Apify datasets or direct JSON input.

Rating

0.0 (0)

Developer

ParseBird, Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 24 days ago

Data Cleaner

The first general-purpose data cleaner on Apify.

Clean messy scraped data in one step: trim whitespace, normalize casing, format phone numbers to E.164, lowercase emails, extract domains from URLs, convert strings to numbers, remove null rows, and deduplicate.

ParseBird Infra Suite: utility tools for data pipelines

  • HTTP Request: send API calls from the cloud
  • Data Deduplicator: merge and deduplicate datasets by any field
  • Data Cleaner: this Actor

Copy to your AI assistant

Copy this block into ChatGPT, Claude, Cursor, or any LLM to start using this actor.

parsebird/data-cleaner on Apify. Call: ApifyClient("TOKEN").actor("parsebird/data-cleaner").call(run_input={...}), then client.dataset(run["defaultDatasetId"]).list_items().items for cleaned results. Key inputs: datasetId (string, Apify dataset ID), jsonData (array of objects, direct JSON input), operations (array of {field, action, options}, required), outputDatasetId (string, optional), maxItems (integer, default 1000000). Actions: trim_whitespace, normalize_case (options: {case: "lower"|"upper"|"title"}), format_email, format_phone (options: {countryCode: "US"}), extract_domain, to_number, to_date, fill_nulls (options: {value: "..."}), remove_nulls, remove_duplicates, replace_value (options: {find, replace}). Full actor spec: fetch build via GET https://api.apify.com/v2/acts/parsebird~data-cleaner (Bearer TOKEN). Get token: https://console.apify.com/account/integrations

What does Data Cleaner do?

This Actor takes messy scraped or imported data and applies a configurable pipeline of cleaning operations. Each operation targets a specific field and transforms its values: trimming whitespace, normalizing case, formatting phone numbers, and more.

Use cases:

  • CRM cleanup: normalize names, emails, and phone numbers before import
  • Lead list hygiene: remove rows with missing emails, deduplicate by company
  • Post-scrape processing: extract domains from URLs, convert price strings to numbers
  • Data pipeline prep: standardize data format before analysis or export

Supported operations

| Action | Description | Options | Before | After |
|---|---|---|---|---|
| trim_whitespace | Remove leading/trailing spaces | - | " John Doe " | "John Doe" |
| normalize_case | Convert to lower/upper/title case | {"case": "title"} | "john doe" | "John Doe" |
| format_email | Lowercase and trim emails | - | " JOHN@CO.COM " | "john@co.com" |
| format_phone | Normalize to E.164 format | {"countryCode": "US"} | "(555) 123-4567" | "+15551234567" |
| extract_domain | Extract domain from URL or email | - | "https://www.example.com/page" | "example.com" |
| to_number | Convert string to number | - | "$1,234,567" | 1234567 |
| to_date | Parse date to ISO 8601 | - | "March 15, 2024" | "2024-03-15T00:00:00" |
| fill_nulls | Replace null/empty with default | {"value": "N/A"} | null | "N/A" |
| remove_nulls | Remove rows where field is null/empty | - | (row removed) | - |
| remove_duplicates | Deduplicate by this field | - | (duplicate removed) | - |
| replace_value | Find and replace text | {"find": "Inc.", "replace": "Inc"} | "Acme Inc." | "Acme Inc" |
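To make the Before/After columns concrete, here is a minimal local sketch of how a few of these actions could behave. It is an approximation written for illustration, not the Actor's actual implementation; the function bodies, the "www." stripping rule, and the fallback of returning unparseable values unchanged are assumptions inferred from the examples on this page.

```python
import re
from urllib.parse import urlparse


def trim_whitespace(value: str) -> str:
    # Remove leading and trailing whitespace only.
    return value.strip()


def normalize_case(value: str, case: str = "lower") -> str:
    # "case" mirrors the documented option values: lower, upper, title.
    if case == "lower":
        return value.lower()
    if case == "upper":
        return value.upper()
    if case == "title":
        return value.title()
    raise ValueError(f"unsupported case: {case!r}")


def format_email(value: str) -> str:
    # Trim, then lowercase.
    return value.strip().lower()


def extract_domain(value: str) -> str:
    # Accepts a URL, a bare host, or an email address (assumed behavior).
    value = value.strip()
    if "@" in value and "://" not in value:
        host = value.rsplit("@", 1)[1]
    else:
        # urlparse only fills the host when a scheme or "//" is present.
        parsed = urlparse(value if "://" in value else "//" + value)
        host = parsed.hostname or ""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def to_number(value: str):
    # Drop currency symbols and thousands separators, then parse.
    cleaned = re.sub(r"[^\d.\-]", "", value)
    try:
        number = float(cleaned)
    except ValueError:
        return value  # assumption: unparseable input is left unchanged
    return int(number) if number.is_integer() else number
```

Running these helpers on the Before values in the table reproduces the After values for the five actions covered. The Actor itself may handle edge cases differently.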

Input parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| datasetId | string | No* | - | Apify dataset ID to clean |
| jsonData | array | No* | - | Direct JSON array of objects to clean |
| operations | array | Yes | - | List of {field, action, options} cleaning operations |
| outputDatasetId | string | No | - | Named output dataset (defaults to run dataset) |
| maxItems | integer | No | 1000000 | Max items to process |

*Provide either datasetId or jsonData (or both).
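The rules in the table can be checked on the client side before starting a run. The helper below is a hypothetical sketch: build_run_input and KNOWN_ACTIONS are names introduced here, and rejecting an empty operations list is an assumption (the page only states that operations is required). The action names are the ones listed under Supported operations.

```python
from typing import Any, Optional

# Action names as listed under "Supported operations".
KNOWN_ACTIONS = {
    "trim_whitespace", "normalize_case", "format_email", "format_phone",
    "extract_domain", "to_number", "to_date", "fill_nulls",
    "remove_nulls", "remove_duplicates", "replace_value",
}


def build_run_input(
    operations: list,
    dataset_id: Optional[str] = None,
    json_data: Optional[list] = None,
    max_items: Optional[int] = None,
) -> dict:
    """Hypothetical helper: assemble and sanity-check the Actor input."""
    if not operations:
        # Assumption: an empty list is treated like a missing one.
        raise ValueError("operations is required")
    if dataset_id is None and json_data is None:
        raise ValueError("provide datasetId, jsonData, or both")
    for op in operations:
        if "field" not in op or "action" not in op:
            raise ValueError(f"operation needs field and action: {op!r}")
        if op["action"] not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action: {op['action']!r}")

    run_input: dict[str, Any] = {"operations": list(operations)}
    if dataset_id is not None:
        run_input["datasetId"] = dataset_id
    if json_data is not None:
        run_input["jsonData"] = json_data
    if max_items is not None:
        run_input["maxItems"] = max_items
    return run_input
```

The returned dictionary can be passed as run_input to the Python client call shown under "How to use via API".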

Operations format

Each operation is a JSON object with:

{
  "field": "email",
  "action": "format_email",
  "options": {}
}

Operations are applied in order. You can chain multiple operations on the same field:

[
  {"field": "name", "action": "trim_whitespace"},
  {"field": "name", "action": "normalize_case", "options": {"case": "title"}},
  {"field": "email", "action": "format_email"},
  {"field": "phone", "action": "format_phone", "options": {"countryCode": "US"}},
  {"field": "website", "action": "extract_domain"},
  {"field": "revenue", "action": "to_number"},
  {"field": "email", "action": "remove_nulls"}
]
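Because operations run in order, the same set of operations can give different results depending on their sequence. The sketch below is a simplified local model, not the Actor's code: apply_operations is a hypothetical function that supports only three actions, and keeping the first occurrence of a duplicate is an assumption.

```python
from __future__ import annotations

from typing import Any


def apply_operations(rows: list[dict[str, Any]],
                     operations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Apply a small subset of actions to rows, strictly in list order."""
    rows = [dict(row) for row in rows]  # work on copies
    for op in operations:
        field, action = op["field"], op["action"]
        if action == "format_email":
            for row in rows:
                if isinstance(row.get(field), str):
                    row[field] = row[field].strip().lower()
        elif action == "remove_nulls":
            rows = [r for r in rows if r.get(field) not in (None, "")]
        elif action == "remove_duplicates":
            seen = set()
            unique = []
            for row in rows:
                key = row.get(field)
                if key not in seen:  # assumption: first occurrence wins
                    seen.add(key)
                    unique.append(row)
            rows = unique
        else:
            raise ValueError(f"action not covered by this sketch: {action}")
    return rows


rows = [{"email": " JOHN@CO.COM "}, {"email": "john@co.com"}, {"email": None}]
fmt = {"field": "email", "action": "format_email"}
dedupe = {"field": "email", "action": "remove_duplicates"}

format_first = apply_operations(rows, [fmt, dedupe])
dedupe_first = apply_operations(rows, [dedupe, fmt])
print(len(format_first), len(dedupe_first))
```

In this model, formatting before deduplicating collapses the two spellings of the same address into one row, while deduplicating first keeps both, so normalizing operations generally belong before remove_duplicates.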

Before and after example

Input (dirty data)

[
  {"name": " john doe ", "email": " JOHN@EXAMPLE.COM ", "phone": "(555) 123-4567", "website": "https://www.example.com/about", "revenue": "$1,234,567"},
  {"name": "JANE SMITH", "email": "Jane.Smith@Company.IO", "phone": "555.987.6543", "website": "info@company.io", "revenue": "2345678"},
  {"name": "", "email": null, "phone": "1-800-555-0199", "website": "company.io", "revenue": "$99.99"},
  {"name": "bob wilson", "email": "bob@test.com", "phone": "+14155550100", "website": "https://test.com/page?id=1", "revenue": "not a number"}
]

Output (cleaned data)

[
  {"name": "John Doe", "email": "john@example.com", "phone": "+15551234567", "website": "example.com", "revenue": 1234567},
  {"name": "Jane Smith", "email": "jane.smith@company.io", "phone": "+15559876543", "website": "company.io", "revenue": 2345678},
  {"name": "Bob Wilson", "email": "bob@test.com", "phone": "+14155550100", "website": "test.com", "revenue": "not a number"}
]

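Row 3 was removed because its email is null and the pipeline ends with remove_nulls on email. In the remaining rows, names are title-cased, emails lowercased, phones converted to E.164, and domains extracted. Revenues are converted to numbers where they parse; the unparseable "not a number" in the last row is left unchanged.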
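The phone values in this example cover four common US input styles. A naive US-only normalizer, sketched below, is enough to reproduce them. This is an illustrative assumption about the behavior for countryCode "US" and not the Actor's implementation; format_phone_us is a hypothetical name, and real-world phone parsing needs a proper library.

```python
import re


def format_phone_us(value: str) -> str:
    """Naive E.164 normalization for US numbers (illustrative only)."""
    digits = re.sub(r"\D", "", value)  # keep digits only
    if len(digits) == 10:
        return "+1" + digits           # national number, add country code
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits            # country code already present
    return value                       # assumption: leave anything else as-is


for raw in ["(555) 123-4567", "555.987.6543", "1-800-555-0199", "+14155550100"]:
    print(raw, "->", format_phone_us(raw))
```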

How to use via API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("parsebird/data-cleaner").call(run_input={
    "datasetId": "YOUR_DATASET_ID",
    # Operations are applied in the order listed.
    "operations": [
        {"field": "email", "action": "format_email"},
        {"field": "name", "action": "trim_whitespace"},
        {"field": "name", "action": "normalize_case", "options": {"case": "title"}},
        {"field": "phone", "action": "format_phone", "options": {"countryCode": "US"}},
    ],
})

# Cleaned items are written to the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(f"Cleaned items: {len(items)}")

cURL

curl -X POST "https://api.apify.com/v2/acts/parsebird~data-cleaner/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonData": [
      {"name": " JOHN DOE ", "email": " JOHN@CO.COM "}
    ],
    "operations": [
      {"field": "name", "action": "trim_whitespace"},
      {"field": "name", "action": "normalize_case", "options": {"case": "title"}},
      {"field": "email", "action": "format_email"}
    ]
  }'

Output

Cleaned items retain their original structure. A stats key is stored in the key-value store:

{
  "totalLoaded": 5000,
  "totalCleaned": 4800,
  "operationsApplied": 7,
  "fieldsCleaned": 5,
  "totalChanges": 15200
}
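A quick way to read these numbers is to derive a few ratios from them. The sketch below assumes that the gap between totalLoaded and totalCleaned is the number of rows dropped by row-removing actions such as remove_nulls and remove_duplicates; that interpretation, and the summarize_stats helper itself, are not stated on this page.

```python
def summarize_stats(stats: dict) -> dict:
    """Derive simple ratios from the stats record (hypothetical helper)."""
    loaded = stats["totalLoaded"]
    cleaned = stats["totalCleaned"]
    return {
        # Assumed meaning: rows dropped during cleaning.
        "rowsRemoved": loaded - cleaned,
        "retentionRate": cleaned / loaded if loaded else 0.0,
        "changesPerCleanedItem": stats["totalChanges"] / cleaned if cleaned else 0.0,
    }


example = {
    "totalLoaded": 5000,
    "totalCleaned": 4800,
    "operationsApplied": 7,
    "fieldsCleaned": 5,
    "totalChanges": 15200,
}
print(summarize_stats(example))
```

For the example record this gives 200 removed rows and a retention rate of 96%.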

Pricing

This Actor uses a pay-per-event pricing model.

| Event | Price per event | Price per 1,000 |
|---|---|---|
| items-cleaned | $0.00149 | $1.49 |

Charged per 1,000 items loaded. Platform compute costs are additional.
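As a rough worked example, the event charge for a run can be estimated from the number of items loaded. The sketch assumes billing is proportional to the item count at $0.00149 per item; whether charges are instead rounded to whole blocks of 1,000 is not specified here. Platform compute costs are excluded.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.00149")  # from the pricing table above


def estimate_event_cost(items_loaded: int) -> Decimal:
    """Estimated items-cleaned charge in USD, excluding platform compute."""
    return PRICE_PER_ITEM * items_loaded


print(estimate_event_cost(5000))
```

For a run that loads 5,000 items, the estimate is 5,000 × $0.00149 = $7.45.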

You might also like

CRM Lead Data Cleaner (Email/Phone Validator + Dedup)

motivational_nickel/universal-data-cleaner

Turn messy CSV or Excel leads into clean, validated, CRM-ready data. Fix Excel E+11 phone numbers, validate emails, remove duplicates, and score lead quality (HIGH, MEDIUM, LOW). Built for sales teams, lead gen agencies, and automation workflows.

Leoncio Jr Coronado

13

Superclean URLs

superlativetech/superclean-urls

Clean messy URLs from lead exports. Remove 60+ tracking parameters (utm_*, fbclid, gclid), normalize format, extract domains, and optionally verify URLs are reachable. Perfect for cold email personalization and CRM data hygiene.

Data Deduplicator

parsebird/dataset-deduplicator

Merge and deduplicate Apify datasets by any field combination. Remove duplicate rows while keeping the first or last occurrence. Supports case-insensitive matching and whitespace trimming.

Dataset Deduplicator

automation-lab/dataset-dedup

Merge and deduplicate Apify datasets by any field combination. Remove duplicates, keep first or last occurrence. Case-insensitive matching, whitespace trimming. Pay per 1K items processed.

Stas Persiianenko

23

Fast Dataset Cleaner & CSV Formatter

motivational_nickel/dataset-cleaner-and-formatter

Fast dataset cleaning for CSV and JSON files. Automatically removes duplicates, trims whitespace, fixes capitalization, and normalizes fields. Works with Apify datasets or uploaded files and prepares data for analytics, CRM imports, and automation pipelines.

Leoncio Jr Coronado

6

ai-data-cleaner-classifier

keratogenous_surgeon/dataset-ai-cleaner

Clean, normalize, deduplicate, and classify JSON, CSV, or Apify datasets using rules or OpenAI models. Built for automation pipelines, data preparation, and AI workflows. Supports dataset chaining, cost controls, and safe fallbacks.

3

Superclean Phone Numbers

superlativetech/superclean-phone-numbers

Format, parse, and clean phone numbers in bulk. Normalize to E.164, International, or National format. Validates numbers, detects type (mobile/landline/toll-free), handles vanity numbers like 1-800-FLOWERS, and extracts extensions. For CRM exports and lead data cleanup.