URL: https://apify.com/taroyamada/csv-data-cleaner

Scraped Data CSV Cleaner - Deduplicate Leads & Profiles · Apify

Polish raw outputs from Google Maps and Instagram profile scrapers. Merge duplicate contacts, clear empty spreadsheet rows, and sort email lists automatically.

Pricing: Pay per event
Rating: 0.0 (0)
Developer: naoki anzai (Maintained by Community)
Actor stats: 0 bookmarked, 3 total users, 1 monthly active user
Last modified: a month ago

🧹 CSV Data Cleaner

Extracting profile data from social networks often yields fragmented and repetitive spreadsheets. Whether you are running an Instagram profile scraper or pulling business locations from Google Maps, raw CSV exports frequently suffer from duplicate contact details, annoying trailing spaces, and completely empty rows. This CSV data cleaner is designed to instantly polish those raw datasets, converting messy browser outputs into clean, actionable contact lists.

Research teams and analytics pipelines depend on this automated data cleaner to prepare CSV datasets for downstream analysis. Instead of fighting with complex spreadsheet formulas to identify redundant entries, you can pass your raw CSV URL directly to this utility. It scans the file and deduplicates rows on the columns you specify, such as usernames, email addresses, or phone numbers, so duplicate rows are never analysed as separate records.

Beyond basic deduplication, the tool actively sanitizes the content. It trims invisible whitespace from text fields, drops blank lines generated during interrupted scraping runs, and sorts the final list for easy review. By automating this cleanup phase, you ensure that every exported spreadsheet contains perfectly formatted data. Your final files will have clean website URLs, properly structured bios, and deduplicated social media posts, ready to fuel your next marketing campaign.

Store Quickstart

Start with the Quickstart template (direct CSV URL). For Apify pipelines, use Pipeline Cleaner with datasetId.

Key Features

  • 🧹 Trim whitespace: Remove leading/trailing spaces from all cells
  • 🗑️ Remove empty rows: Drop rows where all columns are empty
  • 🔍 Deduplicate by columns: Remove duplicate rows by specified key columns
  • 📊 Sort by column: Output sorted by any column
  • 🔗 Dataset or URL input: Apify dataset ID or direct CSV URL
  • 🔑 No API key needed: Pure JS, zero dependencies
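
The features above can be sketched in plain Python to show how the options interact. This is an illustration under stated assumptions, not the actor's source (which the listing describes as pure JS): the function name clean_csv is hypothetical, the steps are assumed to run in the order trim, drop empty rows, deduplicate, sort, duplicates are assumed to keep the first occurrence, and sorting is assumed to be a plain string sort.

```python
import csv
import io


def clean_csv(text, dedup_columns=(), trim_whitespace=True,
              remove_empty=True, sort_by=None):
    """Hypothetical sketch: clean CSV text and return (rows, stats)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    stats = {"inputRows": len(rows), "duplicatesRemoved": 0,
             "emptyRowsRemoved": 0}

    cleaned, seen = [], set()
    for raw in rows:
        # Ignore overflow cells (key None) and treat missing cells as empty.
        row = {k: (v or "") for k, v in raw.items() if k is not None}
        if trim_whitespace:
            row = {k: v.strip() for k, v in row.items()}
        # A row is empty when every cell is blank or whitespace only.
        if remove_empty and not any(v.strip() for v in row.values()):
            stats["emptyRowsRemoved"] += 1
            continue
        if dedup_columns:
            # Assumption: the first row seen for a key wins.
            key = tuple(row.get(col, "") for col in dedup_columns)
            if key in seen:
                stats["duplicatesRemoved"] += 1
                continue
            seen.add(key)
        cleaned.append(row)

    if sort_by:
        # Assumption: plain string comparison on the sort column.
        cleaned.sort(key=lambda r: r.get(sort_by, ""))
    stats["outputRows"] = len(cleaned)
    return cleaned, stats


SAMPLE = (
    "email,name\n"
    " alice@example.com , Alice \n"
    ",\n"
    "alice@example.com,Alice B.\n"
    "bob@example.com,Bob\n"
)

rows, stats = clean_csv(SAMPLE, dedup_columns=["email"], sort_by="email")
print(rows)
print(stats)
```

Trimming before deduplication matters in this sketch: the padded and unpadded versions of the same email only collapse into one row because whitespace is removed first.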

Use Cases

Who | Why
Data engineers | Clean scraper outputs before downstream processing
BI analysts | Standardize CSV imports from multiple sources
Marketing ops | Clean analyst CSVs before downstream pipeline ingestion
Data migration | Normalize CSV files during system migrations
Apify pipelines | Post-process actor output datasets

Input

Field | Type | Default | Description
csvUrl | string | (none) | Direct CSV URL (or use datasetId)
datasetId | string | (none) | Apify dataset ID (or use csvUrl)
dedupColumns | string[] | [] | Columns for dedup key
trimWhitespace | boolean | true | Trim whitespace
removeEmpty | boolean | true | Remove empty rows
sortBy | string | (none) | Column to sort by

Input Example

{
  "csvUrl": "https://example.com/data.csv",
  "dedupColumns": ["email"],
  "trimWhitespace": true,
  "removeEmpty": true,
  "sortBy": "created_at"
}
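
When the input is assembled in code rather than typed by hand, a small builder can catch a missing source early. The sketch below is hypothetical: build_run_input is not part of the actor or of any Apify library, and it assumes that exactly one of csvUrl and datasetId should be set, which the Input table suggests but does not state outright.

```python
def build_run_input(csv_url=None, dataset_id=None, dedup_columns=None,
                    trim_whitespace=True, remove_empty=True, sort_by=None):
    """Hypothetical helper: build the actor input from the documented fields."""
    # Assumption: exactly one data source must be provided.
    if bool(csv_url) == bool(dataset_id):
        raise ValueError("Provide exactly one of csv_url or dataset_id")

    run_input = {"csvUrl": csv_url} if csv_url else {"datasetId": dataset_id}
    run_input["dedupColumns"] = list(dedup_columns or [])
    run_input["trimWhitespace"] = trim_whitespace
    run_input["removeEmpty"] = remove_empty
    if sort_by:
        # sortBy has no documented default, so it is omitted when unset.
        run_input["sortBy"] = sort_by
    return run_input


print(build_run_input(
    csv_url="https://example.com/data.csv",
    dedup_columns=["email"],
    sort_by="created_at",
))
```

The returned dictionary has the same keys and values as the Input Example above and can be passed as the run input in the API calls further down.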

Input Examples

Example: Type detection only

{
  "datasetId": "abc123",
  "detectTypesOnly": true
}

Example: Full cleanup pass

{
  "datasetId": "abc123",
  "trimWhitespace": true,
  "normalizeNulls": true,
  "dedupeRows": true
}

Example: Column-specific transformation

{
  "datasetId": "abc123",
  "transformations": [
    {
      "column": "email",
      "op": "lowercase"
    },
    {
      "column": "phone",
      "op": "e164"
    }
  ]
}

Output

Field | Type | Description
rowNumber | integer | Original row index
data | object | Cleaned row as key-value pairs
changes | string[] | List of cleanings applied to this row
dropped | boolean | Whether the row was removed
dropReason | string or null | Reason the row was dropped; null if it was kept

Output Example

{
  "inputRows": 1250,
  "outputRows": 1180,
  "duplicatesRemoved": 45,
  "emptyRowsRemoved": 25,
  "cleanedData": [
    {"email": "user1@example.com", "name": "Alice", "created_at": "2026-01-01"},
    {"email": "user2@example.com", "name": "Bob", "created_at": "2026-01-02"}
  ]
}
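
The summary counters in this example add up: 1250 − 45 − 25 = 1180. A quick sanity check on a run's summary could look like the sketch below. The function name counts_reconcile is hypothetical, and the check assumes that duplicates and empty rows are the only reasons a row is dropped, which holds for the numbers shown here but is not stated by the listing.

```python
def counts_reconcile(summary):
    """Return True when the dropped-row counters explain the row difference."""
    removed = summary["duplicatesRemoved"] + summary["emptyRowsRemoved"]
    return summary["inputRows"] - removed == summary["outputRows"]


# Counters copied from the Output Example above.
summary = {
    "inputRows": 1250,
    "outputRows": 1180,
    "duplicatesRemoved": 45,
    "emptyRowsRemoved": 25,
}
print(counts_reconcile(summary))  # True
```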

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~csv-data-cleaner/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "csvUrl": "https://example.com/data.csv", "dedupColumns": ["email"], "trimWhitespace": true, "removeEmpty": true, "sortBy": "created_at" }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("taroyamada/csv-data-cleaner").call(run_input={
    "csvUrl": "https://example.com/data.csv",
    "dedupColumns": ["email"],
    "trimWhitespace": True,
    "removeEmpty": True,
    "sortBy": "created_at",
})

# Print each item from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
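
Since the goal is usually a clean CSV file rather than printed dictionaries, the items can be serialized back to CSV with the standard library. This is a minimal sketch, assuming each dataset item is a flat dictionary of column names to values; items_to_csv is a hypothetical helper, not part of apify_client, and the sample items below stand in for a real run's output.

```python
import csv
import io


def items_to_csv(items):
    """Hypothetical helper: serialize flat dict items to CSV text."""
    # Collect column names in first-seen order across all items.
    fieldnames = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from an item are written as empty cells.
    writer.writerows(items)
    return buffer.getvalue()


sample_items = [
    {"email": "user1@example.com", "name": "Alice"},
    {"email": "user2@example.com", "name": "Bob", "created_at": "2026-01-02"},
]
print(items_to_csv(sample_items))
```

With a real run, the list passed in would be built from the iterate_items() loop shown above.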

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('taroyamada/csv-data-cleaner').call({
    "csvUrl": "https://example.com/data.csv",
    "dedupColumns": ["email"],
    "trimWhitespace": true,
    "removeEmpty": true,
    "sortBy": "created_at"
});

// Fetch the items from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

  • Set removeDuplicates: true to deduplicate based on all columns.
  • Use delimiter to handle TSV (\t) or semicolon-separated files.
  • Combine with Phone Validator and Email Checker for full lead-data cleansing.
  • Output dataset is ready for direct import into CRMs or databases.

FAQ

What CSV dialects are supported?

Standard RFC 4180 CSV: comma-delimited, quoted fields, CRLF line endings. TSV not supported directly.
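
If a source file is tab- or semicolon-delimited, one fallback is to convert it to comma-delimited CSV before hosting it for csvUrl. The sketch below uses only Python's csv module; to_comma_csv is a hypothetical helper, and it assumes the whole file fits in memory as text.

```python
import csv
import io


def to_comma_csv(text, delimiter="\t"):
    """Hypothetical helper: re-encode delimited text as comma-delimited CSV."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    buffer = io.StringIO()
    # CRLF line endings and minimal quoting, in line with RFC 4180.
    writer = csv.writer(buffer, lineterminator="\r\n")
    writer.writerows(reader)
    return buffer.getvalue()


tsv = "name\tnote\nAlice\thello, world\n"
print(to_comma_csv(tsv))
```

Fields that contain a comma are quoted on output, so a value such as hello, world stays in one cell after conversion.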

Max CSV file size?

In-memory processing. Works well up to ~100 MB / 1M rows. Larger files need chunking.
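
Chunking can be done before upload by splitting the file into smaller CSVs that each repeat the header row. The following is a rough sketch, not a feature of the actor: split_csv and its max_rows parameter are hypothetical, and each chunk would still need to be hosted at its own URL and cleaned in a separate run.

```python
import csv
import io


def _rows_to_csv(header, rows):
    """Serialize a header plus data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def split_csv(lines, max_rows):
    """Yield CSV chunks of at most max_rows data rows, each with the header.

    `lines` is any iterable of text lines, for example an open file object.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # Empty input: nothing to yield.
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield _rows_to_csv(header, chunk)
            chunk = []
    if chunk:
        yield _rows_to_csv(header, chunk)


for part in split_csv(["id\n", "1\n", "2\n", "3\n"], max_rows=2):
    print(repr(part))
```

One caveat with this approach: rows are only deduplicated within a chunk, so duplicates that land in different chunks would survive unless the combined output is cleaned again.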

Does it validate data types?

No, it performs cleaning operations only. For type validation, combine with validation libraries.

Can I use this in Apify pipelines?

Yes. Provide datasetId from a prior actor run to clean that dataset directly.

What's the max file size?

Limited by actor memory (1024 MB by default). Tested up to 100k rows.

Can I upload a local CSV?

Provide a public URL via csvUrl. Use a service like file.io or S3 presigned URLs for private files.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.001 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.001) = $1.01
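
The same arithmetic can be written as a small estimator. The two prices come from the list above; the function name estimate_cost is hypothetical, and the sketch assumes each output row is billed as one dataset item. Decimal is used so that cent values are not distorted by floating-point rounding.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")    # flat fee per run
DATASET_ITEM_FEE = Decimal("0.001")  # fee per output item


def estimate_cost(items, runs=1):
    """Estimate the pay-per-event cost in USD for a number of output items."""
    return runs * ACTOR_START_FEE + items * DATASET_ITEM_FEE


print(f"${estimate_cost(1000):.2f}")  # $1.01
```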

No subscription required: you only pay for what you use.

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.
