URL: https://apify.com/web.harvester/sql-on-files-json-csv-and-more

SQL Query Runner for CSV, JSON & Parquet Files · Apify


Pricing

$3.00/month + usage

SQL on Files (JSON, CSV and more)

Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required. Upload files, provide URLs, or query Apify Datasets directly. Full SQL support: JOINs, aggregations, window functions. Export as JSON, CSV, or Parquet. Lightning-fast analytical queries.

Rating: 0.0 (0)

Developer: Web Harvester (Maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 0
  • Last modified: 4 months ago

SQL on Files

🦆 Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required!

🎯 What This Actor Does

Query any data file with SQL - no database needed:

  • DuckDB Powered - Lightning-fast analytical queries
  • Multi-Format - CSV, JSON, Parquet support
  • Flexible Input - Upload files, URLs, or Apify Datasets
  • Full SQL - JOINs, aggregations, window functions
  • Export Options - JSON, CSV, or Parquet output

🚀 Use Cases

| Use Case | Description |
|---|---|
| Data Analysis | Query scraped data with SQL |
| Transformations | Clean and reshape data |
| Aggregations | Group, count, sum, average |
| Filtering | Extract specific records |
| Joins | Combine multiple datasets |
| Export | Convert between formats |

📥 Input Examples

Simple Query

{
  "fileUrl": "https://example.com/data.csv",
  "query": "SELECT * FROM data WHERE price > 100 ORDER BY price DESC LIMIT 10"
}

Aggregation

{
  "fileUrl": "https://example.com/sales.csv",
  "query": "SELECT category, COUNT(*) as count, AVG(price) as avg_price FROM data GROUP BY category"
}

From Apify Dataset

{
  "datasetId": "abc123xyz",
  "query": "SELECT url, title, price FROM data WHERE price IS NOT NULL"
}
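
One way to submit inputs like these programmatically is through the Apify API's run-Actor endpoint. The sketch below only builds the request and does not send it. The Actor ID is inferred from this page's URL, and the helper name `build_run_request` and the token placeholder are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: Actor ID derived from the store URL (user "web.harvester",
# Actor "sql-on-files-json-csv-and-more"); the API joins the two with "~".
ACTOR_ID = "web.harvester~sql-on-files-json-csv-and-more"


def build_run_request(actor_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start an Actor run."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    {
        "fileUrl": "https://example.com/sales.csv",
        "query": "SELECT category, COUNT(*) AS count FROM data GROUP BY category",
    },
    token="YOUR_APIFY_TOKEN",  # placeholder, not a real token
)
# To actually start the run: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending it makes the payload easy to inspect before any compute is spent.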

⚙️ Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | - | Required. SQL query to execute |
| file | string | - | Upload a CSV/JSON/Parquet file |
| fileUrl | string | - | URL to download file from |
| datasetId | string | - | Load from Apify Dataset |
| outputFormat | string | json | Output: json, csv, parquet |
| limit | integer | 10000 | Max rows to return |
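
The parameters and defaults above can be checked locally before a run is started. The following is a minimal sketch: `build_input` is a hypothetical helper, not part of the Actor, and the rule that at least one of `file`, `fileUrl`, or `datasetId` must be present is an assumption rather than documented behavior.

```python
ALLOWED_FORMATS = {"json", "csv", "parquet"}
SOURCE_KEYS = {"file", "fileUrl", "datasetId"}


def build_input(query, output_format="json", limit=10000, **source):
    """Assemble an input object using the parameter names and defaults from the table."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    unknown = set(source) - SOURCE_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Assumption: a query needs at least one data source to run against.
    if not source:
        raise ValueError("provide file, fileUrl, or datasetId")
    return {"query": query, **source, "outputFormat": output_format, "limit": limit}


actor_input = build_input(
    "SELECT url, title, price FROM data WHERE price IS NOT NULL",
    datasetId="abc123xyz",
)
```

Here `actor_input` carries the table's defaults (`outputFormat` of `json`, `limit` of 10000) alongside the query and dataset ID.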

📤 Output

JSON Output (Default)

Results pushed to Dataset:

[
  {"category": "Electronics", "count": 1520, "avg_price": 299.99},
  {"category": "Books", "count": 892, "avg_price": 24.50}
]
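
Rows pushed to a dataset can typically be read back through the Apify API's dataset items endpoint. The sketch below only builds the URL. The helper name `dataset_items_url` is hypothetical, `abc123xyz` is the placeholder ID used earlier on this page, and which dataset the Actor writes to (presumably the run's default dataset) is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      limit: Optional[int] = None) -> str:
    """Return the API URL for downloading a dataset's items in the given format."""
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


items_url = dataset_items_url("abc123xyz")
first_ten_csv_url = dataset_items_url("abc123xyz", fmt="csv", limit=10)
```

Private datasets also need an API token on the request, which is left out here.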

CSV/Parquet Output

{
  "format": "csv",
  "rows": 1520,
  "columns": 3,
  "downloadUrl": "https://api.apify.com/v2/..."
}

🦆 SQL Tips

-- Basic filtering
SELECT * FROM data WHERE column_name LIKE '%keyword%'
-- Aggregations
SELECT category, COUNT(*), SUM(price) FROM data GROUP BY category
-- Window functions
SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rank FROM data
-- Date handling
SELECT *, DATE_TRUNC('month', date_column) as month FROM data
-- JSON extraction
SELECT json_column->>'$.nested.field' as value FROM data
-- Pattern matching
SELECT * FROM data WHERE name ~ '^[A-Z].*'
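
To try the window-function pattern on a few rows without running the Actor, the sketch below uses Python's built-in `sqlite3` module as a stand-in for DuckDB. That substitution is an assumption made so the example needs no extra packages; only the `ROW_NUMBER() OVER (PARTITION BY ...)` query is illustrated, and the date, JSON, and regex tips above rely on DuckDB syntax that is not shown here. The table name `data` mirrors the examples above, and the sample rows are invented.

```python
import sqlite3

# In-memory table named "data", matching the table name used in the tips.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (category TEXT, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("Books", "Novel", 12.0),
        ("Books", "Atlas", 40.0),
        ("Electronics", "Cable", 9.5),
        ("Electronics", "Laptop", 899.0),
    ],
)

# Rank rows by price within each category, then keep the most expensive one.
top_per_category = conn.execute(
    """
    SELECT category, name, price FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY category ORDER BY price DESC
        ) AS price_rank
        FROM data
    )
    WHERE price_rank = 1
    ORDER BY category
    """
).fetchall()

print(top_per_category)
# [('Books', 'Atlas', 40.0), ('Electronics', 'Laptop', 899.0)]
```

Wrapping the window function in a subquery is what allows filtering on the rank, since a window result cannot be referenced in the same query's WHERE clause.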

💰 Cost Estimation

| Data Size | Approx. Time | Compute Units |
|---|---|---|
| 1 MB | ~5 seconds | ~0.005 |
| 10 MB | ~15 seconds | ~0.02 |
| 100 MB | ~1 minute | ~0.1 |
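
For quick planning, the table can be turned into a lookup. This is a rough sketch: `estimate` is a hypothetical helper that rounds a file size up to the nearest listed row, and it deliberately returns nothing for sizes beyond the table rather than extrapolating.

```python
import bisect

# (size in MB, approx. seconds, approx. compute units), copied from the table above.
TIERS = [(1, 5, 0.005), (10, 15, 0.02), (100, 60, 0.1)]


def estimate(size_mb: float):
    """Return (seconds, compute_units) for the smallest listed size >= size_mb.

    Returns None when size_mb exceeds the largest size in the table.
    """
    sizes = [tier[0] for tier in TIERS]
    index = bisect.bisect_left(sizes, size_mb)
    if index == len(TIERS):
        return None
    _, seconds, compute_units = TIERS[index]
    return seconds, compute_units


print(estimate(25))   # (60, 0.1): rounded up to the 100 MB row
print(estimate(500))  # None: larger than any listed size
```

Rounding up gives a conservative figure; actual usage depends on the query and the data, so the table values remain approximations.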

🔧 Technical Details

  • Language: Python 3.12
  • Engine: DuckDB 0.10+
  • Memory: 256MB-1GB (scales with data)
  • Speed: 1M+ rows/second for analytics

📄 License

MIT License - see LICENSE for details.
