Pricing
$3.00/month + usage
Go to Apify Store
SQL on Files (JSON, CSV and more)
Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required. Upload files, provide URLs, or query Apify Datasets directly. Full SQL support: JOINs, aggregations, window functions. Export as JSON, CSV, or Parquet. Lightning-fast analytical queries.
Pricing
$3.00/month + usage
Rating
0.0
(0)
Developer
Actor stats
1
Bookmarked
2
Total users
0
Monthly active users
4 months ago
Last modified
Categories
Share
SQL on Files
๐ฆ Run SQL queries on CSV, JSON, and Parquet files using DuckDB. No database setup required!
๐ Apify Actor
๐ License: MIT
๐ฏ What This Actor Does
Query any data file with SQL - no database needed:
- DuckDB Powered - Lightning-fast analytical queries
- Multi-Format - CSV, JSON, Parquet support
- Flexible Input - Upload files, URLs, or Apify Datasets
- Full SQL - JOINs, aggregations, window functions
- Export Options - JSON, CSV, or Parquet output
๐ Use Cases
| Use Case | Description |
|---|---|
| Data Analysis | Query scraped data with SQL |
| Transformations | Clean and reshape data |
| Aggregations | Group, count, sum, average |
| Filtering | Extract specific records |
| Joins | Combine multiple datasets |
| Export | Convert between formats |
๐ฅ Input Examples
Simple Query
{"fileUrl":"https://example.com/data.csv","query":"SELECT * FROM data WHERE price > 100 ORDER BY price DESC LIMIT 10"}
Aggregation
{"fileUrl":"https://example.com/sales.csv","query":"SELECT category, COUNT(*) as count, AVG(price) as avg_price FROM data GROUP BY category"}
From Apify Dataset
{"datasetId":"abc123xyz","query":"SELECT url, title, price FROM data WHERE price IS NOT NULL"}
โ๏ธ Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
query | string | - | Required. SQL query to execute |
file | string | - | Upload a CSV/JSON/Parquet file |
fileUrl | string | - | URL to download file from |
datasetId | string | - | Load from Apify Dataset |
outputFormat | string | json | Output: json, csv, parquet |
limit | integer | 10000 | Max rows to return |
๐ค Output
JSON Output (Default)
Results pushed to Dataset:
[{"category":"Electronics","count":1520,"avg_price":299.99},{"category":"Books","count":892,"avg_price":24.50}]
CSV/Parquet Output
{"format":"csv","rows":1520,"columns":3,"downloadUrl":"https://api.apify.com/v2/..."}
๐ฆ SQL Tips
-- Basic filteringSELECT*FROMdataWHEREcolumnLIKE'%keyword%'-- AggregationsSELECT category,COUNT(*),SUM(price)FROMdataGROUPBY category-- Window functionsSELECT*, ROW_NUMBER()OVER(PARTITIONBY category ORDERBY price DESC)as rank FROMdata-- Date handlingSELECT*, DATE_TRUNC('month', date_column)asmonthFROMdata-- JSON extractionSELECT json_column->>'$.nested.field'asvalueFROMdata-- Pattern matchingSELECT*FROMdataWHERE name ~'^[A-Z].*'
๐ฐ Cost Estimation
| Data Size | Approx. Time | Compute Units |
|---|---|---|
| 1 MB | ~5 seconds | ~0.005 |
| 10 MB | ~15 seconds | ~0.02 |
| 100 MB | ~1 minute | ~0.1 |
๐ง Technical Details
- Language: Python 3.12
- Engine: DuckDB 0.10+
- Memory: 256MB-1GB (scales with data)
- Speed: 1M+ rows/second for analytics
๐ License
MIT License - see LICENSE for details.
