Columnar Storage and Query Optimization
Keep adding new skills with 10,000+ programs for $239 (usually $399). Save now.
Ask Coursera
Recommended experience
Recommended experience
What you'll learn
Explain how data is stored, distinguish row vs columnar storage, and identify performance advantages of columnar formats
Work with Parquet and ORC formats, query columnar data using DuckDB, and compare storage performance against CSV
Read SQL execution plans using EXPLAIN, diagnose bottlenecks, and compare query performance across different engines
Apply optimization techniques including column pruning, filter pushdown, partitioning, and data skipping for analytics
Skills you'll gain
Details to know
April 2026
See how employees at top companies are mastering in-demand skills
There are 5 modules in this course
Every data professional writes SQL queries β but few understand why some queries take seconds and others take minutes on the same data. The answer lies beneath the surface: in how data is stored, how query engines read that data, and how columnar formats like Parquet fundamentally change the game for analytics performance. This course gives you that understanding.
You will start from the foundations β how computers store and read data, how SQL operations access data internally, and what distinguishes row-based storage from columnar storage. From there, you will explore modern columnar formats (Parquet, ORC), work with DuckDB as your primary analytics query engine, and learn to read execution plans to diagnose exactly where queries slow down. Each concept is reinforced through hands-on demonstrations that you can follow along on your own setup. By the end of this course, youβll be able to: - Explain how computers store data, distinguish between row-based and columnar storage, and identify when columnar formats provide a performance advantage. - Work with Parquet and ORC file formats, compare them to CSV, and query columnar data using DuckDB. - Read and interpret SQL query execution plans using EXPLAIN, and diagnose performance bottlenecks in analytical workloads. - Apply real-world query optimization techniques including column pruning, filter pushdown, partitioning, data skipping, and before-vs-after performance comparison. This course is designed for a diverse audience: Data Analysts who want to understand why their queries are slow, junior Data Engineers building foundational storage knowledge, BI Professionals moving into performance engineering or platform roles, and SQL Developers who want to go beyond writing queries to understanding how queries execute internally. Basic computer literacy is helpful. No prior SQL experience is required β though familiarity with basic statements will help you move faster. Stop guessing why queries are slow. Start understanding storage, execution, and optimization β and build the foundational skills every modern data team needs.
This module introduces how data is stored and organized in computer systems using files, tables, rows, and columns. It explains how SQL is used to access and manipulate data and how databases process read operations. The module also compares row-based and column-based storage to show how different storage models affect query performance.
What's included
13 videos6 readings5 assignments
13 videosβ’Total 56 minutes
- Course Introductionβ’4 minutes
- Core Data and Tabular Structure Overviewβ’4 minutes
- Foundations of Data Storage and Organizationβ’6 minutes
- Hands-On: Viewing CSV Data Using Spreadsheet Toolsβ’5 minutes
- Exploring the Purpose and use of SQLβ’5 minutes
- Internal Mechanisms of SQL Data Retrievalβ’4 minutes
- Hands-On: Running Basic SQL Queriesβ’4 minutes
- SQL Commands for Creating, Reading, Updating and Deleting Dataβ’5 minutes
- Comparing Read Heavy and Write Heavy Workloadsβ’5 minutes
- Hands-On : SQL CRUD Operationβ’4 minutes
- Understanding Row Based and Column Based Storageβ’4 minutes
- Hands-On: Evaluating Row and Column Storage Approachesβ’3 minutes
- Hands-On: How Storage Models Affect Query Executionβ’4 minutes
6 readingsβ’Total 57 minutes
- Course Syllabusβ’7 minutes
- Design Principles of Data Organizationβ’10 minutes
- Core SQL Concepts for Data Analysis β’10 minutes
- Managing Data Operations in SQL Systems β’10 minutes
- Row and Column Storage Structuresβ’10 minutes
- Module Summary: Foundations of Data Storage and SQL for Analytics β’10 minutes
5 assignmentsβ’Total 39 minutes
- Foundations of Data Storage and SQL for Analyticsβ’15 minutes
- Data Storage and Retrievalβ’6 minutes
- Introduction to SQL and Data Representationβ’6 minutes
- SQL Data Processing Concepts β’6 minutes
- Row vs. Column Storage Explainedβ’6 minutes
This module explains how columnar storage is used in modern data systems and data warehouses for efficient analytics. It introduces common columnar formats and tools used in industry, and demonstrates how techniques like compression, data skipping, and partitioning improve query performance.
What's included
12 videos4 readings4 assignments
12 videosβ’Total 49 minutes
- Columnar storage in Modern Data Systemsβ’4 minutes
- Columnar Storage Formats: Parquet and ORCβ’4 minutes
- Columnar Systems vs. File Based Storage Architectures β’4 minutes
- Hands-On : Exploring Columnar Parquet Files Using Query Engineβ’3 minutes
- Hands-On : Comparing CSV vs. Columnar File Sizeβ’3 minutes
- Role of Columnar Storage in Modern Data Warehousesβ’4 minutes
- Modern Columnar Systems in Open Source and Cloudβ’5 minutes
- Hands-On: Querying Data in Columnar Storage Systemsβ’4 minutes
- Impact of Compression on Query Performanceβ’5 minutes
- Data Skipping , Metadata, and Partitioning Techniquesβ’4 minutes
- Hands-On: The Role of Filters in Efficient Data Accessβ’4 minutes
- Hands-On: Partitioned vs. Non-Partitioned Queriesβ’5 minutes
4 readingsβ’Total 40 minutes
- Columnar Storage Architecture for Analytical Systemsβ’10 minutes
- Scalable Columnar Data Platforms and Ecosystemsβ’10 minutes
- Performance Optimization in Columnar Data Platformsβ’10 minutes
- Module Summary: Columnar Storage in Modern Industry Systemsβ’10 minutes
4 assignmentsβ’Total 33 minutes
- Columnar Storage in Modern Industry Systemsβ’15 minutes
- Data Storage Formats and Structuresβ’6 minutes
- Columnar Data Processing with DuckDBβ’6 minutes
- Efficiency Gains with Columnar Storageβ’6 minutes
This module introduces query engines and SQL tools used by analysts and engineers to process data. It explains how SQL queries are executed internally and how query plans represent the steps a system takes to run a query. The module also compares different query engines to understand why some systems perform faster for analytical workloads.
What's included
9 videos4 readings4 assignments
9 videosβ’Total 39 minutes
- Role of Query Engineβ’4 minutes
- Hands-On: Running SQL Queries in DuckDBβ’4 minutes
- Execution Lifecycle of SQL Queriesβ’3 minutes
- Query Plansβ’5 minutes
- Hands-On: Analyzing Query Execution Flowβ’5 minutes
- Hands-On: Interpreting Query Execution Plansβ’3 minutes
- Performance Differences Across Query Enginesβ’5 minutes
- Hands-On: Storage Format and Query Performance Analysisβ’4 minutes
- Hands-On: Cross-Engine SQL Query Comparisonβ’5 minutes
4 readingsβ’Total 40 minutes
- Architectural Components of Analytical Query System β’10 minutes
- Query Processing Techniques in Analytical Databasesβ’10 minutes
- System Design Factors in Query Engine Performance β’10 minutes
- Module Summary: Query Engines and SQL Processing Systemsβ’10 minutes
4 assignmentsβ’Total 33 minutes
- Query Engines and SQL Processing Systemsβ’15 minutes
- SQL Execution with in Query Enginesβ’6 minutes
- Query Execution and Query Plansβ’6 minutes
- Analytical Query Systemβ’6 minutes
This module explains why query optimization is important for improving data processing efficiency and reducing slow query performance. It introduces practical SQL optimization techniques such as filtering, column pruning, and efficient aggregations. The module also demonstrates real-world optimization workflows using industry tools to compare query performance before and after optimization.
What's included
9 videos4 readings4 assignments
9 videosβ’Total 36 minutes
- Optimizing SQL Query Performanceβ’5 minutes
- Impact of Query Performance on Business Operationsβ’4 minutes
- Hands-On: Evaluating Query Inefficienciesβ’3 minutes
- Performance Optimization Strategies for Columnar SQL Databasesβ’4 minutes
- Hands-On: Bad SQL vs Optimized SQL Queriesβ’4 minutes
- Hands-On: The Role of Filters in Efficient Data Accesβ’4 minutes
- Query Optimization Pipeline: An End to End Perspectiveβ’3 minutes
- Real World Application of Analytics Optimization Techniquesβ’6 minutes
- Hands-On: Comparing Query Performance Before and After Optimizationβ’3 minutes
4 readingsβ’Total 40 minutes
- Managing Query Performance in Analytic Systemsβ’10 minutes
- Efficient Query Design in Analytical Systemsβ’10 minutes
- Strategies for Sustaining Query Performanceβ’10 minutes
- Module Summary: Query Optimization Concepts and Best Practicesβ’10 minutes
4 assignmentsβ’Total 33 minutes
- Query Optimization Concepts and Best Practicesβ’15 minutes
- Principles of Query Optimization and Its Improtance β’6 minutes
- SQL Query Optimization Strategiesβ’6 minutes
- Real World Query Optimization in Industry Systemβ’6 minutes
This module consolidates key concepts from data storage, SQL querying, query execution, and optimization. It evaluates understanding through structured assessments and practical query analysis scenarios. It serves as a final checkpoint to assess readiness for real-world data querying and performance optimization tasks.
What's included
1 video1 reading2 assignments
1 videoβ’Total 4 minutes
- Course Summaryβ’4 minutes
1 readingβ’Total 30 minutes
- Practice Project: Designing Fast Data Systems for Analyticsβ’30 minutes
2 assignmentsβ’Total 60 minutes
- Final Course Assessment : Columnar Storage and Query Optimizationβ’30 minutes
- Optimizing the Data Core β Columnar Storage and Query Performanceβ’30 minutes
Instructor
Offered by
Why people choose Coursera for their career
Frequently asked questions
Columnar storage organizes data by columns rather than rows. For analytical queries that typically scan a few columns across millions of rows, columnar formats like Parquet and ORC dramatically reduce the amount of data read from disk β enabling faster queries, better compression, and lower compute costs. This course teaches you exactly how and why.
The primary hands-on tool is DuckDB β a fast, in-process analytical query engine that requires minimal setup. You will also work with Parquet files, CSV data, and SQL query plan tools (EXPLAIN). No complex installation or cloud accounts are needed.
Yes. This course is built around a follow-along, demonstration-driven learning model. Each concept is taught through step-by-step video demonstrations using DuckDB and Parquet that you can replicate on your own machine. DuckDB runs locally with minimal setup, so you can pause, rewind, and practice alongside each demo at your own pace.
More questions
Financial aid available,
ΒΉ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.
