Columnar Storage and Query Optimization

Keep adding new skills with 10,000+ programs for $239 (usually $399). Save now.

👁 Edureka

Columnar Storage and Query Optimization

👁 Edureka

Instructor: Edureka

Included with

•

Learn more

Ask Coursera

5 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

Recommended experience

1 week to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

5 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

Recommended experience

1 week to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Explain how data is stored, distinguish row vs columnar storage, and identify performance advantages of columnar formats
Work with Parquet and ORC formats, query columnar data using DuckDB, and compare storage performance against CSV
Read SQL execution plans using EXPLAIN, diagnose bottlenecks, and compare query performance across different engines
Apply optimization techniques including column pruning, filter pushdown, partitioning, and data skipping for analytics

Skills you'll gain

Tools you'll learn

Details to know

👁 Image

Shareable certificate

Add to your LinkedIn profile

See how employees at top companies are mastering in-demand skills

Learn more about Coursera for Business

👁 logos of Petrobras, TATA, Danone, Capgemini, P&G and L'Oreal

There are 5 modules in this course

Every data professional writes SQL queries — but few understand why some queries take seconds and others take minutes on the same data. The answer lies beneath the surface: in how data is stored, how query engines read that data, and how columnar formats like Parquet fundamentally change the game for analytics performance. This course gives you that understanding.

You will start from the foundations — how computers store and read data, how SQL operations access data internally, and what distinguishes row-based storage from columnar storage. From there, you will explore modern columnar formats (Parquet, ORC), work with DuckDB as your primary analytics query engine, and learn to read execution plans to diagnose exactly where queries slow down. Each concept is reinforced through hands-on demonstrations that you can follow along on your own setup. By the end of this course, you’ll be able to: - Explain how computers store data, distinguish between row-based and columnar storage, and identify when columnar formats provide a performance advantage. - Work with Parquet and ORC file formats, compare them to CSV, and query columnar data using DuckDB. - Read and interpret SQL query execution plans using EXPLAIN, and diagnose performance bottlenecks in analytical workloads. - Apply real-world query optimization techniques including column pruning, filter pushdown, partitioning, data skipping, and before-vs-after performance comparison. This course is designed for a diverse audience: Data Analysts who want to understand why their queries are slow, junior Data Engineers building foundational storage knowledge, BI Professionals moving into performance engineering or platform roles, and SQL Developers who want to go beyond writing queries to understanding how queries execute internally. Basic computer literacy is helpful. No prior SQL experience is required — though familiarity with basic statements will help you move faster. Stop guessing why queries are slow. Start understanding storage, execution, and optimization — and build the foundational skills every modern data team needs.

This module introduces how data is stored and organized in computer systems using files, tables, rows, and columns. It explains how SQL is used to access and manipulate data and how databases process read operations. The module also compares row-based and column-based storage to show how different storage models affect query performance.

What's included

13 videos6 readings5 assignments

13 videos•Total 56 minutes

Course Introduction•4 minutes
Core Data and Tabular Structure Overview•4 minutes
Foundations of Data Storage and Organization•6 minutes
Hands-On: Viewing CSV Data Using Spreadsheet Tools•5 minutes
Exploring the Purpose and use of SQL•5 minutes
Internal Mechanisms of SQL Data Retrieval•4 minutes
Hands-On: Running Basic SQL Queries•4 minutes
SQL Commands for Creating, Reading, Updating and Deleting Data•5 minutes
Comparing Read Heavy and Write Heavy Workloads•5 minutes
Hands-On : SQL CRUD Operation•4 minutes
Understanding Row Based and Column Based Storage•4 minutes
Hands-On: Evaluating Row and Column Storage Approaches•3 minutes
Hands-On: How Storage Models Affect Query Execution•4 minutes

6 readings•Total 57 minutes

Course Syllabus•7 minutes
Design Principles of Data Organization•10 minutes
Core SQL Concepts for Data Analysis •10 minutes
Managing Data Operations in SQL Systems •10 minutes
Row and Column Storage Structures•10 minutes
Module Summary: Foundations of Data Storage and SQL for Analytics •10 minutes

5 assignments•Total 39 minutes

Foundations of Data Storage and SQL for Analytics•15 minutes
Data Storage and Retrieval•6 minutes
Introduction to SQL and Data Representation•6 minutes
SQL Data Processing Concepts •6 minutes
Row vs. Column Storage Explained•6 minutes

This module explains how columnar storage is used in modern data systems and data warehouses for efficient analytics. It introduces common columnar formats and tools used in industry, and demonstrates how techniques like compression, data skipping, and partitioning improve query performance.

What's included

12 videos4 readings4 assignments

12 videos•Total 49 minutes

Columnar storage in Modern Data Systems•4 minutes
Columnar Storage Formats: Parquet and ORC•4 minutes
Columnar Systems vs. File Based Storage Architectures •4 minutes
Hands-On : Exploring Columnar Parquet Files Using Query Engine•3 minutes
Hands-On : Comparing CSV vs. Columnar File Size•3 minutes
Role of Columnar Storage in Modern Data Warehouses•4 minutes
Modern Columnar Systems in Open Source and Cloud•5 minutes
Hands-On: Querying Data in Columnar Storage Systems•4 minutes
Impact of Compression on Query Performance•5 minutes
Data Skipping , Metadata, and Partitioning Techniques•4 minutes
Hands-On: The Role of Filters in Efficient Data Access•4 minutes
Hands-On: Partitioned vs. Non-Partitioned Queries•5 minutes

4 readings•Total 40 minutes

Columnar Storage Architecture for Analytical Systems•10 minutes
Scalable Columnar Data Platforms and Ecosystems•10 minutes
Performance Optimization in Columnar Data Platforms•10 minutes
Module Summary: Columnar Storage in Modern Industry Systems•10 minutes

4 assignments•Total 33 minutes

Columnar Storage in Modern Industry Systems•15 minutes
Data Storage Formats and Structures•6 minutes
Columnar Data Processing with DuckDB•6 minutes
Efficiency Gains with Columnar Storage•6 minutes

This module introduces query engines and SQL tools used by analysts and engineers to process data. It explains how SQL queries are executed internally and how query plans represent the steps a system takes to run a query. The module also compares different query engines to understand why some systems perform faster for analytical workloads.

What's included

9 videos4 readings4 assignments

9 videos•Total 39 minutes

Role of Query Engine•4 minutes
Hands-On: Running SQL Queries in DuckDB•4 minutes
Execution Lifecycle of SQL Queries•3 minutes
Query Plans•5 minutes
Hands-On: Analyzing Query Execution Flow•5 minutes
Hands-On: Interpreting Query Execution Plans•3 minutes
Performance Differences Across Query Engines•5 minutes
Hands-On: Storage Format and Query Performance Analysis•4 minutes
Hands-On: Cross-Engine SQL Query Comparison•5 minutes

4 readings•Total 40 minutes

Architectural Components of Analytical Query System •10 minutes
Query Processing Techniques in Analytical Databases•10 minutes
System Design Factors in Query Engine Performance •10 minutes
Module Summary: Query Engines and SQL Processing Systems•10 minutes

4 assignments•Total 33 minutes

Query Engines and SQL Processing Systems•15 minutes
SQL Execution with in Query Engines•6 minutes
Query Execution and Query Plans•6 minutes
Analytical Query System•6 minutes

This module explains why query optimization is important for improving data processing efficiency and reducing slow query performance. It introduces practical SQL optimization techniques such as filtering, column pruning, and efficient aggregations. The module also demonstrates real-world optimization workflows using industry tools to compare query performance before and after optimization.

What's included

9 videos4 readings4 assignments

9 videos•Total 36 minutes

Optimizing SQL Query Performance•5 minutes
Impact of Query Performance on Business Operations•4 minutes
Hands-On: Evaluating Query Inefficiencies•3 minutes
Performance Optimization Strategies for Columnar SQL Databases•4 minutes
Hands-On: Bad SQL vs Optimized SQL Queries•4 minutes
Hands-On: The Role of Filters in Efficient Data Acces•4 minutes
Query Optimization Pipeline: An End to End Perspective•3 minutes
Real World Application of Analytics Optimization Techniques•6 minutes
Hands-On: Comparing Query Performance Before and After Optimization•3 minutes

4 readings•Total 40 minutes

Managing Query Performance in Analytic Systems•10 minutes
Efficient Query Design in Analytical Systems•10 minutes
Strategies for Sustaining Query Performance•10 minutes
Module Summary: Query Optimization Concepts and Best Practices•10 minutes

4 assignments•Total 33 minutes

Query Optimization Concepts and Best Practices•15 minutes
Principles of Query Optimization and Its Improtance •6 minutes
SQL Query Optimization Strategies•6 minutes
Real World Query Optimization in Industry System•6 minutes

This module consolidates key concepts from data storage, SQL querying, query execution, and optimization. It evaluates understanding through structured assessments and practical query analysis scenarios. It serves as a final checkpoint to assess readiness for real-world data querying and performance optimization tasks.

What's included

1 video1 reading2 assignments

1 video•Total 4 minutes

Course Summary•4 minutes

1 reading•Total 30 minutes

Practice Project: Designing Fast Data Systems for Analytics•30 minutes

2 assignments•Total 60 minutes

Final Course Assessment : Columnar Storage and Query Optimization•30 minutes
Optimizing the Data Core – Columnar Storage and Query Performance•30 minutes

Instructor

👁 Edureka

Edureka

203 Courses•185,724 learners

Offered by

👁 Image

Edureka

Why people choose Coursera for their career

👁 Image

Felipe M.

Learner since 2018

"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

👁 Image

Jennifer J.

Learner since 2020

"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

👁 Image

Larry W.

Learner since 2021

"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

👁 Image

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."

Frequently asked questions

Columnar storage organizes data by columns rather than rows. For analytical queries that typically scan a few columns across millions of rows, columnar formats like Parquet and ORC dramatically reduce the amount of data read from disk — enabling faster queries, better compression, and lower compute costs. This course teaches you exactly how and why.

The primary hands-on tool is DuckDB — a fast, in-process analytical query engine that requires minimal setup. You will also work with Parquet files, CSV data, and SQL query plan tools (EXPLAIN). No complex installation or cloud accounts are needed.

Yes. This course is built around a follow-along, demonstration-driven learning model. Each concept is taught through step-by-step video demonstrations using DuckDB and Parquet that you can replicate on your own machine. DuckDB runs locally with minimal setup, so you can pause, rewind, and practice alongside each demo at your own pace.

Basic SQL familiarity (SELECT, FROM, WHERE) is helpful but not strictly required. Module 1 begins with SQL fundamentals — what SQL is, CRUD operations, and how READ operations work internally — before progressing to advanced query optimization. The course is accessible for beginners while still valuable for experienced SQL users.

DuckDB is a modern, open-source analytical query engine designed for fast, in-process SQL analytics. It runs locally without a server, reads Parquet files natively, and provides query plan visibility through EXPLAIN — making it ideal for learning storage and optimization concepts without infrastructure overhead.

A query execution plan shows the step-by-step strategy a query engine uses to retrieve your data — including scan operations, filter application, join methods, and aggregations. In Module 3, you will use EXPLAIN to view execution plans, interpret them conceptually, and diagnose where queries slow down.

Both Parquet and ORC are columnar file formats designed for analytics. Parquet is more widely adopted in cloud and open-source ecosystems (Spark, Hive, DuckDB), while ORC is optimized for Hive-based environments. This course covers both formats and helps you understand when to use each.

You will learn column pruning (reading only needed columns), filter pushdown (applying filters early), partitioning (organizing data for faster scans), data skipping (avoiding irrelevant data blocks), and aggregation optimization. Module 4 includes before-vs-after comparisons so you can measure the real performance impact of each technique.

Data Analysts who want to understand query performance, junior Data Engineers building storage foundations, BI Professionals moving into platform or performance roles, and SQL Developers who want to go beyond syntax to understand execution internals. If slow queries frustrate you, this course is for you.

Storage and query optimization are foundational skills for Data Engineering, Analytics Engineering, BI Development, Database Engineering, and Performance Engineering roles.

Basic computer literacy is sufficient. SQL is taught from fundamentals in Module 1. No prior experience with DuckDB, Parquet, or query optimization is required. Familiarity with basic SELECT statements will help you move faster but is not mandatory.

Yes. Upon completing all graded assessments and the final course assessment, you will earn a Coursera Course Certificate from Edureka that you can add to your LinkedIn profile, resume, or CV.

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If fin aid or scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

URL: https://www.coursera.org/learn/columnar-storage-query-optimization