Building Automated Data Pipelines with Spark, dbt, and Airflow
This course is part of the Open source Data Engineering with Spark, dbt & Airflow Professional Certificate.
What you'll learn
Build end-to-end data pipelines that automatically ingest from databases, APIs, and streams using Spark, dbt, and Airflow.
Design data models with historical tracking using SCD Type 2 patterns to preserve complete change history for analytics.
Create automated workflows with intelligent retry logic, SLA monitoring, and parameterization for production reliability.
Optimize Spark job performance using partitioning and caching strategies to achieve 30%+ runtime improvements.
Details to know
March 2026
There are 11 modules in this course
You'll master the art of building production-ready data pipelines that automatically process millions of records. In this hands-on course, you'll design end-to-end workflows that integrate diverse data sources—from databases and APIs to real-time streams—using industry-standard tools like Apache Spark, dbt, and Apache Airflow. You'll learn to create robust data models that preserve historical changes, implement performance optimizations that reduce processing time by 30% or more, and build automated workflows with intelligent retry logic and monitoring alerts.
By the end, you'll have created a complete data pipeline system that demonstrates the technical skills data engineering teams need most. You'll know how to unify fragmented data sources, apply advanced transformation techniques, and ensure your pipelines run reliably at scale. This practical experience directly translates to the challenges you'll face as a data engineer, data analyst, or anyone working with large-scale data systems in modern organizations.
You will learn the foundational concepts and tools needed to create systematic visual documentation of data pipeline architectures.
What's included
3 videos, 2 readings, 1 assignment
3 videos•Total 15 minutes
- Why Data Flow Visualization Drives Engineering Success•4 minutes
- Systematic Approach to Identifying Sources and Destinations•8 minutes
- Creating Your First Data Flow Diagram•3 minutes
2 readings•Total 11 minutes
- Essential Components of Professional Data Flow Diagrams•6 minutes
- Transformation Mapping Principles for Complex Data Pipelines•5 minutes
1 assignment•Total 3 minutes
- Data Flow Fundamentals Knowledge Check•3 minutes
You will apply advanced techniques to create professional-quality data flow diagrams that accurately represent complex enterprise data systems and support stakeholder collaboration.
What's included
2 videos, 2 readings, 3 assignments
2 videos•Total 12 minutes
- Advanced Diagramming Techniques for Complex Data Systems•9 minutes
- Mapping Complex Multi-System Data Pipelines•3 minutes
2 readings•Total 13 minutes
- Enterprise Data Flow Best Practices and Industry Standards•7 minutes
- Validation and Review Processes for Data Flow Documentation•6 minutes
3 assignments•Total 25 minutes
- Comprehensive Data Flow Mastery Assessment•10 minutes
- Create Complete Enterprise Data Flow Diagram•12 minutes
- Advanced Data Flow Concepts Knowledge Check•3 minutes
You will establish the foundational understanding and core skills for creating modular data pipeline stages, focusing on the principles of separation of concerns and tool integration fundamentals.
What's included
1 video, 1 reading, 1 assignment
1 video•Total 7 minutes
- Open Source Tool Ecosystem: Spark, dbt, and Airflow Integration•7 minutes
1 reading•Total 12 minutes
- Fundamentals of Modular Data Pipeline Architecture•12 minutes
1 assignment•Total 3 minutes
- Modular Pipeline Design Fundamentals Assessment•3 minutes
You will implement complete end-to-end data pipelines by integrating modular components with industry-standard tools, culminating in a comprehensive assessment of your pipeline development capabilities.
What's included
2 readings, 3 assignments
2 readings•Total 20 minutes
- End-to-End Pipeline Integration Patterns•12 minutes
- Implementing Complete Pipeline Integration with Spark, dbt, and Airflow•8 minutes
3 assignments•Total 38 minutes
- Comprehensive Modular Pipeline Development Assessment•15 minutes
- End-to-End Pipeline Development Project•20 minutes
- Modular Pipeline Integration and Coordination Quiz•3 minutes
You will establish foundational knowledge of connector architecture and complete your first database connector configuration using Airbyte.
What's included
2 videos, 2 readings, 1 assignment
2 videos•Total 10 minutes
- Why Data Source Unification Matters for Enterprise Success•4 minutes
- Airbyte Connector Fundamentals - Your Integration Foundation•6 minutes
2 readings•Total 17 minutes
- Understanding Connector Architecture and Integration Patterns•8 minutes
- Professional Guide: Configuring Your First Database Connector Step-by-Step•9 minutes
1 assignment•Total 3 minutes
- Connector Configuration Foundation Knowledge Check•3 minutes
You will implement complete multi-source data integration by configuring streaming and API connectors, applying enterprise security patterns, and demonstrating mastery through comprehensive connector configuration.
What's included
2 videos, 2 readings, 2 assignments
2 videos•Total 10 minutes
- Enterprise Integration Success Stories - Why Multi-Source Unity Matters•4 minutes
- Streaming and API Connector Configuration Mastery•6 minutes
2 readings•Total 15 minutes
- Authentication and Security Patterns for Production Connectors•7 minutes
- Multi-Source Data Integration: A How-To Guide•8 minutes
2 assignments•Total 18 minutes
- Connector Configuration Mastery Assessment•15 minutes
- Multi-Source Integration Configuration Check•3 minutes
You will understand the fundamental concepts of SCD2 logic and begin applying these principles to create data models that preserve historical context in enterprise data warehouses.
What's included
3 videos, 1 reading, 1 assignment
3 videos•Total 14 minutes
- Why SCD2 Matters in Enterprise Data Warehouses•4 minutes
- Understanding SCD2 Core Components and Business Logic•7 minutes
- Building Your First SCD2 Table Structure in SQL•4 minutes
1 reading•Total 8 minutes
- SCD2 Implementation Patterns and Data Model Design•8 minutes
1 assignment•Total 3 minutes
- SCD2 Fundamentals Knowledge Check•3 minutes
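The SCD Type 2 idea covered in this module (detect a change, close the old version with an end date, open a new version flagged as current) can be illustrated with a minimal plain-Python sketch. This is not course material: the `DimRow` fields, the `apply_scd2` helper, and the sample product data are all hypothetical names chosen for illustration, and a real warehouse would typically express the same logic in SQL or dbt.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                   # business key (hypothetical: a product ID)
    attrs: tuple               # tracked attributes (hypothetical: name, price)
    valid_from: date
    valid_to: Optional[date]   # None means the version is still open
    is_current: bool


def apply_scd2(history, incoming, as_of):
    """Return a new history list after applying one batch of source records.

    `incoming` maps business key -> attribute tuple as seen on `as_of`.
    """
    result = []
    seen = set()
    for row in history:
        if row.is_current and row.key in incoming:
            seen.add(row.key)
            if incoming[row.key] != row.attrs:
                # Change detected: close the old version, open a new one.
                result.append(replace(row, valid_to=as_of, is_current=False))
                result.append(DimRow(row.key, incoming[row.key], as_of, None, True))
                continue
        result.append(row)
    # Keys with no current version yet get their first version.
    for key, attrs in incoming.items():
        if key not in seen:
            result.append(DimRow(key, attrs, as_of, None, True))
    return result


# Usage: load a product, then change its price one month later.
history = apply_scd2([], {"p1": ("Widget", 10)}, date(2024, 1, 1))
history = apply_scd2(history, {"p1": ("Widget", 12)}, date(2024, 2, 1))
```

After the second call the history holds two rows for `p1`: the original version, now closed on 2024-02-01, and a new current version with the updated price. Nothing is overwritten, which is what preserves the change history for analytics.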
You will implement production-ready SCD2 models using dbt, creating automated historical tracking systems with proper change detection, validity periods, and current status management.
What's included
2 videos, 2 readings, 3 assignments
2 videos•Total 13 minutes
- dbt Snapshots for Automated SCD2 Change Detection•8 minutes
- Building Complete dbt SCD2 Model with Validity Periods•5 minutes
2 readings•Total 18 minutes
- Why dbt Transforms SCD2 Implementation for Data Teams•8 minutes
- dbt SCD2 Implementation Patterns and Production Considerations•10 minutes
3 assignments•Total 36 minutes
- SCD2 Implementation Mastery Assessment•15 minutes
- Build Production SCD2 Data Model for Product Dimensions•18 minutes
- dbt SCD2 Implementation Knowledge Check•3 minutes
You will understand the foundational concepts and design principles for creating robust data workflows with Apache Airflow.
What's included
3 videos, 1 reading, 1 assignment
3 videos•Total 15 minutes
- The Cost of Fragile Data Pipelines•2 minutes
- Apache Airflow Fundamentals for Production Workflows•6 minutes
- Building Your First Production-Ready DAG Structure•7 minutes
1 reading•Total 10 minutes
- Design Principles for Robust Data Workflows•10 minutes
1 assignment•Total 3 minutes
- Workflow Design Principles Assessment•3 minutes
You will implement production-grade Airflow workflows with retry mechanisms, SLA monitoring, and parameterization for enterprise-ready data pipeline resilience.
What's included
2 videos, 1 reading, 2 assignments, 1 ungraded lab
2 videos•Total 12 minutes
- When Production Workflows Save Business Operations•3 minutes
- Implementing Advanced Production Patterns in Airflow•9 minutes
1 reading•Total 10 minutes
- Production Implementation Patterns and Best Practices•10 minutes
2 assignments•Total 13 minutes
- Production Workflow Mastery Assessment•10 minutes
- Production Implementation Patterns Assessment•3 minutes
1 ungraded lab•Total 20 minutes
- Building Production-Ready Airflow DAGs with Retry Logic and SLA Monitoring•20 minutes
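The retry behavior this module refers to can be sketched without Airflow at all. The example below is an illustration only, not the course's lab code: `run_with_retries` and `flaky` are hypothetical names, and the doubling delay is one common backoff choice. In Airflow itself, comparable behavior is usually configured per task through operator arguments such as `retries`, `retry_delay`, and `retry_exponential_backoff` rather than written by hand.

```python
import time


def run_with_retries(task, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage: a task that fails twice before succeeding.
calls = []
delays = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"


# Passing delays.append as `sleep` records the waits instead of pausing.
result = run_with_retries(flaky, retries=3, base_delay=1.0, sleep=delays.append)
```

Injecting the `sleep` function keeps the sketch testable: the recorded delays show the wait doubling between attempts, while production code would leave the default `time.sleep` in place.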
You will integrate data engineering skills to build a complete automated data pipeline that processes diverse data sources, applies historical tracking, and orchestrates workflows. This project synthesizes mapping, transformation, integration, modeling, and automation capabilities into a production-ready data system.
What's included
4 readings, 1 assignment
4 readings•Total 90 minutes
- Why This Project Matters•10 minutes
- Project Requirements•10 minutes
- Assignment: Data Pipeline Automation System•60 minutes
- Solution Key•10 minutes
1 assignment•Total 15 minutes
- Graded Quiz: Building Automated Data Pipelines with Spark, dbt, and Airflow•15 minutes
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.
