Data Engineering Capstone Project
This course is part of IBM Data Engineering Professional Certificate
Instructor: Rav Ahuja
20,471 already enrolled
What you'll learn
Demonstrate proficiency in skills required for an entry-level data engineering role.
Design and implement various concepts and components in the data engineering lifecycle such as data repositories.
Showcase working knowledge with relational databases, NoSQL data stores, big data engines, data warehouses, and data pipelines.
Apply skills in Linux shell scripting, SQL, and Python programming languages to Data Engineering problems.
Build your Data Management expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate from IBM
There are 7 modules in this course
Showcase your skills in this Data Engineering project! In this course you will apply a variety of data engineering skills and techniques you have learned as part of the previous courses in the IBM Data Engineering Professional Certificate.
You will demonstrate your knowledge of Data Engineering by assuming the role of a Junior Data Engineer who has recently joined an organization and is presented with a real-world use case that requires architecting and implementing a data analytics platform. In this Capstone project you will complete numerous hands-on labs. You will create and query data repositories using relational and NoSQL databases such as MySQL and MongoDB. You'll also design and populate a data warehouse using PostgreSQL and IBM Db2 and write queries to perform Cube and Rollup operations. You will generate reports from the data in the data warehouse and build a dashboard using Cognos Analytics. You will also show your proficiency in Extract, Transform, and Load (ETL) processes by creating data pipelines for moving data from different repositories. You will perform big data analytics using Apache Spark to make predictions with the help of a machine learning model.
This course is the final course in the IBM Data Engineering Professional Certificate. It is recommended that you complete all the previous courses in this Professional Certificate before starting this course.
In this module, you will design a data platform that uses MySQL as its OLTP database and stores the OLTP data there.
What's included
1 video, 2 assignments, 1 app item, 4 plugins
1 video • Total 4 minutes
- Introduction to Capstone Project • 4 minutes
2 assignments • Total 36 minutes
- Checklist: OLTP Database • 24 minutes
- Graded Quiz: OLTP Database • 12 minutes
1 app item • Total 30 minutes
- Lab: OLTP Database • 30 minutes
4 plugins • Total 45 minutes
- Reading: Final Project Submission Guidelines and Deliverables • 15 minutes
- Data Platform Architecture • 10 minutes
- Assignment Overview: OLTP Database • 15 minutes
- OLTP Database Requirements and Design • 5 minutes
In this module, you will design a data platform that uses MongoDB as a NoSQL database. You will use MongoDB to store the e-commerce catalog data.
What's included
2 assignments, 1 app item, 1 plugin
2 assignments • Total 25 minutes
- Checklist: Querying Data in NoSQL Databases • 10 minutes
- Graded Quiz: Querying Data in NoSQL Databases • 15 minutes
1 app item • Total 30 minutes
- Hands-on Lab: Querying Data in NoSQL Databases • 30 minutes
1 plugin • Total 15 minutes
- Assignment Overview: Querying Data in NoSQL Databases • 15 minutes
In this module, you will design and implement a data warehouse, then generate reports from the data it holds.
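The course description mentions writing Rollup queries against the warehouse. As a rough illustration of what a Rollup report computes, the sketch below emulates `GROUP BY ROLLUP(country, year)` with a sum in plain Python. The `rollup` helper, the column names, and the sample rows are all hypothetical and are not taken from the course labs; in the labs the database engine does this work.

```python
from collections import defaultdict

def rollup(rows, dims, measure):
    """Emulate SUM(measure) ... GROUP BY ROLLUP(dims) on a list of dicts.

    Returns a dict keyed by a tuple of dimension values, where None marks
    a dimension that has been rolled up into a subtotal or grand total.
    """
    totals = defaultdict(int)
    for row in rows:
        # depth = number of leading dimensions kept at this grouping level
        for depth in range(len(dims), -1, -1):
            kept = tuple(row[d] for d in dims[:depth])
            key = kept + (None,) * (len(dims) - depth)
            totals[key] += row[measure]
    return dict(totals)

# Hypothetical sample data
sales = [
    {"country": "US", "year": 2020, "amount": 100},
    {"country": "US", "year": 2021, "amount": 150},
    {"country": "CA", "year": 2020, "amount": 70},
]

result = rollup(sales, ["country", "year"], "amount")
print(result[("US", None)])   # subtotal for US across all years -> 250
print(result[(None, None)])   # grand total -> 320
```

A Cube differs from a Rollup in that it produces subtotals for every combination of dimensions, not only the left-to-right prefixes shown here.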
What's included
3 assignments, 2 app items, 1 plugin
3 assignments • Total 69 minutes
- Checklist: Data Warehouse Design & Setup • 15 minutes
- Checklist: Data Warehouse Reporting • 24 minutes
- Graded Quiz: Build a Data Warehouse • 30 minutes
2 app items • Total 120 minutes
- Hands-on Lab: Data Warehousing • 60 minutes
- Hands-on Lab: Data Warehouse Reporting using PostgreSQL • 60 minutes
1 plugin • Total 15 minutes
- Assignment Overview: Data Warehouse Design and Reporting • 15 minutes
In this module, you will assume the role of a data engineer at an e-commerce company. Your company has finished setting up a data warehouse. Now you are assigned the responsibility to design a reporting dashboard that reflects the key metrics of the business.
What's included
5 readings, 2 assignments, 6 plugins
5 readings • Total 42 minutes
- (Optional) About this optional lesson with Looker Studio • 2 minutes
- (Optional) Getting Started with Google Looker Studio • 10 minutes
- (Optional) Creating Visualizations in Reports using Looker Studio • 10 minutes
- (Optional) Summary and Highlights • 10 minutes
- Final Assignment Overview • 10 minutes
2 assignments • Total 27 minutes
- Checklist: Dashboard Creation • 12 minutes
- Graded Quiz: Dashboard Creation • 15 minutes
6 plugins • Total 210 minutes
- Assignment Overview: Data Analytics • 15 minutes
- (Optional) Hands-on Lab: Getting Started with Google Looker Studio • 60 minutes
- (Optional) Hands-on Lab: Creating and Configuring Visualizations in Reports with Google Looker Studio • 60 minutes
- (Optional) Hands-on Lab: Advanced charts in Looker Studio • 15 minutes
- (Optional) Final Assignment: Dashboard Creation using IBM Cognos Analytics • 30 minutes
- (Optional) Final Assignment: Dashboard Creation using Google Looker Studio • 30 minutes
In this module, you will perform ETL operations to move transactional data from an OLTP database (MySQL) into a data warehouse (PostgreSQL). You will implement and automate an ETL pipeline in Python that extracts daily incremental records from the production database, transforms them as needed, and loads them into the warehouse. Once the ETL process is established, you will extend it using Apache Airflow, a workflow orchestration tool. You will design DAGs (Directed Acyclic Graphs) that define task dependencies, automate the extraction and transformation of web server logs, and archive processed data for downstream analytics.
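One common way to extract only incremental records is to use the highest key already present in the warehouse as a watermark. The sketch below shows that idea under loud assumptions: two in-memory SQLite databases stand in for MySQL and PostgreSQL so the example runs anywhere, and the table names (`sales`, `sales_fact`), columns, and the transform rule are hypothetical rather than the lab's actual schema.

```python
import sqlite3

def make_demo_databases():
    """Create stand-in source (OLTP) and warehouse databases in memory."""
    source = sqlite3.connect(":memory:")
    warehouse = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, "
        "quantity INTEGER, unit_price_cents INTEGER)"
    )
    warehouse.execute(
        "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, "
        "product_id INTEGER, total_cents INTEGER)"
    )
    return source, warehouse

def last_loaded_id(warehouse):
    """Watermark: the highest sale_id already in the warehouse (0 if empty)."""
    row = warehouse.execute("SELECT COALESCE(MAX(sale_id), 0) FROM sales_fact").fetchone()
    return row[0]

def extract_new_rows(source, after_id):
    """Extract only rows added to the source since the watermark."""
    cur = source.execute(
        "SELECT sale_id, product_id, quantity, unit_price_cents "
        "FROM sales WHERE sale_id > ? ORDER BY sale_id",
        (after_id,),
    )
    return cur.fetchall()

def transform(rows):
    """Hypothetical transform: derive a line total from quantity and unit price."""
    return [(sale_id, product_id, qty * price) for sale_id, product_id, qty, price in rows]

def load(warehouse, rows):
    """Insert the transformed rows in one transaction; return the row count."""
    with warehouse:
        warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    return len(rows)

def run_incremental_etl(source, warehouse):
    watermark = last_loaded_id(warehouse)
    return load(warehouse, transform(extract_new_rows(source, watermark)))

source, warehouse = make_demo_databases()
source.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [(1, 10, 2, 500), (2, 11, 1, 1200)])
print(run_incremental_etl(source, warehouse))  # first run loads both rows -> 2
source.execute("INSERT INTO sales VALUES (3, 10, 3, 500)")
print(run_incremental_etl(source, warehouse))  # second run loads only the new row -> 1
```

The watermark approach assumes keys only ever increase and that existing rows are not updated in place; a pipeline that must capture updates would need a different change-detection strategy, such as a last-modified timestamp.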
What's included
3 assignments, 2 app items, 1 plugin
3 assignments • Total 66 minutes
- Checklist: ETL • 9 minutes
- Checklist: Data Pipelines using Apache Airflow • 27 minutes
- Graded Quiz: ETL and Data Pipelines • 30 minutes
2 app items • Total 90 minutes
- Hands-on Lab: ETL • 60 minutes
- Hands-on Lab: Data Pipelines using Apache Airflow • 30 minutes
1 plugin • Total 15 minutes
- Assignment Overview: ETL and Data Pipelines • 15 minutes
In this module, you will use the data from a webserver to analyse search terms. You will then load a pretrained sales forecasting model and predict the sales forecast for a future year.
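The course performs this analysis with Apache Spark. Purely to illustrate the underlying grouping and counting step, here is a small plain-Python stand-in that normalizes search terms and ranks them by frequency. The function name and the sample terms are hypothetical, and this is not the lab's Spark code.

```python
from collections import Counter

def top_search_terms(terms, n=3):
    """Normalize terms (trim, lowercase), skip blanks, and return the n most frequent."""
    cleaned = (t.strip().lower() for t in terms if t and t.strip())
    return Counter(cleaned).most_common(n)

# Hypothetical sample of raw search terms
terms = ["Laptop", "gaming laptop", "laptop ", "mobile", "", "Mobile", "laptop"]
print(top_search_terms(terms, 2))  # -> [('laptop', 3), ('mobile', 2)]
```

On a dataset too large for one machine, the same group-and-count logic would typically be expressed as a distributed aggregation instead.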
What's included
2 assignments, 2 app items, 1 plugin
2 assignments • Total 29 minutes
- Checklist: Big Data Analytics with Spark • 14 minutes
- Graded Quiz: Big Data Analytics with Spark • 15 minutes
2 app items • Total 60 minutes
- Practice Hands On Lab: Saving and loading a SparkML model • 30 minutes
- Hands-on Lab: SparkML Ops • 30 minutes
1 plugin • Total 15 minutes
- Assignment Overview: Big Data Analytics with Spark • 15 minutes
In this module, you will make a final submission of all the labs you've completed throughout the course for evaluation. You can choose to have your submission evaluated by an AI tool or through a peer-graded review.
What's included
2 readings, 1 peer review, 1 app item
2 readings • Total 3 minutes
- Congrats & Next Steps • 2 minutes
- Thanks from the Course Team • 1 minute
1 peer review • Total 60 minutes
- Option 2 - Peer Graded: Final Project - Submission and Evaluation • 60 minutes
1 app item • Total 60 minutes
- Option 1 - AI Graded: Final Project - Submission and Evaluation • 60 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Learner reviews
- 5 stars: 84.61%
- 4 stars: 9.79%
- 3 stars: 2.09%
- 2 stars: 1.39%
- 1 star: 2.09%
Showing 3 of 143
Reviewed on Mar 9, 2024
The Capstone was a bit of an anticlimax. I was expecting a very challenging Capstone, but found a "follow the instructions" approach which made it seem too simple. I'm not complaining ;-)
Reviewed on Mar 17, 2023
I enjoyed having to go back and revise the other courses in the specialization. I had forgotten how interesting they were.
Reviewed on Aug 13, 2023
Great course to learn the fundamentals to become a very good Data Engineer!
Frequently asked questions
This project requires you to architect a multi-tiered data platform utilizing various database paradigms. For transactional data, you will implement a MySQL OLTP database to log live e-commerce operations. For unstructured data storage, you will design a MongoDB NoSQL database to manage product catalogs. Finally, you will construct and populate a data warehouse using PostgreSQL and IBM Db2, writing complex analytics queries for business reporting.
You will gain hands-on experience handling automated data movement across different platform layers. You will build an Extract, Transform, and Load (ETL) pipeline using Python to extract daily incremental records from production systems and load them safely into your data warehouse. Moving beyond basic scripts, you will orchestrate this entire pipeline using Apache Airflow, designing DAGs (Directed Acyclic Graphs) to manage task dependencies, automate the ingestion of web server logs, and clean data for downstream analytics.
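To make the idea of a DAG managing task dependencies concrete, the sketch below uses Python's standard-library `graphlib` module to derive a valid execution order from a dependency map and to reject a cyclic one. It does not use the Airflow API, and the task names are hypothetical; it only illustrates why the graph must be acyclic for an orchestrator to schedule it.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical names).
dependencies = {
    "extract_logs": set(),
    "transform_logs": {"extract_logs"},
    "load_logs": {"transform_logs"},
    "archive_logs": {"load_logs"},
}

def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

def is_acyclic(deps):
    """Return False if the dependency map contains a cycle."""
    try:
        TopologicalSorter(deps).prepare()
        return True
    except CycleError:
        return False

print(execution_order(dependencies))
# -> ['extract_logs', 'transform_logs', 'load_logs', 'archive_logs']
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # -> False
```

If two tasks depended on each other, neither could ever start, which is why the cyclic map is rejected.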
Yes. To prepare you for entry-level engineering roles, the capstone integrates big data processing engines and BI platforms. You will use Apache Spark to run large-scale log analysis, extracting and parsing search terms directly from web server data. Furthermore, you will hook your engineered data paths into a pretrained machine learning model to execute sales forecasting and then pipe those results into IBM Cognos Analytics to build interactive, live reporting dashboards reflecting key performance indicators.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.
