Advanced Data Processing and Analytics with AWS
This course is part of Data Engineering on AWS - The Complete Training Specialization
What you'll learn
Master the use of Amazon Kinesis and MSK for real-time data processing.
Set up and manage big data workloads using Amazon EMR efficiently.
Build secure, scalable data lakes using AWS Lake Formation.
Optimize and query large datasets using Amazon Athena.
Details to know
6 assignments
There are 4 modules in this course
This course features Coursera Coach!
A smarter way to learn with interactive, real-time conversations that help you test your knowledge, challenge assumptions, and deepen your understanding as you progress through the course.
This course equips learners with the skills to efficiently process and analyze large volumes of data using AWS services. You will gain expertise in streaming data with Amazon Kinesis and Amazon MSK, running big data workloads on Amazon EMR, building data lakes on AWS, and querying data using Amazon Athena. The course is designed to help you develop a deep understanding of AWS tools and best practices for managing data in cloud environments.
Through the course, you will explore the fundamentals of streaming data and various AWS services that support real-time analytics, such as Kinesis and MSK. You'll also dive into building scalable data lakes using AWS Lake Formation and learn how to run big data processing workloads using Amazon EMR, along with optimizing them for cost and performance. Each module builds on the last, allowing you to master streaming, storage, and query operations seamlessly.
As you progress, you will learn how to configure and optimize systems for maximum throughput. The course features hands-on exercises and best practices for using AWS tools, ensuring that you develop practical skills for real-world applications. The structure ensures that you understand the foundational concepts before advancing to complex data management and optimization techniques.
This course is ideal for data engineers, cloud architects, or anyone looking to advance their skills in AWS data processing. While prior experience with cloud services is helpful, the course is designed for those with an intermediate understanding of data management and analytics. By the end of the course, you will be able to configure AWS services for real-time data processing, set up data lakes, optimize big data workloads on Amazon EMR, and query data efficiently using Amazon Athena.
In this module, we will explore the fundamentals of real-time data streaming and dive deep into AWS services like Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (MSK). You'll learn how to ingest, process, and deliver streaming data using tools such as Kinesis Data Streams, Firehose, and Flink, as well as build scalable Kafka pipelines. By the end, you'll be equipped to choose the right streaming architecture for your analytics and operational needs.
What's included
25 videos, 2 readings, 1 assignment
25 videos • Total 207 minutes
- What Is Streaming Data? • 8 minutes
- Streaming Services in AWS • 6 minutes
- Amazon Kinesis Family • 3 minutes
- Amazon Kinesis Data Streams • 13 minutes
- Capacity Mode • 7 minutes
- Shard Iterators • 13 minutes
- Kinesis Data Generator • 7 minutes
- Data Stream Producers • 4 minutes
- Data Stream Consumer • 3 minutes
- Enhanced Fan-Out • 6 minutes
- Amazon Kinesis Firehose • 19 minutes
- Dynamic Partitioning • 9 minutes
- Data Stream vs. Data Firehose • 5 minutes
- Managed Service for Apache Flink • 11 minutes
- Flink Application • 15 minutes
- Flink Studio • 4 minutes
- Apache Kafka • 10 minutes
- Amazon Managed Service for Kafka • 9 minutes
- MSK Cluster • 10 minutes
- Kafka Topic • 22 minutes
- Send and Receive Messages • 5 minutes
- Amazon MSK Serverless • 5 minutes
- MSK Provisioned vs. Serverless • 3 minutes
- Amazon MSK Connect • 4 minutes
- Amazon Kinesis vs. Amazon MSK • 6 minutes
2 readings • Total 20 minutes
- Introduction to the Course 'Advanced Data Processing and Analytics with AWS' • 10 minutes
- Full Specialization Resource • 10 minutes
1 assignment • Total 15 minutes
- Processing Streaming Data on Amazon Kinesis and Amazon MSK - Assessment • 15 minutes
In this module, we will delve into how Amazon EMR simplifies running big data frameworks like Hadoop, Spark, and Hive on AWS. You'll learn how to configure EMR clusters, manage storage, and leverage EMR Serverless for auto-scaling workloads. The lessons also cover migration strategies and cost optimization techniques for efficient big data processing.
What's included
10 videos, 1 assignment
10 videos • Total 71 minutes
- What Is Big Data? • 4 minutes
- MapReduce • 5 minutes
- Big Data Ecosystem • 5 minutes
- Amazon EMR • 8 minutes
- Storage for EMR • 7 minutes
- Creating EMR Cluster: Part 1 • 16 minutes
- Creating EMR Cluster: Part 2 • 9 minutes
- Migration • 9 minutes
- Amazon EMR Serverless • 4 minutes
- Cost Optimization • 4 minutes
1 assignment • Total 15 minutes
- Running Big Data Workloads on Amazon EMR - Assessment • 15 minutes
In this module, we will guide you through building and managing a modern data lake on AWS using Lake Formation. You'll set up ingestion, define permissions, and manage metadata for secure, scalable data storage. We also explore the use of open table formats for analytics flexibility and performance.
What's included
9 videos, 1 assignment
9 videos • Total 85 minutes
- What Is a Data Lake? • 9 minutes
- Data Warehouse vs. Data Lake • 7 minutes
- AWS Lake Formation • 9 minutes
- How It Works? • 10 minutes
- Setting Up a Data Lake: Part 1 • 18 minutes
- Setting Up a Data Lake: Part 2 • 7 minutes
- Data Lake Permissions • 13 minutes
- Tag-Based Permissions • 9 minutes
- Open Table Formats • 3 minutes
1 assignment • Total 15 minutes
- Building Data Lakes on AWS - Assessment • 15 minutes
In this module, we will explore how Amazon Athena enables serverless, SQL-based querying of your data stored in Amazon S3. You'll learn to optimize queries, manage access with workgroups, and extend Athena's capabilities through federated queries. By mastering these techniques, you'll streamline data analysis without managing infrastructure.
What's included
7 videos, 1 reading, 3 assignments
7 videos • Total 88 minutes
- Why Use Amazon Athena? • 7 minutes
- How It Works? • 14 minutes
- Optimizing Queries in Athena: Part 1 • 16 minutes
- Optimizing Queries in Athena: Part 2 • 13 minutes
- Workgroups • 14 minutes
- Federated Query: Part 1 • 5 minutes
- Federated Query: Part 2 • 20 minutes
1 reading • Total 10 minutes
- Conclusion to the Course 'Advanced Data Processing and Analytics with AWS' • 10 minutes
3 assignments • Total 90 minutes
- Query Your Data Using Amazon Athena - Assessment • 15 minutes
- Full Course Assessment • 60 minutes
- Full Course Practice Assessment • 15 minutes
Frequently asked questions
Advanced Data Processing and Analytics with AWS is a comprehensive course designed to equip learners with the knowledge and skills needed to process large volumes of data using AWS services. The course covers essential topics such as streaming data, big data workloads, data lakes, and serverless analytics. With the increasing importance of real-time data processing, machine learning, and large-scale data management, the course provides invaluable expertise in using AWS tools and frameworks to process, analyze, and derive insights from data effectively. This is highly relevant for those looking to advance in fields like cloud computing, big data, and analytics.
This course focuses on advanced techniques for processing and analyzing data on AWS. It covers four key modules: processing streaming data using Amazon Kinesis and Amazon MSK, running big data workloads on Amazon EMR, building data lakes with AWS Lake Formation, and querying data using Amazon Athena. Each module includes hands-on examples that demonstrate how to work with AWS services like Kinesis, EMR, Lake Formation, and Athena to efficiently handle large datasets, perform real-time analytics, and optimize query performance.
After completing the course, you will be able to design and implement solutions for processing streaming data using AWS services, manage big data workloads with Amazon EMR, build and manage data lakes on AWS, and optimize data queries with Amazon Athena. You will have the skills to select and use appropriate AWS services to handle different types of data processing tasks, from real-time analytics to large-scale data management and querying, empowering you to tackle complex data challenges in a cloud environment.
