
URL: https://www.coursera.org/learn/building-resilient-systems

Building Resilient Systems | Coursera


Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

9 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • Explain core resilience engineering principles and differentiate between failure types in modern distributed systems.

  • Analyze system architectures to identify single points of failure and resilience gaps that could impact availability.

  • Develop disaster recovery strategies aligned with defined business requirements such as RTO and RPO.

  • Evaluate monitoring, observability, and incident response practices to improve system reliability and operational resilience.

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

April 2026

Assessments

4 assignments

Taught in English

There are 4 modules in this course

Building resilient systems requires more than knowing individual tools; it demands the ability to design architectures that anticipate failure and recover effectively. In this intermediate course, you will learn how to apply resilience engineering principles to modern distributed systems, focusing on high availability, fault tolerance, and disaster recovery planning.

You will analyze how and why systems fail, identify hidden risks in system architecture, and design strategies that improve uptime and reliability. The course connects key concepts such as load balancing, redundancy, observability, and incident response into a cohesive resilience strategy aligned with business goals like RTO and RPO. Designed for IT professionals, DevOps engineers, and system architects, this course emphasizes practical decision-making, trade-offs, and operational readiness. By the end, you will be able to design resilient architectures, strengthen system reliability, and lead effective incident management and continuous improvement practices.

This module introduces the core concepts behind resilient system design. Learners will explore why failures are inevitable, how resilient systems differ from traditional architectures, and the foundational principles used to build systems that can withstand, adapt to, and recover from disruptions. The module sets the mindset and technical baseline required for designing reliable and fault-aware systems.

What's included

11 videos, 2 readings, 1 assignment, 1 peer review, 1 discussion prompt

11 videos • Total 86 minutes
  • Welcome to Building Resilient Systems • 7 minutes
  • Module Introduction • 3 minutes
  • Why Systems Fail • 9 minutes
  • Failure Types and Their Impact • 10 minutes
  • Learning from Real-World Outages • 10 minutes
  • Defining Resilience in Modern Systems • 8 minutes
  • Key Characteristics of Resilient Architectures • 8 minutes
  • Resilience vs. Traditional Design Approaches • 8 minutes
  • Core Principles of Resilience Engineering • 8 minutes
  • Redundancy, Diversity, and Isolation • 7 minutes
  • Trade-offs in Resilient Design • 9 minutes
2 readings • Total 10 minutes
  • Welcome to the Course: Course Overview • 5 minutes
  • Designing Resilient Systems • 5 minutes
1 assignment • Total 20 minutes
  • Foundations of Resilient Systems • 20 minutes
1 peer review • Total 10 minutes
  • Hands-On-Learning: Identifying Failure Risks in a System Design • 10 minutes
1 discussion prompt • Total 10 minutes
  • Designing for Failure Before It Happens • 10 minutes

This module focuses on designing systems that remain available despite failures. Learners will explore high availability concepts, fault tolerance techniques, and architectural patterns used to eliminate single points of failure. The module emphasizes practical design decisions that improve uptime while balancing cost and complexity.

What's included

10 videos, 1 reading, 1 assignment, 1 peer review, 1 discussion prompt

10 videos • Total 77 minutes
  • Module Introduction • 3 minutes
  • What High Availability Really Means • 8 minutes
  • Availability Metrics and SLAs • 7 minutes
  • Eliminating Single Points of Failure • 13 minutes
  • Active-Active vs. Active-Passive Designs • 6 minutes
  • Load Balancing and Traffic Distribution • 8 minutes
  • Failover Mechanisms and Health Checks • 7 minutes
  • Designing for Partial Failures • 8 minutes
  • Graceful Degradation and Backpressure • 8 minutes
  • Containing Failures with Isolation • 9 minutes
1 reading • Total 5 minutes
  • High Availability and Fault-Tolerant Architecture • 5 minutes
1 assignment • Total 20 minutes
  • High Availability and Fault Tolerance Design • 20 minutes
1 peer review • Total 10 minutes
  • Hands-On-Learning: Designing a High Availability Architecture • 10 minutes
1 discussion prompt • Total 10 minutes
  • Balancing Availability, Cost, and Complexity • 10 minutes

This module focuses on preparing systems and teams to recover from major disruptions. Learners will explore backup and recovery strategies, define recovery objectives, design disaster recovery testing approaches, and create operational runbooks that support consistent and effective recovery. The module emphasizes planning, decision-making, and operational readiness rather than tool-specific implementation.

What's included

10 videos, 1 reading, 1 assignment, 1 peer review, 1 discussion prompt

10 videos • Total 74 minutes
  • Module Introduction • 5 minutes
  • Backup Strategies and Recovery Models • 8 minutes
  • Understanding RTO and RPO • 8 minutes
  • Designing Backup and Recovery Solutions • 8 minutes
  • Why Disaster Recovery Testing Matters • 8 minutes
  • Types of Disaster Recovery Tests • 6 minutes
  • Developing Disaster Recovery Testing Procedures • 8 minutes
  • What is an Operational Runbook • 7 minutes
  • Runbook Structure and Best Practices • 8 minutes
  • Creating Effective Recovery Runbooks • 10 minutes
1 reading • Total 5 minutes
  • Disaster Recovery Planning and RTO or RPO Concepts • 5 minutes
1 assignment • Total 20 minutes
  • Disaster Recovery Planning and Operational Readiness • 20 minutes
1 peer review • Total 10 minutes
  • Hands-On-Learning: Creating a Disaster Recovery Plan • 10 minutes
1 discussion prompt • Total 10 minutes
  • Evaluating Recovery Readiness in Real-World Environments • 10 minutes

This module focuses on maintaining system reliability through effective monitoring, observability, and structured incident management. Learners will explore how logs, metrics, and traces provide system visibility, how alerting strategies support timely response, and how post-incident reviews drive continuous improvement. The module emphasizes operational effectiveness and learning from incidents rather than tool-specific implementation.

What's included

11 videos, 1 reading, 1 assignment, 2 peer reviews, 1 discussion prompt

11 videos • Total 66 minutes
  • Module Introduction • 2 minutes
  • Monitoring vs. Observability • 6 minutes
  • Observability Pillars: Logs, Metrics, and Traces • 6 minutes
  • Implementing Comprehensive Observability • 7 minutes
  • Principles of Effective Alerting • 6 minutes
  • Alert Thresholds and Escalation Paths • 6 minutes
  • Designing Effective Alerting Strategies • 6 minutes
  • Incident Lifecycle and Response Review • 9 minutes
  • Conducting Productive Post-Incident Reviews • 7 minutes
  • Driving Continuous Improvement from Incidents • 6 minutes
  • Course Wrap-Up • 5 minutes
1 reading • Total 5 minutes
  • Observability and Incident Management Fundamentals • 5 minutes
1 assignment • Total 20 minutes
  • Monitoring, Observability, and Incident Management • 20 minutes
2 peer reviews • Total 70 minutes
  • Hands-On-Learning: Incident Analysis and Post-Incident Review • 10 minutes
  • Project: Designing and Defending a Resilient System Architecture • 60 minutes
1 discussion prompt • Total 10 minutes
  • Designing Observability and Alerting for Real Impact • 10 minutes

Instructors

Starweaver
2 Courses • 114 learners

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
