Enable zone resiliency for Azure workloads

To make your applications more resilient to zone-related hardware failures, network disruptions, and natural disasters, it's important that you design your Azure workloads for zone resiliency. When you distribute resources across multiple availability zones within a region, you reduce the risk of a single zone outage affecting critical services.

Although it's a best practice to address zone resiliency during the initial planning and deployment of workloads, it's common to want to convert existing non-resilient workloads to zone-resilient configurations. In general, the process of enabling zone resiliency for existing workloads is straightforward, and Microsoft continues to simplify the process. However, any change to your workload can introduce some risk. After you understand the risks involved, you can assess which workloads, and which services within those workloads, are most vital to your business, and then apply zone resiliency to the most impactful resources first.

This article outlines key considerations for enabling zone resiliency in your Azure workloads. It also helps you plan and implement a successful transition to a more resilient architecture.

Tip

If you're currently in the process of designing your workloads or plan to do a design review of your current workloads, it's important that you follow the recommendations for designing for redundancy in the Azure Well-Architected Framework (WAF). The WAF recommendations guide helps you design workload redundancy across multiple levels, with a focus on critical workflows. To support availability zone adoption, it also outlines strategies like multi-region deployments and deployment stamps.

What is zone resiliency?

Azure services can be made resilient to availability zone outages in two primary ways:

  • Zone-redundant services: Many Azure services support zone redundancy. These services automatically replicate data between availability zones, distribute incoming requests, and fail over to different zones during a zone failure. Each service supports these capabilities in a way that makes sense for that specific service. Some services are zone redundant by default, while other services might need you to configure zone redundancy.

  • Zonal services: Some Azure services are zonal, which means that they can be pinned to a specific availability zone. To achieve zone resiliency by using a zonal service, deploy separate instances of the service in multiple availability zones. You might also need to manage traffic distribution, replication of data, and failover between the instances.

Some services can be deployed in either a zone-redundant or zonal configuration. For most cases, it's best to deploy zone-redundant services when you can.

For more information, see Types of availability zone support.
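
The difference between the two approaches can be sketched as a simple check. The following Python example is a minimal illustration, not an Azure API: the `Instance` record, its fields, and the two-zone threshold are assumptions made for this sketch. It treats a zone-redundant instance as resilient on its own, and a group of zonal instances as resilient only when the group spans at least two zones. It doesn't model traffic distribution, data replication, or failover, which zonal deployments might also need.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one deployed resource instance."""
    name: str
    zone_redundant: bool = False   # the service spreads itself across zones
    zone: Optional[str] = None     # zone the instance is pinned to, if zonal


def is_zone_resilient(instances: list[Instance], min_zones: int = 2) -> bool:
    """Return True if the group of instances can survive a single-zone outage.

    A zone-redundant instance is resilient on its own. Zonal instances are
    resilient only as a group that spans at least `min_zones` distinct zones.
    """
    if any(i.zone_redundant for i in instances):
        return True
    pinned_zones = {i.zone for i in instances if i.zone is not None}
    return len(pinned_zones) >= min_zones


zonal_pair = [Instance("vm-a", zone="1"), Instance("vm-b", zone="2")]
single_zone = [Instance("vm-a", zone="1"), Instance("vm-b", zone="1")]
redundant = [Instance("storage", zone_redundant=True)]

print(is_zone_resilient(zonal_pair))   # True
print(is_zone_resilient(single_zone))  # False
print(is_zone_resilient(redundant))    # True
```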

Zone enablement procedure

Use the following steps to systematically review your Azure workloads, prioritize them for zone resiliency, and enable zone resiliency for each component.

Prerequisites

Before you begin, perform the following actions:

  • Identify each workload. A workload refers to a collection of application resources, data, and supporting infrastructure that function together to achieve defined business outcomes. For more information about workloads and how to define them, see Well-Architected Framework workloads.

  • Prioritize each workload's user and system flows. Understand the critical paths and dependencies of your workloads to determine which components to make zone resilient first. For more information about how to use critical flow analysis to prioritize workflows, see Prioritize workloads for zone resiliency.

  • Assign a criticality rating to each workload and flow. This rating helps you understand the impact of a potential outage on your business and guides your decisions about which workloads to prioritize for zone resiliency. Also consider the amount of acceptable downtime while you reconfigure the workloads.

    You can use a taxonomy to classify your workloads based on their criticality. This approach helps you focus your efforts on the most important services.

    Consider the following example taxonomy to classify your workloads.

    Workload type Description Effect of disruption
    Mission-critical Critical flows and workloads that must be highly reliable, always available, resilient to failures, and operational Any disruption to essential functions immediately risks catastrophic business damage or introduces risks to human life.
    Business-critical Essential flows and workloads that operate important business functions Disruption risks some financial loss or brand damage.
    Business‑operational Contributes to efficiency of business operations, but out of direct line-of-service to customers Can tolerate some level of disruption.
    Administrative Internal production flows and workloads not aligned to business operations Can tolerate disruption.

    For more information about how to classify your workloads according to criticality rating, see Assign a criticality rating to each flow.

  • Verify that the regions where your Azure resources reside support availability zones. Consult the Azure regions list. If a region doesn't support availability zones, consider relocating your resources to a region that does. For more information, see Move Azure resources across resource groups, subscriptions, or regions.
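
The prerequisites above can be captured in a small inventory before you start. The following Python sketch is illustrative only: the workload names, the `Workload` record, and the `region_has_zones` flag are assumptions, and the criticality levels mirror the example taxonomy. It ranks workloads by criticality and separates the workloads whose region doesn't support availability zones, because those workloads might need relocation first.

```python
from dataclasses import dataclass
from enum import IntEnum


class Criticality(IntEnum):
    """Example taxonomy; a lower value means a higher priority."""
    MISSION_CRITICAL = 1
    BUSINESS_CRITICAL = 2
    BUSINESS_OPERATIONAL = 3
    ADMINISTRATIVE = 4


@dataclass(frozen=True)
class Workload:
    name: str                 # hypothetical workload name
    criticality: Criticality
    region_has_zones: bool    # recorded after you check the Azure regions list


def plan_order(workloads: list[Workload]) -> tuple[list[str], list[str]]:
    """Split workloads into (ready, needs_relocation), each sorted by criticality."""
    ranked = sorted(workloads, key=lambda w: (w.criticality, w.name))
    ready = [w.name for w in ranked if w.region_has_zones]
    needs_relocation = [w.name for w in ranked if not w.region_has_zones]
    return ready, needs_relocation


workloads = [
    Workload("intranet-wiki", Criticality.ADMINISTRATIVE, True),
    Workload("payments", Criticality.MISSION_CRITICAL, True),
    Workload("reporting", Criticality.BUSINESS_OPERATIONAL, False),
    Workload("order-entry", Criticality.BUSINESS_CRITICAL, True),
]

ready, needs_relocation = plan_order(workloads)
print(ready)             # ['payments', 'order-entry', 'intranet-wiki']
print(needs_relocation)  # ['reporting']
```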

Step 1: Prioritize Azure services for zone resiliency

After you determine which workload flows are most critical to your business, you can focus on the Azure services that those flows depend on. Some Azure services are more critical to your applications than others. Prioritize these services to help ensure that your applications remain available and resilient if a zone failure occurs.

Use the following guidance to prioritize Azure service groups based on their criticality to your workloads. Consider your specific application architecture and business requirements when you determine the priority of services for zone resiliency.

  1. Start with networking services. Workloads tend to share networking services, so an increase in the resiliency of these services can improve the resiliency of multiple workloads at once.

    Many core networking services are zone redundant automatically, but you should focus on components like Azure ExpressRoute gateways, Azure VPN Gateway, Azure Application Gateway, Azure Load Balancer, and Azure Firewall.

  2. Improve operational data storage resiliency. Operational data stores contain valuable data that multiple workloads often use, so improving the availability of those data stores can help many workloads.

    For operational data storage resiliency, focus on services like Azure SQL Database, Azure SQL Managed Instance, Azure Storage, Azure Data Lake Storage, Azure Cosmos DB, Azure Database for PostgreSQL, Azure Database for MySQL, and Azure Managed Redis.

  3. Prioritize compute services. These services are often easy to replicate and distribute among zones because they're stateless.

    Compute services include Azure Virtual Machines, Azure Virtual Machine Scale Sets, Azure Kubernetes Service (AKS), Azure App Service, App Service Environment, Azure Functions, Azure Service Fabric, and Azure Container Apps.

  4. Review remaining business-critical resources that your critical flows use. These resources might not be as critical as the resources listed previously, but they still play a role in your application's functionality, and you should consider them for zone resiliency.

  5. Review the rest of your business-operational resources. Make informed decisions about whether to make them zone resilient. This review includes services that might not directly relate to your critical workloads but still contribute to overall application performance and reliability.
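
One way to apply this ordering to an inventory is a simple sort. The following Python sketch assumes a hand-maintained mapping from service name to group; the mapping is partial and uses only services named in the guidance above. For brevity, it collapses the last two review steps into a single "other" group, which is a simplification for this sketch and not part of the guidance.

```python
# Priority order of service groups, following the numbered guidance.
GROUP_PRIORITY = ["networking", "data", "compute", "other"]

# Partial, hand-maintained mapping (an assumption for this sketch).
SERVICE_GROUP = {
    "Azure VPN Gateway": "networking",
    "Azure Application Gateway": "networking",
    "Azure Load Balancer": "networking",
    "Azure Firewall": "networking",
    "Azure SQL Database": "data",
    "Azure Storage": "data",
    "Azure Cosmos DB": "data",
    "Azure Virtual Machines": "compute",
    "Azure Kubernetes Service (AKS)": "compute",
    "Azure App Service": "compute",
}


def prioritize(services: list[str]) -> list[str]:
    """Sort service names by group priority; unmapped services go last."""
    def rank(service: str) -> int:
        group = SERVICE_GROUP.get(service, "other")
        return GROUP_PRIORITY.index(group)
    # sorted() is stable, so services in the same group keep their input order.
    return sorted(services, key=rank)


inventory = [
    "Azure App Service",
    "Azure Key Vault",
    "Azure SQL Database",
    "Azure Firewall",
    "Azure Virtual Machines",
]
print(prioritize(inventory))
```

Adjust the mapping and the group order to match your own architecture and business requirements.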

Step 2: Assess zone configuration approaches

After you prioritize your workloads and Azure services, identify the approach required to enable availability zone support for each service, and understand what you need to do to configure zone resiliency.

Each Azure reliability service guide provides a section that describes how to enable zone resiliency for that service. This section helps you understand the effort required to make each service zone resilient so that you can plan your strategy accordingly. For more information about a specific service, see Azure reliability service guides.

Use the zone configuration table to quickly understand approaches for common Azure services.

Important

If your workload includes components deployed in a zonal (or single-zone) configuration, plan to make these components resilient to zone outages. A common approach is to deploy separate instances into another availability zone and switch between them if necessary.

Step 3: Test for latency

When you make workloads zone resilient, consider latency between availability zones. Occasionally, some legacy systems can't tolerate the small amount of extra latency that cross-zone traffic introduces, especially when you enable synchronous replication within the data tier. If you suspect that cross-zone latency might affect your workload, run tests before and after you enable zone resiliency. For more information about how cross-zone latency might affect your application and approaches to mitigate cross-zone latency problems, see Zonal resources and zone resiliency.
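
A before-and-after comparison can be as simple as comparing a latency percentile from two test runs. The following Python sketch uses only the standard library. The sample values and the 2 ms budget are made up for illustration; collect real measurements with your own load-testing tool and choose a budget that fits your workload.

```python
import statistics


def percentile(samples: list[float], pct: int) -> float:
    """Return the pct-th percentile (1 to 99) of the samples."""
    # quantiles(n=100) returns the 99 cut points between percentiles.
    return statistics.quantiles(samples, n=100, method="inclusive")[pct - 1]


def latency_regression(
    before_ms: list[float],
    after_ms: list[float],
    pct: int = 95,
    budget_ms: float = 2.0,
) -> tuple[float, bool]:
    """Compare a latency percentile before and after enabling zone resiliency.

    `budget_ms` is a hypothetical tolerance for the added latency.
    Returns (increase_ms, within_budget).
    """
    increase = percentile(after_ms, pct) - percentile(before_ms, pct)
    return increase, increase <= budget_ms


# Illustrative samples in milliseconds (not real measurements).
before_ms = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.9, 1.3, 1.1, 1.0]
after_ms = [1.6, 1.8, 2.1, 1.9, 2.4, 2.0, 1.7, 2.6, 2.2, 1.9]

increase, within_budget = latency_regression(before_ms, after_ms)
print(f"p95 increase: {increase:.2f} ms, within budget: {within_budget}")
```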

Zone configuration approaches for Azure services

Each Azure service supports a specific type of availability zone support, which is based on the service's intended use and internal architecture. If you have a resource that isn't configured to use availability zones (a nonzonal resource), you might want to reconfigure it with availability zone support. The reliability guide for that service provides guidance or links to availability zone configuration instructions.

This section provides an overview of the different types of zone configuration approaches and which approach each service supports.

Important

When you enable zone redundancy on a resource, that resource automatically becomes resilient to zone failures. When you use a zonal configuration to pin the resource to a specific availability zone, the resource isn't automatically zone resilient. You must take extra steps to make it resilient to a zone failure. For zonal services, the effort and cost listed in this article reflect only pinning the resource to a zone. For more information about the extra steps to achieve zone resiliency, see the service's reliability guide.

The zone configuration table lists the supported zone configuration approach for many Azure services and contains a link to each reliability guide for that service. The reliability guide provides information about how to configure nonzonal service resources to enable availability zone support.

The following table describes common zone configuration approaches, including the level of effort and downtime required to enable availability zones. Some services have different approaches listed because of the way they work.

Approach Description Typical level of effort Might require downtime
Always zone resilient The service is zone resilient by default in regions that support availability zones. No action is required. None No
Enablement Minimal configuration changes required, such as enabling zone redundancy in settings. The process doesn't affect availability, but consider effects on cost or performance. Low No
Modification Likely requires some configuration changes, such as redeploying dependent resources or modifying network settings. Medium Yes
Redeployment Significant changes required, such as redeploying entire resources, applications, or services, or migrating data to new services. High Yes
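
To turn the approach table into a planning aid, you can encode it as data and flag the changes that might need a maintenance window. The following Python sketch transcribes the effort and downtime values from the table above. The resource names in `plan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    effort: str
    might_require_downtime: bool


# Values transcribed from the approach table.
APPROACHES = {
    "Always zone resilient": Approach("None", False),
    "Enablement": Approach("Low", False),
    "Modification": Approach("Medium", True),
    "Redeployment": Approach("High", True),
}


def needs_maintenance_window(plan: dict[str, str]) -> list[str]:
    """Return the resources whose approach might require downtime.

    `plan` maps a hypothetical resource name to its approach name.
    """
    return [
        resource
        for resource, approach in plan.items()
        if APPROACHES[approach].might_require_downtime
    ]


plan = {
    "orders-db": "Enablement",
    "edge-lb": "Modification",
    "batch-pool": "Redeployment",
    "app-vnet": "Always zone resilient",
}
print(needs_maintenance_window(plan))  # ['edge-lb', 'batch-pool']
```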

Understand the cost of enabling availability zone support for a service. For many services, enabling availability zones doesn't add cost. But some services require a specific tier, a specific number of capacity units, or both. Other services charge different rates when you use availability zones. The table in the next section lists the typical cost impact for each service.

Note

The information in this article summarizes the typical approach to enable availability zone support and outlines the typical cost impact. But some factors might affect how it works for your specific solution. For example, some services are listed as always zone resilient, but this designation only applies in specific regions or for specific tiers of the service. Use these tables as a starting point, but review the other resources mentioned to understand the specific details.

Azure services by zone configuration approach

The following table summarizes the availability zone support for many Azure services and provides an approach, including cost impact, to enable availability zone support for each service.

Service Can be zone redundant Can be zonal Typical zone configuration approach Typical cost impact
Azure AI Search Yes Always zone resilient N/A
Azure API Management Yes Yes Modification Minimum tier required
Azure App Configuration Yes Always zone resilient N/A
Azure App Service Yes Enablement Minimum tier and instance count required
Azure App Service: App Service Environment Yes Enablement Minimum instance count required
Azure Application Gateway v2 Yes Yes Always zone resilient N/A
Azure Automation Yes Always zone resilient N/A
Azure Backup Yes Redeployment Moderate cost increase
Azure Bastion Yes Yes Redeployment No cost impact
Azure Batch Yes Redeployment No cost impact for same number of virtual machines (VMs)
Azure Blob Storage Yes Enablement Moderate cost increase
Azure Chaos Studio Yes Always zone resilient N/A
Azure Container Apps Yes Redeployment Minimum replica count required
Azure Container Instances Yes Redeployment No cost impact
Azure Container Registry Yes Always zone resilient N/A
Azure Cosmos DB Yes Modification None if using autoscale or multi-region writes
Azure Data Explorer Yes Modification Moderate cost increase
Azure Data Factory Yes Always zone resilient N/A
Azure Data Lake Storage Yes Enablement Moderate cost increase
Azure Database for MySQL Yes Yes Redeployment Requires primary and high availability (HA) replica
Azure Database for PostgreSQL Yes Yes Enablement Requires primary and HA replica
Azure Databricks Yes Enablement No cost impact for same number of VMs; moderate cost increase for storage
Azure Disk Storage Yes Yes Enablement Moderate cost increase
Azure DDoS Protection Yes Always zone resilient N/A
Azure Device Registry Yes Always zone resilient N/A
Azure Elastic SAN Yes Redeployment Moderate cost increase
Azure Event Grid Yes Always zone resilient N/A
Azure Event Hubs: Dedicated tier Yes Always zone resilient Minimum capacity units (CUs) required
Azure Event Hubs: all other tiers Yes Always zone resilient N/A
Azure ExpressRoute gateway Yes Always zone resilient N/A
Azure Files Yes Enablement Moderate cost increase
Azure Firewall Yes Yes New firewalls: Zone resilient by default; existing nonzonal firewalls: Modification (automatic migration in progress) No cost impact
Azure Functions: Dedicated plan Yes Enablement Minimum instance count required
Azure Functions: Flex Consumption plan Yes Enablement Minimum instance count required
Azure Functions: Premium plan Yes Redeployment Minimum instance count required
Azure HDInsight Yes Redeployment No cost impact for same number of nodes
Azure IoT Hub Yes Always zone resilient N/A
Azure Key Vault Yes Always zone resilient N/A
Azure Kubernetes Service (AKS) Yes Yes Redeployment No cost impact
Azure Load Balancer Yes Yes Modification No cost impact
Azure Logic Apps: Consumption tier Yes Always zone resilient N/A
Azure Logic Apps: Standard tier Yes Redeployment Minimum tier and instance count required
Azure Managed Grafana Yes Redeployment Moderate cost increase
Azure Managed Redis Yes Enablement Requires primary and HA instance
Azure Monitor Logs Yes Always zone resilient
Azure NAT Gateway Yes Yes Redeployment No cost impact
Azure NetApp Files Yes Redeployment Depends on replication configuration
Azure Private Link service Yes Always zone resilient N/A
Azure Queue Storage Yes Enablement Moderate cost increase
Azure Service Bus Yes Always zone resilient N/A
Azure Service Fabric Yes Yes Redeployment No cost impact for same number of VMs
Azure SignalR Service Yes Enablement Minimum tier required
Azure Site Recovery Yes Redeployment No cost impact for Site Recovery, moderate cost increase for replica storage
Azure SQL Database: Business Critical tier Yes Enablement No cost impact
Azure SQL Database: General Purpose tier Yes Enablement Moderate cost increase
Azure SQL Database: Hyperscale tier Yes Redeployment Minimum replica count required
Azure SQL Database: Premium tier Yes Enablement No cost impact
Azure SQL Managed Instance Yes Enablement Moderate cost increase
Azure Stream Analytics Yes Always zone resilient N/A
Azure Table Storage Yes Enablement Moderate cost increase
Azure Virtual Machine Scale Sets Yes Yes Redeployment No cost impact for same number of VMs
Azure Virtual Machines Yes Modification No cost impact for same number of VMs
Azure Virtual Network Yes Always zone resilient N/A
Azure VMware Solution Yes Yes Redeployment No cost impact for same number of nodes
Azure VPN Gateway Yes Yes Modification Minimum SKU required
Azure Web PubSub Yes Enablement Minimum tier required
Public IP address Yes Yes Always zone resilient N/A
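
To apply the table to your own environment, you can group an inventory of services by their typical approach. The following Python sketch transcribes only a few rows from the table above; the inventory, including the "Contoso custom service" entry, is hypothetical. Services that aren't in the lookup are grouped under "Unknown" so that you review them against their reliability guides.

```python
from collections import defaultdict

# A few rows transcribed from the table: service -> (approach, cost impact).
SERVICE_TABLE = {
    "Azure Key Vault": ("Always zone resilient", "N/A"),
    "Azure Blob Storage": ("Enablement", "Moderate cost increase"),
    "Azure SQL Managed Instance": ("Enablement", "Moderate cost increase"),
    "Azure Load Balancer": ("Modification", "No cost impact"),
    "Azure Kubernetes Service (AKS)": ("Redeployment", "No cost impact"),
}


def group_by_approach(inventory: list[str]) -> dict[str, list[str]]:
    """Group service names by typical approach; unlisted services are 'Unknown'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for service in inventory:
        approach, _cost = SERVICE_TABLE.get(service, ("Unknown", "Unknown"))
        groups[approach].append(service)
    return dict(groups)


def with_cost_increase(inventory: list[str]) -> list[str]:
    """Return the listed services whose typical cost impact is an increase."""
    return [
        service
        for service in inventory
        if "cost increase" in SERVICE_TABLE.get(service, ("", ""))[1]
    ]


inventory = [
    "Azure Blob Storage",
    "Azure Key Vault",
    "Azure Kubernetes Service (AKS)",
    "Contoso custom service",  # hypothetical entry that isn't in the table
]
print(group_by_approach(inventory))
print(with_cost_increase(inventory))  # ['Azure Blob Storage']
```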
