Architecture pattern for mission-critical workloads on Azure

This article presents a key pattern for mission-critical architectures on Azure. Apply this pattern when you start your design process, and then select components that are best suited for your business requirements. The article recommends a north star design approach and includes other examples with common technology components.

We recommend that you evaluate the key design areas, define the critical user and system flows that use the underlying components, and develop a matrix of Azure resources and their configuration while keeping in mind the following characteristics.

Characteristic	Considerations
Lifetime	What's the expected lifetime of the resource, relative to other resources in the solution? Should the resource outlive the entire system or region, share its lifetime, or be temporary?
State	What impact will the persisted state at this layer have on reliability or manageability?
Reach	Is the resource required to be globally distributed? Can the resource communicate with other resources, located globally or within that region?
Dependencies	What are the dependencies on other resources?
Scale limits	What's the expected throughput for that resource? How far can the resource scale to meet that demand?
Availability/disaster recovery	What is the impact on availability from a disaster at this layer? Would it cause a systemic outage or only a localized capacity or availability issue?

Based on the preceding characteristics, classify and identify mission-critical resources. That activity can help track resource utilization and associated costs, while helping you focus optimization efforts where they matter most. We recommend that you tag groups of resources deemed critical to your business. Keep in mind that some of these resources may be shared across multiple workloads.

For information on Microsoft-recommended tags, see Label mission-critical workloads.
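
The resource matrix and tagging step described above can be sketched in code. The following is a minimal illustration, not a prescribed format: the `ResourceEntry` fields mirror the characteristics table, while the resource names, the criticality rule, and the tag keys are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    GLOBAL = "global"
    REGIONAL = "regional"
    STAMP = "stamp"


@dataclass
class ResourceEntry:
    """One row of the resource matrix; fields mirror the characteristics table."""
    name: str
    layer: Layer                      # lifetime: which layer the resource lives in
    stateful: bool                    # state: does it persist data?
    globally_distributed: bool        # reach
    depends_on: list[str] = field(default_factory=list)  # dependencies
    max_throughput_rps: int = 0       # scale limits (assumed unit: requests/second)
    outage_is_systemic: bool = False  # availability/disaster recovery impact

    def is_mission_critical(self) -> bool:
        # Hypothetical rule for this sketch: a resource is critical if its
        # loss is systemic or if it holds persisted state.
        return self.outage_is_systemic or self.stateful

    def tags(self) -> dict[str, str]:
        # Tag keys and values are placeholders, not an official tag schema.
        tags = {"layer": self.layer.value}
        if self.is_mission_critical():
            tags["criticality"] = "mission-critical"
        return tags


# Hypothetical matrix with one resource per layer.
matrix = [
    ResourceEntry("global-database", Layer.GLOBAL, stateful=True,
                  globally_distributed=True, outage_is_systemic=True),
    ResourceEntry("regional-logs", Layer.REGIONAL, stateful=True,
                  globally_distributed=False),
    ResourceEntry("stamp-compute", Layer.STAMP, stateful=False,
                  globally_distributed=False,
                  depends_on=["global-database", "regional-logs"]),
]

critical = [r.name for r in matrix if r.is_mission_critical()]
print(critical)
for entry in matrix:
    print(entry.name, entry.tags())
```

Keeping the matrix as data makes it possible to review criticality decisions in one place and to generate tags consistently, rather than tagging resources by hand.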

Core architecture pattern

Figure: Diagram showing a generic pattern for a mission-critical application.

Global resources

Certain resources are globally shared by resources deployed within each region. Common examples are resources that are used to distribute traffic across multiple regions, store permanent state for the whole application, and monitor resources at the global level.

Characteristic	Considerations
Lifetime	These resources are expected to be long-lived. Their lifetime spans the life of the system or longer. Often the resources are managed with in-place data and control plane updates, assuming they support zero-downtime update operations.
State	Because these resources exist for at least the lifetime of the system, this layer is often responsible for storing global, geo-replicated state.
Reach	The resources should be globally distributed and replicated to each region that hosts the workload. They should communicate with regional or other resources with low latency and the desired consistency.
Dependencies	The resources should avoid dependencies on regional resources because the unavailability of a regional resource can cause a global failure. For example, certificates or secrets kept in a single vault could have global impact if there's a regional failure where the vault is located.
Scale limits	These resources are often singleton instances in the system, so they must be able to scale to handle the throughput of the system as a whole.
Availability/disaster recovery	Because regional and stamp resources can depend on global resources, it's critical that global resources are configured with high availability and disaster recovery for the health of the whole system.

Regional stamp resources

The stamp contains the application and resources that participate in completing business transactions. A stamp typically corresponds to a deployment to an Azure region. A region can have more than one stamp.

Characteristic	Considerations
Lifetime	The resources should be replaceable and have a shorter lifecycle than regional or global resources. They can be added and removed dynamically while regional resources outside the stamp continue to persist.
State	Avoid storing long-lived state in a stamp. A stamp should be stateless as much as possible.
Reach	Stamp resources can communicate with regional and global resources. Avoid communication with other regions or other stamps.
Dependencies	The stamp resources must be independent. They're expected to have regional and global dependencies but shouldn't rely on components in other stamps in the same or other regions.
Scale limits	Throughput is established through testing. The throughput of the overall stamp is limited by its least performant resource. Stamp throughput needs to account for the extra demand that arrives when another stamp fails over.
Availability/disaster recovery	Because stamps are replaceable, recovery can use redeployment when the affected resources don't contain long-lived state.
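
The scale-limit reasoning for a stamp can be expressed as a small calculation. The sketch below assumes that each resource's limit has already been measured through load testing; the resource names and the requests-per-second figures are hypothetical, and the failover check conservatively assumes that the largest stamp is the one that fails.

```python
def stamp_throughput(resource_limits: dict[str, int]) -> int:
    """A stamp can serve no more than its least performant resource allows."""
    return min(resource_limits.values())


def survives_stamp_failure(stamp_capacities: list[int], peak_demand: int) -> bool:
    """Return True if the remaining stamps can absorb peak demand
    after the largest stamp fails."""
    if len(stamp_capacities) < 2:
        return False  # no other stamp is available to take the failover load
    remaining = sum(stamp_capacities) - max(stamp_capacities)
    return remaining >= peak_demand


# Hypothetical load-test results, in requests per second.
limits = {"ingress": 5000, "compute": 3200, "message_broker": 4000}
capacity = stamp_throughput(limits)
print(capacity)  # 3200, set by the compute resource

# Two identical stamps: one must be able to carry the full peak alone.
print(survives_stamp_failure([capacity, capacity], peak_demand=3000))  # True
print(survives_stamp_failure([capacity, capacity], peak_demand=4000))  # False
```

If the check fails, the options are to raise the limit of the bottleneck resource or to add another stamp so that the surviving capacity covers the peak.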

Regional resources

A system can have resources that are deployed in a region but outlive the stamp resources. Examples include observability resources that monitor resources at the regional level, including the stamps.

Characteristic	Considerations
Lifetime	The resources share the lifetime of the region and outlive the stamp resources.
State	State stored in a region can't live beyond the lifetime of the region. If state needs to be shared across regions, consider using a global data store.
Reach	The resources don't need to be globally distributed. Direct communication with other regions should be minimized.
Dependencies	The resources can have dependencies on global resources, but not on stamp resources because stamps are meant to be short-lived.
Scale limits	Determine the scale limit of regional resources by combining all stamps within the region.
Availability/disaster recovery	Plan recovery at the regional scope so a regional resource failure doesn't become a global failure.
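
The dependency rules across the three layers lend themselves to an automated check. The following sketch encodes them as we read them from the tables: global resources shouldn't depend on regional resources, regional resources shouldn't depend on stamp resources, and stamp resources shouldn't depend on other stamps. The `Resource` type, the resource names, and the decision to also flag global-to-stamp dependencies are assumptions made for this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    GLOBAL = "global"
    REGIONAL = "regional"
    STAMP = "stamp"


@dataclass(frozen=True)
class Resource:
    name: str
    layer: Layer
    region: Optional[str] = None  # None for global resources
    stamp: Optional[str] = None   # set only for stamp resources


def violation(source: Resource, target: Resource) -> Optional[str]:
    """Return a message if source -> target breaks a layering rule, else None."""
    if source.layer is Layer.GLOBAL and target.layer is not Layer.GLOBAL:
        # Assumption: global-to-stamp is treated like global-to-regional.
        return f"{source.name}: global resource depends on {target.layer.value} resource {target.name}"
    if source.layer is Layer.REGIONAL and target.layer is Layer.STAMP:
        return f"{source.name}: regional resource depends on stamp resource {target.name}"
    if (source.layer is Layer.STAMP and target.layer is Layer.STAMP
            and source.stamp != target.stamp):
        return f"{source.name}: stamp resource depends on another stamp's resource {target.name}"
    return None


# Hypothetical resources and dependency edges.
database = Resource("global-db", Layer.GLOBAL)
router = Resource("global-router", Layer.GLOBAL)
vault = Resource("vault-eastus", Layer.REGIONAL, region="eastus")
api_a = Resource("api-a", Layer.STAMP, region="eastus", stamp="a")
api_b = Resource("api-b", Layer.STAMP, region="westus", stamp="b")

edges = [(router, vault), (api_a, database), (api_a, vault), (api_a, api_b)]
problems = [msg for src, dst in edges if (msg := violation(src, dst)) is not None]
for msg in problems:
    print(msg)
```

In this example, the global router's dependency on a vault in a single region is flagged, which corresponds to the single-vault risk described for global resources. A check like this could run against the resource matrix during design reviews.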

Baseline architectures for mission-critical workloads

These baseline examples serve as a recommended north star architecture for mission-critical applications. They use containerization and Azure Kubernetes Service (AKS). Azure Container Apps can also fit workloads that don't require direct Kubernetes API access.

Refer to Well-Architected mission-critical workloads: Containerization.

Figure: Diagram shows a baseline mission-critical application.

Baseline architecture

If you're just starting your mission-critical journey, use this architecture as a reference. The workload is accessed over a public endpoint and doesn't require private network connectivity to other company resources.

Figure: Diagram shows the baseline architecture extended with network controls.

Baseline with network controls

This architecture builds on the baseline architecture. The design is extended to provide strict network controls to prevent unauthorized public access from the internet to the workload resources.

Figure: Diagram shows the baseline architecture deployed using Azure landing zones.

Baseline in Azure landing zones

This architecture is appropriate if you're deploying the workload in an enterprise setup where integration within a broader organization is required. The workload uses centralized shared services, needs on-premises connectivity, and integrates with other workloads within the enterprise. It's deployed in an application landing zone subscription.

Design areas

We recommend that you use the provided design guidance to navigate the key design decisions to reach an optimal solution. For information, see What are the key design areas?

Next step

Review the best practices for designing mission-critical application scenarios.

Application design


URL: https://learn.microsoft.com/en-us/azure/well-architected/mission-critical/mission-critical-architecture-pattern