Reliability in Azure Database for PostgreSQL

Azure Database for PostgreSQL is a fully managed database service that gives you granular control and flexibility over database management functions and configuration settings. The service provides high-availability and disaster-recovery capabilities based on your requirements.

When you use Azure, reliability is a shared responsibility. Microsoft provides a range of capabilities to support resiliency and recovery. You're responsible for understanding how those capabilities work within all of the services you use, and selecting the capabilities you need to meet your business objectives and uptime goals.

This article describes how to make Azure Database for PostgreSQL resilient to various potential outages and problems, including transient faults, availability zone outages, region outages, and service maintenance. It also describes how you can use backups to recover from other types of problems, and highlights key information about the Azure Database for PostgreSQL service-level agreement (SLA).

Production deployment recommendations

To learn how to deploy Azure Database for PostgreSQL to support your solution's reliability requirements, and how reliability affects other aspects of your architecture, see Architecture best practices for Azure Database for PostgreSQL in the Azure Well-Architected Framework.

Reliability architecture overview

This section describes some of the important aspects of how the service works that are most relevant from a reliability perspective. The section introduces the logical architecture, which includes some of the resources and features that you deploy and use. It also discusses the physical architecture, which provides details on how the service works under the covers.

Logical architecture

When you work with Azure Database for PostgreSQL, you deploy a server, which represents the compute and storage resources required to support the databases that you deploy to the server.

You can deploy servers in multiple compute tiers: Burstable, General Purpose, and Memory Optimized. Each tier is optimized for different kinds of workloads. In some Azure regions, you can deploy servers with Azure Confidential Computing.

For more information about the general service architecture and deployment models, see Azure Database for PostgreSQL overview.

Physical architecture

  • Compute and storage separation: Azure Database for PostgreSQL uses a compute and storage separation architecture to support high availability. The database engine runs on a Linux virtual machine (VM), while Azure Storage holds the data files and keeps three locally redundant synchronous copies of the database files to ensure data durability.

  • High availability: You can enable a high-availability configuration on your server. When you enable the high-availability configuration, the service provisions and maintains a warm standby server. The primary server synchronously replicates data changes to the standby server to ensure zero data loss during a failure of the primary server.

    The architecture separates the compute layer from the storage layer, so the service can handle different types of failures appropriately. For higher resiliency, you can spread the servers across availability zones.

    Diagram showing the high-availability architecture for Azure Database for PostgreSQL. Two servers are side by side. On the left is a box labeled primary server, and inside that box is a virtual machine and a disk. On the right is a matching box labeled standby server that also contains a virtual machine and a disk. A horizontal arrow points from the primary server on the left to the standby server on the right, and the arrow is labeled streaming replication, indicating a one-way relationship where data changes flow from the primary server to the standby server.

    A standby server is deployed in the same VM configuration as the primary server, including vCores, storage, and network settings.

    You can switch between servers by performing a failover. Two types of failover exist: forced failovers, which are used when the primary server fails, and planned failovers, which are used during some maintenance operations and in other scenarios where you need to minimize application downtime during a failover.

    When you perform operations such as stop, start, and restart, they occur on both primary and standby database servers at the same time. Planned events such as compute scaling and storage scaling happen on the standby first and then on the primary server. Currently, the server doesn't fail over for these planned operations.

    For more information, see High availability in Azure Database for PostgreSQL.

  • Backups: Azure Database for PostgreSQL automatically creates server backups. For more information, see Backup and restore.

Resilience to transient faults

Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications can handle transient faults, usually by retrying affected requests.

All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.

Your applications must handle transient connectivity errors that can occur during maintenance, scaling operations, or network interruptions. Follow these recommendations:

  • When your application encounters transient faults, retry the operation by using exponential backoff. Increase the delay between retries and limit the number of attempts. If the operation still fails after the maximum retries, treat it as a failure.

  • Where possible, use client libraries (also called drivers) that automatically handle retries.

  • Transient errors that occur during write operations require more careful consideration. Consider making your write operations idempotent, so they can be safely executed multiple times.

For more information, see Handling transient connectivity errors in Azure Database for PostgreSQL.
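The retry recommendations above can be sketched as follows. This is a minimal illustration, not the service's or any driver's built-in behavior. `TransientError`, `backoff_delays`, and `retry_with_backoff` are hypothetical names, and the default attempt count and delays are assumptions to tune for your workload. In a real application, you would map your driver's connection exceptions to the retryable case.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are safe to retry."""


def backoff_delays(max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Return the capped exponential delay before each retry (without jitter)."""
    return [min(max_delay, base_delay * 2 ** i) for i in range(max_attempts - 1)]


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(); on TransientError, wait and retry up to max_attempts."""
    delays = backoff_delays(max_attempts, base_delay, max_delay)
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Retries exhausted: treat the operation as failed.
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delays[attempt]))
```

Only wrap operations that are safe to repeat. A write that isn't idempotent might be applied twice if the first attempt succeeded on the server but the response was lost.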

Resilience to availability zone failures

Availability zones are physically separate groups of datacenters within an Azure region. When one zone fails, services can fail over to one of the remaining zones.

Select your type of availability zone support through the high-availability configuration. When you enable high availability, the service deploys a standby server alongside your primary server. This high-availability model helps ensure that committed data is never lost during failures. Whichever high-availability deployment model your server uses, it synchronously commits data to both the primary and standby servers. If a disruption occurs to the primary server, the server automatically fails over to the standby server.

Each availability zone stores data files and write-ahead logs (WALs) on premium managed disks with locally redundant storage (LRS) that automatically stores three data copies within each zone.

Azure Database for PostgreSQL supports two availability zone configuration types when you use high availability:

  • Zone-redundant high availability: Zone redundancy provides the highest level of zone resilience by deploying a primary server in one availability zone and a standby server in a different availability zone. The standby server uses compute, storage, and network configuration that's similar to that of the primary server. A zone-redundant configuration provides physical isolation of the entire stack between primary and standby servers.

    You can either select the availability zones for the primary and standby servers or let Microsoft choose them.

    We recommend zone-redundant deployments for production servers.

    Diagram showing a zone-redundant Azure Database for PostgreSQL setup spread across availability zones. Three zones are listed across the top: availability zone 1, availability zone 2, and availability zone 3. Under availability zone 1 is a box labeled primary server, and inside that box is a virtual machine and a disk, showing that the primary server consists of compute and storage. Under availability zone 2, there's a matching box labeled standby server that also contains a virtual machine and a disk. Between these two server boxes, there's a right-pointing arrow labeled streaming replication, showing that data changes flow from the primary server on the left to the standby server on the right. The layout communicates cross-zone resilience: primary and standby are separated across two availability zones, while availability zone 3 remains unused.

    Write operations can experience a small increase in commit latency because the service synchronously replicates data to the standby server. The impact varies by workload, selected SKU, and region.

  • Zonal (same-zone) high availability: The primary and standby servers use the same availability zone. If a disruption occurs to the primary server, but the zone is still healthy, the server automatically fails over to the standby server. A zonal deployment gives you high availability within a single availability zone. It protects you against node-level failures and also helps to reduce application downtime during planned and unplanned downtime events. However, it doesn't protect against an outage in that zone.

    Diagram showing a zonal Azure Database for PostgreSQL setup in a single availability zone. Three zones are shown: availability zone 1, availability zone 2, and availability zone 3. In availability zone 1, there are two boxes side by side. The box on the left is labeled primary server, and inside that box is a virtual machine and a disk. The box on the right is labeled standby server, and inside that box is a virtual machine and a disk. Between these two server boxes, there's a right-pointing arrow labeled streaming replication, showing that data changes flow from the primary server on the left to the standby server on the right. Both servers are in the same availability zone. Availability zone 2 and availability zone 3 are unused.

    Zonal (same-zone) high availability is only available in the following situations:

    • The region doesn't support availability zones. The region effectively functions as a single zone, so the only high-availability configuration you can select is same-zone.
    • The region doesn't have sufficient capacity for a zone-redundant deployment. In this case, the service can initially place both servers in the same availability zone and then automatically migrate them to separate zones when capacity becomes available. This option is available when you use the Azure portal or the Azure CLI to deploy a server. For more information, see Configure Business Critical (high availability) options.

    Placing the servers in the same zone can reduce the write latency to applications you deploy within the same zone.


If you configure your server without high availability, it runs as a single instance with no standby. If that instance or its availability zone goes down, your server is unavailable. For more information, see Configurations without availability zones.

Requirements

  • Region support: Azure Database for PostgreSQL supports availability zone configurations differently across Azure regions. For a full list of regions, and the types of availability zone support and any specific considerations for each region, see Azure regions.

  • Compute tier: The following table lists the compute tier support for each type of availability zone support:

    Compute tier        Zone redundant    Zonal (same-zone)
    Burstable           Not supported     Not supported
    General Purpose     Supported         Supported
    Memory Optimized    Supported         Supported

  • Service tier: Both types of high availability require General Purpose or Memory Optimized tiers.

Considerations

Region capacity: If a region doesn't have sufficient capacity for a zone-redundant deployment, the service can initially place both servers in the same availability zone and automatically migrate them to separate zones when capacity becomes available. This option is available when you use the Azure portal or the Azure CLI to deploy a server. For more information, see Configure Business Critical (high availability) options.

Cost

When you enable high availability, a standby server is created and it's billed at the same rate as the primary server. The availability zone configuration doesn't affect the cost. There are no charges for data replication within or between availability zones. Depending on your backup storage volume, you might also be billed for backup storage. For detailed pricing information, see Azure Database for PostgreSQL pricing.

Configure availability zone support

To configure availability zone support for a server, configure the high-availability settings.

  • Create a zone-redundant server: To learn how to create a server with high availability and zone redundancy enabled, see Quickstart: Create an Azure Database for PostgreSQL server.

  • Change the availability zone configuration for existing servers: Update the server's high-availability settings. For detailed steps, see Enable high availability for existing servers.

    You can't change the zone that an existing primary or standby server uses. To use a different zone, you need to create the server again.

    Tip

    We recommend that you wait until the server activity is low before you change the high-availability configuration.

  • Disable high availability: Disabling high availability removes the standby server, so your server isn't resilient to outages in its availability zone. For more information, see Disable high availability.

Behavior when all zones are healthy

This section describes what to expect when you configure servers with high availability and availability zone support, and all availability zones are operational.

  • Cross-zone operation: PostgreSQL client applications connect to the primary server by using the database server name. Azure Database for PostgreSQL uses an active-passive configuration where the primary server in the primary availability zone handles all database connections and queries. The standby server doesn't serve client traffic during normal operations.

  • Cross-zone data replication: The primary server synchronously replicates changes to the standby server. Transactions aren't considered complete until both the primary and standby servers acknowledge the write.

    When an application writes and commits data, PostgreSQL first records the change in the WAL on the primary server. The primary server streams these WAL records to the standby server by using the PostgreSQL streaming replication protocol. After the standby server durably stores the WAL, the primary server acknowledges the commit, and only then does the application's commit call return. This acknowledgment doesn't wait for the standby server to apply the logs.

    The effects of replication are different depending on the availability zone configuration that your server uses:

    • Zone-redundant: Because the servers are in separate zones, this approach ensures zero data loss during a zone failure. This situation is also sometimes called achieving a recovery point objective (RPO) of zero for zone failures.

      However, cross-zone replication might introduce a small amount of extra latency. The impact of the latency depends on the application. For most applications, the extra latency is negligible.

    • Zonal: Because both servers are in the same zone, no traffic is replicated between zones.

    Note

    The system replicates log data in real time to the standby server. Any user errors on the primary server, such as an accidental drop of a table or incorrect data updates, are replicated to the standby server. You can't use the standby to recover from these kinds of errors, and you must perform a point-in-time restore from the backup. For more information, see Backup and restore.
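The commit sequence described above can be pictured with a toy in-memory model. This is only a sketch of the ordering of steps, not how the service is implemented. The `Primary` and `Standby` classes are hypothetical, and the model ignores networking, failures, and timeouts.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    flushed_wal: list = field(default_factory=list)  # WAL records stored durably
    applied: dict = field(default_factory=dict)      # state after WAL replay

    def receive(self, record):
        # Durably store the WAL record and acknowledge. Replay happens later.
        self.flushed_wal.append(record)
        return True

    def replay(self):
        # Apply stored WAL records to the standby's copy of the data.
        for key, value in self.flushed_wal:
            self.applied[key] = value


@dataclass
class Primary:
    standby: Standby
    wal: list = field(default_factory=list)
    data: dict = field(default_factory=dict)

    def commit(self, key, value):
        record = (key, value)
        self.wal.append(record)       # 1. Record the change in the local WAL.
        self.standby.receive(record)  # 2. Stream it; wait for the durable store.
        self.data[key] = value        # 3. Only now acknowledge the commit.
        return "committed"


standby = Standby()
primary = Primary(standby)
primary.commit("orders:1", "paid")
print(standby.flushed_wal)  # the record is already durable on the standby
print(standby.applied)      # still empty: the commit didn't wait for replay
```

Because every committed change reaches the standby this way, a mistaken change on the primary reaches it too, which is why the standby can't substitute for a point-in-time restore.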

Behavior during a zone failure

This section describes what to expect when you configure servers with high availability and availability zone support, and there's an availability zone outage.

  • Detection and response: Azure periodically checks the health of both the primary and standby servers. After multiple pings, if health monitoring detects that a primary server isn't reachable, the service initiates an automatic failover to the standby server. The health monitoring algorithm uses multiple data points to avoid false positive situations.

    If an availability zone fails, the behavior is different depending on the availability zone configuration that your server uses:

    • Zone-redundant: Azure Database for PostgreSQL automatically detects availability zone failures. To view the possible high-availability status types, see High Availability (HA) health status monitoring. When a zone fails, Azure initiates a forced failover to the standby server without requiring you to take action.

    • Zonal: If the availability zone that hosts a zonal server becomes unavailable, both the primary and standby servers are unavailable. In this scenario, the service doesn't provide automatic failover. You're responsible for detecting the zone outage and performing recovery actions, such as restoring zone‑redundant backups to a separate server in another availability zone or region.

  • Notification: High-availability health status monitoring in Azure Database for PostgreSQL provides a continuous overview of the health and readiness of high availability-enabled instances. The monitoring feature is built on top of Azure Resource Health, and can detect and alert on any issues that might affect your database's failover readiness or overall availability. Assess key metrics like connection status, failover state, and data replication health to enable proactive troubleshooting and help maintain your database's uptime and performance.

    For a detailed guide on configuring and interpreting HA health statuses, see High Availability (HA) health status monitoring.

  • Active requests: When an availability zone becomes unavailable, any in‑progress requests to servers in the affected zone might be terminated. Applications must retry these requests. If your clients handle transient faults appropriately by retrying after a short period of time, they typically avoid significant impact.

  • Expected data loss: The amount of data loss depends on the availability zone configuration that your server uses.

    • Zone-redundant: Zero data loss is expected during zone failover because of synchronous replication between the primary and standby servers in different zones.

    • Zonal: Data on servers in the affected zone is unavailable until the zone recovers.

  • Expected downtime: The amount of downtime depends on the availability zone configuration that your server uses.

    • Zone-redundant: Failover typically completes within 60-120 seconds. If your clients handle transient faults appropriately by retrying after a short period of time, they typically avoid significant impact.

    • Zonal: Servers in an affected zone are unavailable until the availability zone recovers.

  • Redistribution: The traffic rerouting behavior depends on the availability zone configuration that your server uses.

    • Zone-redundant: After failover, the former standby server becomes the new primary and begins accepting new connections. Azure automatically establishes a new standby server in the original primary zone after it recovers. For full details, see Forced failover.

    • Zonal: When a zone is unavailable, your server is unavailable. If you have a separate server that you created in advance in another availability zone or region, you're responsible for rerouting traffic to that server.
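For zone-redundant servers, a client can ride out a failover by retrying its connection until a deadline passes. The following sketch assumes a caller-supplied `connect` function that raises `ConnectionError` while the server is unreachable. The helper name, the 180-second deadline (chosen here to leave headroom over the typical 60-120 seconds), and the 5-second interval are all assumptions, not service guidance.

```python
import time


def connect_with_deadline(connect, deadline_seconds=180.0, interval=5.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Retry connect() at a fixed interval until it succeeds or time runs out.

    Returns a (connection, attempts) tuple. Raises TimeoutError if the next
    retry would start after the deadline.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except ConnectionError:
            if clock() - start + interval > deadline_seconds:
                raise TimeoutError(f"no connection after {attempts} attempts")
            sleep(interval)
```

The `clock` and `sleep` parameters exist so that the helper can be exercised in tests without real waiting. Connect by using the database server name each time, so that every attempt reaches whichever server is currently the primary.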

Zone recovery

The zone recovery behavior depends on the availability zone configuration that your server uses.

  • Zone-redundant: When the availability zone recovers, Azure Database for PostgreSQL automatically rebuilds the standby server in the recovered zone and synchronizes it with the current primary. The recovered zone then serves as the standby location. To avoid unnecessary disruption, the service doesn't automatically move the primary role back to the original zone. You can manually initiate a planned failover if you want to return the primary to the original zone.

  • Zonal: After the zone is healthy, servers in the zone are available again. You're responsible for any zone recovery procedures and data synchronization that your workloads require.

Test for zone failures

The options for testing for zone failures depend on the availability zone configuration that your server uses.

  • Zone-redundant: You can test your application's resilience to failover by initiating a forced failover. A forced failover lets you simulate an unplanned outage scenario while running your workload and observe your application downtime. We recommend that you run simulations in a nonproduction environment, or at a quiet time. For more information, see Initiate a forced failover.

  • Zonal: While you can't simulate a full zone outage, you can simulate your server being unavailable in a way that's similar to a zone outage. For more information, see Stop compute of a server.

Resilience to region-wide failures

Azure Database for PostgreSQL supports cross-region read replicas, which you can use to maintain a synchronized copy of your database in a different region for faster recovery.

You can also use geo-redundant backups, in supported regions, to provide cross-region recovery. However, backups typically involve more downtime and data loss than replication. For more information, see Backup and restore.

Cross-region read replicas

You can deploy read replicas to protect your databases from region-level failures. Each read replica is a separate Azure Database for PostgreSQL server. When you place a read replica in a second Azure region, your database server can provide resilience to a region-wide problem. You can deploy up to five read replicas, which can optionally be in different Azure regions. PostgreSQL's physical replication technology updates read replicas asynchronously, and they can lag the primary. Cross-region read replicas can optionally serve read-only workloads to reduce latency for globally distributed applications or to offload read traffic from the primary server. For more information on read replica features and considerations, see Read replicas.

Virtual endpoints provide read-write and read-only endpoints and automatically redirect traffic when a replica is promoted, which helps minimize downtime during failover events. We strongly recommend using virtual endpoints with cross-region read replicas to improve application resilience. For more information, see Virtual endpoints for read replicas in Azure Database for PostgreSQL.

Diagram showing an application at the top. Directly below it is a box labeled read-write endpoint. There's a downward arrow from the application to the endpoint, showing that the application sends its database traffic to this endpoint first. The lower half of the diagram is split into two large areas. On the left is the primary region. Inside that region, there's a box labeled primary server, and inside the box the service name Azure Database for PostgreSQL server. On the right is the secondary region. Inside that region, there's a matching server box labeled read replica promoted primary server, also labeled Azure Database for PostgreSQL server. An arrow runs from the read-write endpoint to the primary server. A dashed horizontal arrow labeled asynchronous replication runs from the primary server on the left to the server in the secondary region on the right, showing that data changes are copied from primary to replica.

If your primary region fails, you can trigger a promotion so that your secondary replica becomes the primary. Different types of failover might be appropriate depending on how you use read replicas. When you use read replicas to provide resilience to region failures, you typically use the promote to primary server approach, which updates your virtual endpoint. During a region outage, you need to perform a forced promotion, which can result in some data loss for any unreplicated data. In planned scenarios where the primary region is healthy, you can choose to perform a planned promotion to avoid data loss. For more information, see Promote read replicas in Azure Database for PostgreSQL.

Diagram showing an application at the top sending data through a read-write endpoint. The lower half of the diagram is split into two large areas. On the left is the primary region. Inside that region, there's a box labeled primary server, and inside the box the service name Azure Database for PostgreSQL server. There's an x over the primary region, indicating that it's no longer active. On the right is the secondary region. Inside that region, there's a matching server box labeled read replica promoted primary server, also labeled Azure Database for PostgreSQL server. An arrow runs from the read-write endpoint to the secondary region. A dashed horizontal arrow labeled asynchronous replication that runs from the primary region to the secondary region is covered by an x, indicating that the replication is no longer active.

Note

This section summarizes some important information about how read replicas can support resilience to region-wide failures. You can also use read replicas to improve performance and support high-scale geographically distributed user bases. For more information, see Read replicas.

Requirements

  • Region support: You can create cross-region read replicas in any region that supports Azure Database for PostgreSQL. You're not limited to Azure paired regions.

  • Compute tiers: The General Purpose and Memory Optimized compute tiers support read replicas. The Burstable tier doesn't support read replicas.

Considerations

  • Configuration differences: Read replicas might not inherit all configuration settings from the primary server. Plan to configure necessary settings after failover. Your primary server and replicas should be symmetrical, which means they need to have the same tiers, storage sizes, and values for some settings. During region failures, the symmetrical server requirement can be waived for forced promotions, but it's a good practice to have symmetrical configuration where possible to avoid unexpected problems. For more information, see Configuration management.

  • Monitoring replication lag: Asynchronous replication introduces replication lag, which can vary depending on many factors. When the replication lag is high, replicas serve staler data and a forced promotion loses more unreplicated data. It's important to monitor the replication lag so that you can mitigate problems before they escalate. For more information, see Monitor replication.

  • High availability: Read replicas can't have high availability enabled, and when they're promoted, they also don't have high availability. You're responsible for configuring high availability after promoting a replica.

For other factors about the promotion process to consider, see Considerations.
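One way to act on replication lag is to sample it on the replica and compare it against thresholds. The sketch below assumes you run the query with your own PostgreSQL driver and pass the result to `classify_lag`. The query uses the standard PostgreSQL function `pg_last_xact_replay_timestamp()`. The function name `classify_lag` and the 60-second and 300-second thresholds are hypothetical, and the service's own replication metrics might be a better source for alerts.

```python
# Standard PostgreSQL query, run on the replica: seconds since the last
# replayed transaction. It can overstate lag while the primary is idle,
# and it returns NULL if no transaction has been replayed yet.
REPLAY_LAG_SQL = (
    "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"
)


def classify_lag(lag_seconds, warn_after=60.0, critical_after=300.0):
    """Map a lag sample (seconds, or None for NULL) to an alert level."""
    if lag_seconds is None:
        return "unknown"
    if lag_seconds >= critical_after:
        return "critical"
    if lag_seconds >= warn_after:
        return "warning"
    return "ok"
```

Set the thresholds from your recovery point objective: lag that exceeds the RPO means a forced promotion at that moment would lose more data than you planned for.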

Cost

Read replicas incur compute and storage costs, plus cross-region data transfer charges for replication. For detailed pricing information, see Azure Database for PostgreSQL pricing and Bandwidth pricing.

Configure multiregion support

  • Create a read replica: To learn how to create a read replica, see Create a read replica. You can configure replicas after creating the primary server, as long as the primary server is running and accessible.

    To create a virtual endpoint, see Create virtual endpoints.

  • Delete a read replica: To learn how to delete a read replica, see Delete a read replica.

Behavior when all regions are healthy

This section describes what to expect when your server is configured with a read replica in another region and a virtual endpoint, and all regions are operational:

  • Traffic routing between regions: In normal operations, your virtual endpoint directs traffic for the read-write endpoint to the primary server in the primary region. If you also use the virtual endpoint's read-only endpoint, it directs traffic to whichever replica you configure.

  • Data replication between regions: Cross-region read replicas use asynchronous replication to minimize impact on primary server performance. The amount of replication lag depends on many factors, including the write load and the latency between the primary server and replicas. Replication lag is typically at least several minutes, but it can be longer. For more information, see Monitor replication.

Behavior during a region failure

This section describes what to expect when your server is configured with a read replica in another region and a virtual endpoint, and there's an outage in the primary region:

  • Detection and response: You're responsible for detecting an outage in the primary region, and manually promoting a read replica to become the new primary server. During a region outage, you must perform a forced promotion, which results in the loss of any unreplicated data.

    Important

    You're responsible for triggering promotion. Azure doesn't promote read replicas automatically, even if there's a region failure.

    For detailed steps to initiate a promotion, see Switch over read replica to primary.

  • Notification: Microsoft doesn't automatically notify you when a region is down. However, you can use Azure Service Health to understand the overall health of the service, including any region failures, and you can set up Service Health alerts to notify you of problems.

  • Active requests: The promotion process drops all active connections to the primary region. After the promotion process completes, applications need to retry making connections to the promoted replica.

  • Expected data loss: During a region outage, you must perform a forced promotion, which results in the permanent loss of any unreplicated data.

    The amount of data loss depends on the replication lag at the time of the outage. Replication lag is typically at least several minutes, but it can be longer. For more information, see Monitor replication.

  • Expected downtime: Forced promotion typically completes within 1-3 minutes of being triggered. Applications might also need to reconnect to the correct endpoint. Virtual endpoints are updated as part of the forced promotion process. Applications should honor the time-to-live (TTL) of the endpoint's DNS records to ensure they quickly reconnect to the correct replica after promotion completes.

  • Traffic rerouting: The virtual endpoint for the server automatically redirects application traffic to the new primary replica.

    Note

    After a read replica is promoted to be the primary server, it doesn't have high-availability configuration enabled. You need to enable high-availability configuration manually, or add it to your own automation processes.
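The downtime and data loss described in this section can be combined into a rough planning estimate. In this sketch, recovery time is approximated as detection time plus promotion time plus DNS TTL, and data loss as the replication lag when the outage starts. The function names and all input values are assumptions for illustration. Only the 1-3 minute promotion range comes from this article.

```python
def estimate_recovery(detection_s, promotion_s, dns_ttl_s, replication_lag_s):
    """Return a rough RTO and RPO, in seconds, for a forced promotion."""
    return {
        "rto_seconds": detection_s + promotion_s + dns_ttl_s,
        "rpo_seconds": replication_lag_s,
    }


def meets_targets(estimate, rto_target_s, rpo_target_s):
    """Check an estimate against your recovery objectives."""
    return (estimate["rto_seconds"] <= rto_target_s
            and estimate["rpo_seconds"] <= rpo_target_s)


# Hypothetical inputs: 5 minutes to detect and decide, 3 minutes to promote
# (upper end of the typical range), a 30-second DNS TTL, and 2 minutes of lag.
estimate = estimate_recovery(detection_s=300, promotion_s=180,
                             dns_ttl_s=30, replication_lag_s=120)
print(estimate)  # {'rto_seconds': 510, 'rpo_seconds': 120}
print(meets_targets(estimate, rto_target_s=900, rpo_target_s=300))  # True
```

Because promotion is manual, detection time is often the largest term. Measure it during recovery drills instead of assuming a value.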

Region recovery

When you use virtual endpoints, after the primary region recovers, the old primary server is automatically configured as a read replica. You can perform another promotion to return the primary operations to your preferred primary region.

Test for region failures

Regularly test read-replica promotion procedures to ensure your processes are valid, and that the capabilities meet your recovery time objective (RTO) and recovery point objective (RPO) requirements.

You can promote a read replica to become the primary server at any time, even when all regions are healthy. For testing:

  • You can perform forced promotion testing. Because forced promotion can result in data loss, we recommend that you run these tests in a nonproduction environment. Forced promotion testing helps to simulate the behavior that you see during a region outage.
  • For planned maintenance, or testing scenarios where you want to avoid data loss, use a planned promotion instead. However, planned promotion follows a different process than promotion during a region outage, so it might not reflect the behavior during a true region outage.

For step-by-step instructions, see Switch over read replica to primary.

As part of your disaster-recovery strategy, regularly run full recovery drills. These drills should include data validation, application functionality testing, and documented rollback procedures.
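One way to make drill results comparable over time is to record a few timestamps during each test and derive the measured RTO and RPO from them. The sketch below is illustrative only: the `PromotionDrill` class and its field names are hypothetical, and how you capture each timestamp (for example, from application logs or a marker table) is an assumption left to your own process.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PromotionDrill:
    """Timestamps captured during one promotion drill (hypothetical record)."""

    outage_detected: datetime        # when the simulated outage was detected
    service_restored: datetime       # when applications could write again
    last_primary_write: datetime     # newest write committed on the old primary
    last_replicated_write: datetime  # newest write found on the promoted server

    @property
    def measured_rto(self) -> timedelta:
        # Time from detection until the workload was serving writes again.
        return self.service_restored - self.outage_detected

    @property
    def measured_rpo(self) -> timedelta:
        # Window of writes that never reached the promoted server.
        return self.last_primary_write - self.last_replicated_write

    def meets(self, rto: timedelta, rpo: timedelta) -> bool:
        """Return True if both measured values are within the objectives."""
        return self.measured_rto <= rto and self.measured_rpo <= rpo


# Example with made-up values: a 4-minute recovery that lost 45 seconds of writes.
t0 = datetime(2024, 1, 1, 12, 0, 0)
drill = PromotionDrill(
    outage_detected=t0,
    service_restored=t0 + timedelta(minutes=4),
    last_primary_write=t0 - timedelta(seconds=5),
    last_replicated_write=t0 - timedelta(seconds=50),
)
print(drill.meets(rto=timedelta(minutes=15), rpo=timedelta(minutes=1)))
```

Keeping the record as plain data makes it straightforward to store each drill's outcome alongside your runbook and spot regressions between tests.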

Backup and restore

Azure Database for PostgreSQL automatically backs up your data. These backups provide point-in-time recovery capabilities and help protect you against accidental corruption and deletion of data. Microsoft fully manages the backups. They don't interrupt the availability of the server, and they include both full backups and transaction log backups.

  • Backup storage: If you deploy the server in a region with availability zones, the service stores backups in zone-redundant storage (ZRS), regardless of the server's high-availability configuration. For servers deployed in regions without availability zones, the service stores backups in locally redundant storage (LRS).

    In Azure regions with pairs, you can configure geo-redundant backup storage at server creation time to replicate backups to the Azure paired region for extra protection against region failures. The service replicates backups asynchronously.

    The default backup retention period is seven days, but you can extend retention up to 35 days. You can also use Azure Backup for long-term storage of manual backups for up to 10 years. All backups are encrypted.

  • Restore: Point-in-time recovery allows you to restore your database to any moment within the backup retention period. The restore process creates a new database server with a new user-provided server name. You can use the new server as-is or copy data from it.

    When you restore a geo-redundant backup, you create a new server in the paired region.

    Point-in-time recovery is useful for recovering from accidental data modifications and application errors, and for testing scenarios.

For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't. For more information, see What are redundancy, replication, and backup?.

For more information, see Backup and restore in Azure Database for PostgreSQL.
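Before you start a point-in-time restore, it can be useful to check that the target time falls inside the retention window. The following sketch only illustrates the arithmetic implied by the retention settings described above. The function name `can_restore_to` is hypothetical, and the sketch assumes that the window is simply the retention period counted back from the current time. The restore points the service actually offers can differ slightly, so treat the result as a pre-check rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 7   # default retention stated in this article
MAX_RETENTION_DAYS = 35      # maximum retention stated in this article


def can_restore_to(
    target: datetime,
    now: datetime,
    retention_days: int = DEFAULT_RETENTION_DAYS,
) -> bool:
    """Return True if target lies within the assumed retention window."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    # A restore target can't be in the future or older than the window.
    return earliest <= target <= now


# Example with made-up values: eight days back is outside the default window
# but inside a 35-day window.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(days=8), now))
print(can_restore_to(now - timedelta(days=8), now, retention_days=35))
```

Using timezone-aware timestamps, as in the example, avoids off-by-hours mistakes when the target time is supplied in local time.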

Resilience to service maintenance

Azure Database for PostgreSQL automatically handles critical servicing tasks, including patching the underlying hardware, operating system, and database engine. The service includes security updates, software updates, and minor version upgrades as part of planned maintenance.

To ensure your server remains available during maintenance windows, follow these recommendations:

  • Enable high availability: During maintenance, the server might need to restart as part of the update process. If you enable high availability, maintenance operations typically use rolling updates to minimize downtime. Periodic maintenance activities such as minor version upgrades happen on the standby replica first. To reduce downtime, the standby is promoted to primary so that workloads can continue on the promoted node while maintenance tasks are applied to the other node. This sequencing applies whether your server uses zone-redundant or zonal high availability.

    For servers without high availability enabled, expect brief downtime during maintenance operations. With high availability enabled, maintenance operations typically complete with minimal or no downtime.

  • Configure custom maintenance windows: You can let the system manage the maintenance schedule, or define a custom maintenance window that falls in a low-activity period to minimize the impact on your business operations. For more information, see Schedule maintenance.

  • Implement retry logic: Ensure your applications can handle brief connectivity interruptions that might occur during maintenance restarts. To make your applications resilient to these types of problems, see the guidance in Resilience to transient faults.
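The retry recommendation above can be sketched as a small helper that retries a connection attempt with exponential backoff and jitter. This is a minimal illustration, not the service's or any driver's API: `connect_with_retry` and its parameters are hypothetical, `connect` stands in for whatever function opens a connection in your driver, and the default `retry_on` tuple should be replaced with your driver's connection error type.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, OSError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # plus random jitter so many clients don't reconnect in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Calling `connect()` afresh on each attempt, rather than reusing a cached address, lets the client pick up DNS changes after a restart or failover. The `sleep` parameter exists so that tests can replace the real delay.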

Service-level agreement

The service-level agreement (SLA) for Azure services describes the expected availability of each service and the conditions that your solution must meet to achieve that availability expectation. For more information, see SLAs for online services.

Azure Database for PostgreSQL provides different availability SLAs, depending on the server's configuration:

  • Servers configured with zone-redundant high availability offer an uptime SLA of 99.99%.
  • Servers configured with zonal high availability offer an uptime SLA of 99.95%.
  • Servers configured without high availability offer an uptime SLA of 99.9%.
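To compare these tiers against your uptime goals, it can help to convert each percentage into a downtime budget. The sketch below is plain arithmetic over the percentages listed above. The names `SLA_BY_CONFIG` and `allowed_downtime_minutes` are hypothetical, and the 30-day month is an assumption for illustration. The SLA itself defines how downtime is measured.

```python
# Uptime percentages as listed in this article, keyed by configuration.
SLA_BY_CONFIG = {
    "zone-redundant high availability": 99.99,
    "zonal high availability": 99.95,
    "no high availability": 99.9,
}


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, for a period of the given length."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for config, sla in SLA_BY_CONFIG.items():
    print(f"{config}: {allowed_downtime_minutes(sla):.2f} minutes per 30 days")
```

Over a 30-day month, the three tiers work out to roughly 4.3, 21.6, and 43.2 minutes of downtime respectively.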

