Standard load balancer diagnostics with metrics, alerts, and resource health

Azure Load Balancer exposes the following diagnostic capabilities:

  • Multi-dimensional metrics and alerts: Provides multi-dimensional diagnostic capabilities through Azure Monitor for Azure Load Balancer configurations. You can monitor, manage, and troubleshoot your standard load balancer resources.

  • Resource health: The Resource Health status of your load balancer is available in the Resource health page under Monitor. This automatic check informs you of the current availability of your load balancer resource.

This article provides a quick tour of these capabilities, and it offers ways to use them for a standard load balancer.

Multi-dimensional metrics

Azure Load Balancer provides multi-dimensional metrics through the Metrics page in the Azure portal, which helps you get real-time diagnostic insights into your load balancer resources. Multi-dimensional metrics aren't supported for Basic load balancers.

The various load balancer configurations provide the following metrics:

| Metric | Resource type | Description | Recommended aggregation |
|---|---|---|---|
| Data Path Availability | Public and internal load balancer | A load balancer continuously exercises the data path from within a region to the load balancer frontend, and on to the network that supports your VM. As long as healthy instances remain, the measurement follows the same path as your application's load-balanced traffic, so the data path in use is validated. The measurement is invisible to your application and doesn't interfere with other operations. | Average |
| Health Probe Status | Public and internal load balancer | A load balancer uses a distributed health-probing service that monitors your application endpoint's health according to your configuration settings. This metric provides an aggregate or per-endpoint filtered view of each instance endpoint in the load balancer pool. You can see how the load balancer views the health of your application, as indicated by your health probe configuration. | Average |
| SYN Count | Public and internal load balancer | A load balancer doesn't terminate Transmission Control Protocol (TCP) connections or interact with TCP or User Datagram Protocol (UDP) flows. Flows and their handshakes are always between the source and the VM instance. To better troubleshoot your TCP scenarios, you can use SYN packet counters to understand how many TCP connection attempts are made. The metric reports the number of TCP SYN packets that were received. | Sum |
| Source Network Address Translation (SNAT) Connection Count | Public load balancer | A load balancer reports the number of outbound flows that are masqueraded to the public IP address frontend. SNAT ports are an exhaustible resource. This metric indicates how heavily your application relies on SNAT for outbound originated flows. Counters for successful and failed outbound SNAT flows are reported. You can use the counters to troubleshoot and understand the health of your outbound flows. | Sum |
| Allocated SNAT Ports | Public load balancer | A load balancer reports the number of SNAT ports allocated per backend instance. | Average |
| Used SNAT Ports | Public load balancer | A load balancer reports the number of SNAT ports that are utilized per backend instance. | Average |
| Byte Count | Public and internal load balancer | A load balancer reports the data processed per frontend. You may notice that the bytes aren't distributed equally across the backend instances. This is expected because the Azure Load Balancer algorithm is based on flows. | Sum |
| Packet Count | Public and internal load balancer | A load balancer reports the packets processed per frontend. | Sum |

Note

Bandwidth-related metrics such as SYN count, byte count, and packet count don't capture traffic that reaches an internal load balancer through a user-defined route (UDR), for example from a network virtual appliance (NVA) or firewall.

Max and min aggregations aren't available for the SYN count, packet count, SNAT connection count, and byte count metrics. Count aggregation isn't recommended for data path availability and health probe status. Use Average instead, because it best represents health data.

Note

The Data Path Availability metric may take up to 10 minutes to appear in Azure Monitor metrics after a load balancer is created or updated.

View your load balancer metrics in the Azure portal

The Azure portal exposes the load balancer metrics via the Metrics page. This page is available on both the load balancer's resource page for a particular resource and the Azure Monitor page.

Note

Azure Load Balancer doesn't send health probes to deallocated virtual machines. When a virtual machine is deallocated, the load balancer stops reporting metrics for that instance. Metrics that are unavailable appear as a dashed line in the Azure portal, or the portal displays an error message indicating that the metrics can't be retrieved.

To view the metrics for your load balancer resources:

  1. Go to the metrics page and do either of the following tasks:

    • On the load balancer's resource page, select the metric type in the drop-down list.

    • On the Azure Monitor page, select the load balancer resource.

  2. Set the appropriate metric aggregation type.

  3. Optionally, configure the required filtering and grouping.

  4. Optionally, configure the time range and time aggregation. By default, time is displayed in UTC.

Note

Time aggregation matters when you interpret certain metrics, because data is sampled once per minute. If time aggregation is set to five minutes and the metric aggregation type Sum is used for a metric such as allocated SNAT ports, your graph displays five times the total allocated SNAT ports.

Recommendation: When analyzing metric aggregation type Sum and Count, we recommend using a time aggregation value that is greater than one minute.
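The five-times effect described in the note above can be reproduced with a small simulation. This is only an illustrative sketch: the figure of 1,024 allocated ports and the `aggregate` helper are assumptions made for the example, not values or functions from Azure Monitor.

```python
from statistics import mean

def aggregate(samples, window, how):
    """Group per-minute samples into windows of `window` minutes and aggregate each window."""
    buckets = [samples[i:i + window] for i in range(0, len(samples), window)]
    if how == "sum":
        return [sum(b) for b in buckets]
    if how == "average":
        return [mean(b) for b in buckets]
    raise ValueError(f"unsupported aggregation: {how}")

# Hypothetical input: 1,024 SNAT ports allocated, sampled once per minute for 10 minutes.
allocated = [1024] * 10

five_min_sum = aggregate(allocated, 5, "sum")      # each point adds five samples
five_min_avg = aggregate(allocated, 5, "average")  # each point stays at the real value

print(five_min_sum)  # [5120, 5120]
print(five_min_avg)  # [1024, 1024]
```

With Sum, each five-minute point adds up five identical one-minute samples, so the chart shows 5,120 even though only 1,024 ports are allocated. Average keeps the plotted value at the actual allocation.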

Retrieve multi-dimensional metrics programmatically via APIs

For API guidance for retrieving multi-dimensional metric definitions and values, see Azure Monitoring REST API walkthrough. These metrics can be written to a storage account by adding a diagnostic setting for the 'All Metrics' category.
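As a rough sketch of what such a request can look like, the following Python builds a URL for the Azure Monitor metrics REST endpoint. Several things here are assumptions to verify before use: the resource ID is a placeholder, `VipAvailability` is assumed to be the REST name of the data path availability metric, `FrontendIPAddress` is assumed to be one of its dimensions, and `2018-01-01` is one API version of the metrics endpoint. The metric definitions endpoint for your resource lists the actual metric and dimension names. Authentication (a bearer token in the `Authorization` header) is not shown.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

ARM_ENDPOINT = "https://management.azure.com"

def build_metrics_url(resource_id, metric_names, timespan, interval="PT1M",
                      aggregation="Average", dimension_filter=None,
                      api_version="2018-01-01"):
    """Build a metrics query URL for one Azure resource (no request is sent)."""
    params = {
        "api-version": api_version,
        "metricnames": ",".join(metric_names),
        "timespan": timespan,        # ISO 8601 start/end
        "interval": interval,        # ISO 8601 duration, for example PT1M
        "aggregation": aggregation,
    }
    if dimension_filter:
        params["$filter"] = dimension_filter  # splits the result by dimension
    query = urlencode(params, quote_via=quote)
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{query}"

# Placeholder resource ID; substitute your own subscription, group, and name.
resource_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/myResourceGroup"
    "/providers/Microsoft.Network/loadBalancers/myLoadBalancer"
)

url = build_metrics_url(
    resource_id,
    metric_names=["VipAvailability"],            # assumed REST metric name
    timespan="2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
    dimension_filter="FrontendIPAddress eq '*'",  # assumed dimension name
)
print(url)
```

Passing the resulting URL to any HTTP client with a valid token should return a time series per frontend IP address, provided the assumed names match your metric definitions.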

Common diagnostic scenarios and recommended views

Is the data path up and available for my load balancer frontend?

Are the backend instances for my load balancer responding to probes?

How do I check my outbound connection statistics?

How do I check my SNAT port usage and allocation?

How do I check inbound/outbound connection attempts for my service?

How do I check my network bandwidth consumption?

How do I diagnose my load balancer deployment?

Configure alerts for multi-dimensional metrics

Azure Load Balancer supports configurable alerts for multi-dimensional metrics. You can set custom thresholds on specific metrics to trigger alerts with varying levels of severity, so that you can monitor your resources without manually checking them.

To configure alerts:

  1. Go to the alert page for the load balancer.

  2. Create a new alert rule:

    1. Configure the alert condition. To avoid noisy alerts, we recommend setting the aggregation type to Average, looking back over a five-minute window of data, and using a threshold of 95%.

    2. (Optional) Add an action group for automated repair.

    3. Assign an alert severity, name, and description that make it clear how to react.

Inbound availability alerting

Note

If your load balancer's backend pools are empty, the load balancer will not have any valid data paths to test. As a result, the data path availability metric will not be available, and any configured Azure Alerts on the data path availability metric will not trigger.

To alert on inbound availability, you can create two separate alerts that use the data path availability and health probe status metrics. Your scenario may require specific alerting logic, but the following examples are helpful for most configurations.

Using data path availability, you can fire alerts whenever a specific load-balancing rule becomes unavailable. To configure this alert, set an alert condition on data path availability and split by all current and future values of both frontend port and frontend IP address. Setting the alert logic to less than or equal to 0 fires the alert whenever any load-balancing rule becomes unresponsive. Set the aggregation granularity and the frequency of evaluation according to your requirements.

With health probe status, you can alert when a given backend instance fails to respond to the health probe for a significant amount of time. Set up your alert condition to use the health probe status metric and split by backend IP address and backend port, using the Average aggregation type. This ensures that you can alert separately for each individual backend instance’s ability to serve traffic on a specific port.
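The split-and-threshold logic of these two alerts can be sketched offline as follows. The sample records, field names, and the `breaching_series` helper are hypothetical and exist only for illustration. Azure Monitor performs this evaluation for you once the alert rule is configured. The health probe threshold of 50 is an arbitrary assumption, because the right value depends on your scenario.

```python
from collections import defaultdict
from statistics import mean

def breaching_series(samples, dimensions, threshold):
    """Split samples by the given dimensions and return every combination
    whose average value over the window is less than or equal to the threshold."""
    grouped = defaultdict(list)
    for sample in samples:
        key = tuple(sample[d] for d in dimensions)
        grouped[key].append(sample["value"])
    return sorted(key for key, values in grouped.items() if mean(values) <= threshold)

# Hypothetical data path availability samples, one per minute per frontend.
data_path = [
    {"frontend_ip": "203.0.113.10", "frontend_port": 80, "value": 100},
    {"frontend_ip": "203.0.113.10", "frontend_port": 80, "value": 100},
    {"frontend_ip": "203.0.113.10", "frontend_port": 443, "value": 0},
    {"frontend_ip": "203.0.113.10", "frontend_port": 443, "value": 0},
]

# Hypothetical health probe status samples, one per minute per backend endpoint.
health_probe = [
    {"backend_ip": "10.0.0.4", "backend_port": 8080, "value": 100},
    {"backend_ip": "10.0.0.4", "backend_port": 8080, "value": 100},
    {"backend_ip": "10.0.0.5", "backend_port": 8080, "value": 40},
    {"backend_ip": "10.0.0.5", "backend_port": 8080, "value": 0},
]

# Data path alert: split by frontend IP and port, fire at <= 0.
down_rules = breaching_series(data_path, ("frontend_ip", "frontend_port"), 0)

# Health probe alert: split by backend IP and port, assumed threshold of 50.
unhealthy_backends = breaching_series(health_probe, ("backend_ip", "backend_port"), 50)

print(down_rules)
print(unhealthy_backends)
```

Splitting by dimension is what lets one rule report the failing frontend or backend individually instead of averaging it away across healthy endpoints.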

Outbound availability alerting

For outbound availability, you can configure two separate alerts using the SNAT connection count and used SNAT port metrics.

To detect outbound connection failures, configure an alert on SNAT connection count and filter to Connection State = Failed. Use the Total aggregation. Then split by backend IP address, set to all current and future values, to alert separately for each backend instance that experiences failed connections. Set the threshold to greater than zero, or to a higher number if you expect to see some outbound connection failures.

With used SNAT ports, you can alert on a higher risk of SNAT exhaustion and outbound connection failure. Ensure you’re splitting by backend IP address and protocol when using this alert. Use the Average aggregation. Set the threshold to be greater than a percentage of the number of ports you’ve allocated per instance that you determine is unsafe. For example, configure a low severity alert when a backend instance uses 75% of its allocated ports. Configure a high severity alert when it uses 90% or 100% of its allocated ports.
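Because the used SNAT ports metric reports a port count, the percentages in the example above have to be converted into absolute thresholds for each alert rule. The following sketch shows that arithmetic together with a severity check. The function names and the allocation of 1,024 ports per instance are assumptions for illustration, and treating a value exactly at the percentage as breaching is a choice made here, not something the text specifies.

```python
import math

def snat_port_threshold(allocated_ports, percent):
    """Convert a usage percentage into a port-count threshold (rounded up)."""
    return math.ceil(allocated_ports * percent / 100)

def snat_alert_severity(used_ports, allocated_ports, low_pct=75, high_pct=90):
    """Return 'high', 'low', or None for one backend instance.
    Assumption: usage at or above a percentage counts as breaching it."""
    if allocated_ports <= 0:
        raise ValueError("allocated_ports must be positive")
    usage_pct = 100 * used_ports / allocated_ports
    if usage_pct >= high_pct:
        return "high"
    if usage_pct >= low_pct:
        return "low"
    return None

# Hypothetical allocation of 1,024 SNAT ports per backend instance.
allocated = 1024
low_threshold = snat_port_threshold(allocated, 75)
high_threshold = snat_port_threshold(allocated, 90)
print(low_threshold, high_threshold)

for used in (500, 768, 922):
    print(used, snat_alert_severity(used, allocated))
```

If the number of allocated ports per instance changes, for example after the backend pool is scaled, the port-count thresholds need to be recomputed.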

Resource health status

Health status for standard load balancer resources is exposed via the existing Resource health page under Monitor > Service health. It's evaluated every two minutes by measuring data path availability, which determines whether your frontend load-balancing endpoints are available.

| Resource health status | Description |
|---|---|
| Available | Your standard load balancer resource is healthy and available. |
| Degraded | Your standard load balancer has platform or user-initiated events impacting performance. The metric for data path availability has reported less than 90% but greater than 25% health for at least two minutes. With this status, you may experience moderate to severe performance impact. See Support and troubleshooting for Azure Load Balancer to determine whether user-initiated events are impacting your availability. |
| Unavailable | Your standard load balancer resource isn't healthy. The metric for data path availability has reported less than 25% health for at least two minutes. With this status, you may experience significant performance impact or a lack of availability for inbound connectivity. User or platform events may be causing the unavailability. See Support and troubleshooting for Azure Load Balancer to determine whether user-initiated events are impacting your availability. |
| Unknown | Health status for your load balancer resource hasn't been updated, or no data path availability information has been received for the last 10 minutes. This state should be transient and will reflect the correct status as soon as data is received. |
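The thresholds in the table above can be expressed as a simple classification function. This is a sketch of the described mapping rather than the platform's actual evaluation: it doesn't model the "at least two minutes" duration requirement, and it treats a value of exactly 25% as Unavailable, which is an assumption because the table leaves that boundary unspecified. The function name and parameters are hypothetical.

```python
def classify_resource_health(availability_pct, minutes_since_last_data=0):
    """Map a data path availability percentage to a resource health status.

    availability_pct: data path availability (0-100), or None if no data.
    minutes_since_last_data: age of the most recent data point in minutes.
    """
    if availability_pct is None or minutes_since_last_data >= 10:
        return "Unknown"
    if availability_pct >= 90:
        return "Available"
    if availability_pct > 25:
        return "Degraded"
    # Assumption: exactly 25% is grouped with values below 25%.
    return "Unavailable"

for pct in (100, 90, 60, 25, 0, None):
    print(pct, classify_resource_health(pct))

# Stale data: last sample is 12 minutes old.
print(classify_resource_health(100, minutes_since_last_data=12))
```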

To view the health of your public standard load balancer resources:

  1. Select Monitor > Service health.

  2. Select Resource health, and then make sure that Subscription ID and Resource type = load balancer are selected.

  3. In the list, select the load balancer resource to view its historical health status.

A generic description of a resource health status is available in the resource health documentation.

Resource health alerts

Azure Resource Health alerts can notify you in near real time when the health state of your load balancer resource changes. We recommend that you set resource health alerts to notify you when your load balancer resource is in a Degraded or Unavailable state.

When you create Azure Resource Health alerts for a load balancer, Azure sends resource health notifications to your Azure subscription. You can create and customize alerts based on:

  • The subscription affected
  • The resource group affected
  • The resource type affected (load balancer)
  • The specific resource (any load balancer resource you choose to set up an alert for)
  • The event status of the load balancer resource affected
  • The current status of the load balancer resource affected
  • The previous status of the load balancer resource affected
  • The reason type of the load balancer resource affected

You can also configure who the alert should be sent to:

  • A new action group (that can be used for future alerts)
  • An existing action group

For more information on how to set up these resource health alerts, see the Azure Resource Health alerts documentation.
