Large Application Insights Ingestion Delay

Fernando Rojo 40 Reputation points Microsoft Employee

We are seeing a large increase in ingestion-to-storage latency for our Application Insights telemetry in West US 2. Events appear to be received by Application Insights (AI) but are not being processed and sent to ADE.

Answer accepted by question author

Pilladi Padma Sai Manisha 10,190 Reputation points Microsoft External Staff Moderator

Hi @Fernando Rojo,
Thank you for reaching out to Microsoft Q&A!

Root Cause Analysis (RCA):

We observed a significant increase in ingestion-to-storage latency for Application Insights telemetry in the West US 2 region. While events were being received by Application Insights, they were not being processed and forwarded to Azure Data Explorer (ADE).

The underlying cause was a data center outage in the West US 2 region. This outage impacted virtual machines hosting Azure Database for PostgreSQL flexible servers, rendering them unhealthy and inaccessible. As a result:

  • Critical workflows such as major version upgrades were disrupted, leading to stuck operations.
  • Automation was unable to handle unmounted data drives, requiring manual intervention.
  • Physical network and Azure Resource Manager services were also affected, confirming that the issue stemmed from external infrastructure failure rather than code or configuration changes.

Because ingestion pipelines rely on healthy PostgreSQL flexible servers and supporting infrastructure, the outage directly caused delays in telemetry processing and storage.

Resolution: Manual intervention was performed to recover unmounted drives and restore impacted services. Once infrastructure health was re-established, ingestion latency returned to normal levels.

Preventive Actions:

  • Improvements to automation for handling unmounted drives.
  • Enhanced resiliency measures for dependent infrastructure in West US 2.
  1. Pilladi Padma Sai Manisha 10,190 Reputation points Microsoft External Staff Moderator

    Hi @Fernando Rojo
    I hope you had a chance to review the information shared earlier, and I hope this information has been helpful! If you still have questions, please let us know what is needed in the comments so the question can be answered.


Answer accepted by question author

Alex Burlachenko 22,120 Reputation points MVP Volunteer Moderator

Hello Fernando Rojo, and thanks for joining me here on the Q&A portal.

This sounds service-side if events are received by Application Insights but delayed before storage or ADX export. First confirm whether the delay is inside Azure Monitor ingestion or in the ADX ingestion path.

Run an ingestion delay query in the Application Insights workspace using ingestion_time() versus the event timestamp. Microsoft documents this method here: https://learn.microsoft.com/en-us/azure/azure-monitor/logs/data-ingestion-time

If AI ingestion itself is delayed in West US 2, look at Azure Service Health and open an Azure Monitor support ticket with the workspace or App Insights resource ID, region, UTC window, and sample operation IDs.

If AI data appears in Log Analytics on time but ADX is delayed, check the ADX ingestion metrics: EventsReceived, IngestionLatencyInSeconds, failed ingestion count, batching policy, and data connection health. ADX monitoring: https://learn.microsoft.com/en-us/azure/data-explorer/monitor-data-explorer

Given this is a sudden regional latency increase, I would not spend too long tuning the app. Capture timestamps and prove where the delay sits: app emit time -> AI ingestion time -> ADX ingestion time. Then escalate to Azure Monitor or ADX depending on which hop is slow.
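For the ADX side of that check, a minimal sketch of two management commands that could be run against the cluster is shown here. It assumes you have permission to run `.show` commands, that the failure output includes the `FailedOn`, `Database`, `Table`, and `FailureKind` columns, and `MyTable` is a placeholder table name, not one taken from this thread. Run each command separately.

```kql
// Sketch: recent ingestion failures on the ADX cluster, grouped by target.
// The 4h window is an arbitrary choice; widen it to cover the incident.
.show ingestion failures
| where FailedOn > ago(4h)
| summarize Failures = count() by Database, Table, FailureKind

// Sketch: batching policy for one table (MyTable is a placeholder).
// A long batching time span adds ingestion latency by design.
.show table MyTable policy ingestionbatching
```

If no failures show up and the batching policy is unchanged, that points away from ADX configuration and toward the upstream hop.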

rgds,

Alex

  1. Fernando Rojo 40 Reputation points Microsoft Employee

    Executing this query:

    union *
    | where timestamp > ago(4h)
    | project timestamp, TimeReceived = _TimeReceived, IngestionTime = ingestion_time()
    | extend ClientToIngestionLatency = TimeReceived - timestamp
    | extend IngestionToStorageLatency = IngestionTime - TimeReceived
    | extend TotalLatency = IngestionTime - timestamp
    | summarize avg(ClientToIngestionLatency), avg(IngestionToStorageLatency), avg(TotalLatency) by bin(timestamp,5m)
    

    The delay is entirely in ingestion-to-storage, and it appears to be impacting all of our subscriptions in West US 2 across the board. It seems most likely related to the outage "Active - Multiple services degradation in West US 2".

    Thank you for the help, I will reach out to support with the relevant details.
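The query above reports averages per 5-minute bin, which can hide tail latency during an incident. A hedged variant that reports percentiles for the ingestion-to-storage hop is sketched below; it assumes the same `timestamp` and `_TimeReceived` columns as the query above, and the choice of the 50th, 95th, and 99th percentiles is illustrative.

```kql
// Sketch: percentile view of ingestion-to-storage latency.
// Assumes the same columns as the original query (timestamp, _TimeReceived).
union *
| where timestamp > ago(4h)
| extend IngestionToStorageLatency = ingestion_time() - _TimeReceived
| summarize percentiles(IngestionToStorageLatency, 50, 95, 99) by bin(timestamp, 5m)
| order by timestamp asc
```

A widening gap between the 50th and 99th percentiles would suggest that only part of the telemetry is delayed, which may be worth including in a support ticket.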


Sign in to comment

1 additional answer

  1. Fernando Rojo 40 Reputation points Microsoft Employee

    Executing this query:

    union *
    | where timestamp > ago(4h)
    | project timestamp, TimeReceived = _TimeReceived, IngestionTime = ingestion_time()
    | extend ClientToIngestionLatency = TimeReceived - timestamp
    | extend IngestionToStorageLatency = IngestionTime - TimeReceived
    | extend TotalLatency = IngestionTime - timestamp
    | summarize avg(ClientToIngestionLatency), avg(IngestionToStorageLatency), avg(TotalLatency) by bin(timestamp,5m)
    

    The delay is entirely in ingestion-to-storage, and it appears to be impacting all of our subscriptions in West US 2 across the board. It seems most likely related to the outage "Active - Multiple services degradation in West US 2"; however, we are unable to confirm where in the flow the delay originates.
