Answer accepted by question author
Hi @Fernando Rojo
Thank you for reaching out to Microsoft Q&A!
Root Cause Analysis (RCA):
We observed a significant increase in ingestion-to-storage latency for Application Insights telemetry in the West US 2 region. Events were still being received by Application Insights, but they were not being processed and forwarded to Azure Data Explorer (ADX).
The underlying cause was a data center outage in the West US 2 region. This outage impacted virtual machines hosting Azure Database for PostgreSQL flexible servers, rendering them unhealthy and inaccessible. As a result:
- Critical workflows such as major version upgrades were disrupted, leading to stuck operations.
- Automation was unable to handle unmounted data drives, requiring manual intervention.
- Physical network and Azure Resource Manager services were also affected, confirming that the issue stemmed from external infrastructure failure rather than code or configuration changes.
Because ingestion pipelines rely on healthy PostgreSQL flexible servers and supporting infrastructure, the outage directly caused delays in telemetry processing and storage.
Resolution: Manual intervention was performed to recover unmounted drives and restore impacted services. Once infrastructure health was re-established, ingestion latency returned to normal levels.
Preventive Actions:
- Improvements to automation for handling unmounted drives.
- Enhanced resiliency measures for dependent infrastructure in West US 2.
Pilladi Padma Sai Manisha 10,190 Reputation points • Microsoft External Staff • Moderator
Hi @Fernando Rojo
I hope you had a chance to review the information shared earlier and that it was helpful. If you still have questions, please let us know in the comments what else you need so we can answer them.
