URL: https://thenewstack.io/chaos-under-control-addressing-cloud-infrastructure-drift/

How to Control Cloud Infrastructure Drift - The New Stack



How to Control Cloud Infrastructure Drift

Infrastructure drift is more than just a technical nuisance; it’s a pervasive problem that — left unchecked — can compromise your entire organization.
Nov 19th, 2024 6:33am by Eran Bibi
Featured image by Matthew Valentino on Unsplash.
Firefly sponsored this post.

Infrastructure drift is a pervasive challenge for organizations managing cloud resources at scale. While Infrastructure as Code (IaC) offers a structured approach to deploying and maintaining infrastructure, drift still occurs when changes happen outside IaC workflows. This isn’t necessarily anomalous behavior: it can happen at any time because of an external contractor, a high-pressure situation (such as an incident) that requires quick resolution, a lapse in judgment or an overly privileged tool.

While we always aspire to maintain perfect IaC hygiene with flawless GitOps processes, in practice this is wishful thinking and nearly impossible to enforce. Instead, we see an overreliance on ClickOps: the manual execution of tasks by clicking through options in software tools, which can be more accessible for users who aren’t familiar with coding or scripting. That manual process is often the cause of infrastructure drift.

Infrastructure drift refers to the divergence between the actual state of infrastructure in the cloud and the desired state defined in IaC tools like Terraform. This discrepancy can lead to security vulnerabilities, reliability issues and operational inefficiencies.
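
As a rough illustration of that definition, drift can be pictured as a per-attribute diff between the desired state declared in IaC and the actual state reported by the cloud. The sketch below uses plain dictionaries. The function name, the security group attributes and their values are hypothetical, and real tools compare far richer state than this.

```python
from typing import Any, Dict, Tuple


def find_drift(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (desired_value, actual_value)} for every mismatch."""
    missing = object()  # sentinel for attributes absent on one side
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key, missing)
        have = actual.get(key, missing)
        if want != have:
            # Report an absent attribute as None to keep the output readable.
            drift[key] = (
                None if want is missing else want,
                None if have is missing else have,
            )
    return drift


# Hypothetical security group: IaC restricts SSH to an internal range,
# but someone opened it to the world in the console.
desired = {"port": 22, "cidr": "10.0.0.0/16", "description": "ssh"}
actual = {"port": 22, "cidr": "0.0.0.0/0", "description": "ssh"}

print(find_drift(desired, actual))
# {'cidr': ('10.0.0.0/16', '0.0.0.0/0')}
```

An empty result means the two states agree on every attribute that was compared.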

At Firefly, we scan and process more than 55,000 cloud accounts daily and almost 320,000 drifts per month, so we understand the magnitude and implications of the infrastructure drift problem. We’ve also seen that 90% of large-scale deployments using IaC experience drift, and about half of those cases go unnoticed. For those organizations, there’s a 100% chance of negative impact, whether it’s on reliability, security or toil.

Common Causes of Infrastructure Drift

There are many reasons infrastructure drift is so common, despite growing understanding that it needs to be mitigated. Many of the causes result from everyday maintenance of large-scale cloud infrastructure and high-velocity and high-pressure delivery cycles.

Common reasons infrastructure drift occurs include:

  • Manual emergency fixes: During incidents or emergencies, engineers often make direct changes to infrastructure through cloud consoles or APIs. These changes can address immediate issues but may bypass IaC pipelines, leading to drift.
  • Legacy resources: Organizations that adopt IaC midstream may have existing resources that were created manually or with different tools. These unmanaged resources are prone to drift as they fall outside IaC governance.
  • Automated tools with permissions: Tools like cloud security posture management (CSPM) may have permissions to modify configurations, such as security groups. When these tools make changes outside of IaC workflows, drift is introduced.
  • Partial IaC adoption: Some organizations implement IaC selectively, managing only new or specific projects with IaC while older or different resources are managed manually. This inconsistency can result in drift across environments.
  • Environment misalignment: Although production environments are often tightly controlled, staging and development environments may allow more flexibility for developers. Manual changes in these environments can create discrepancies, especially if configurations don’t match across environments.
  • IaC and cloud API misalignment: Cloud providers frequently update their APIs and services, which can lead to drift if IaC tools aren’t updated to match. This misalignment can cause IaC deployments to diverge from the current cloud state.

Manual emergency fixes are unavoidable for even the most evolved engineering organizations. Yet, while these changes may address immediate issues, they bypass IaC pipelines, leading to discrepancies. Additionally, organizations that adopt IaC partway through their cloud journey may have legacy resources created outside IaC governance, making them prone to drift. Automated tools, such as CSPM systems, may have permissions to modify configurations such as security groups; changes made by these tools outside of IaC workflows can introduce further discrepancies.

What Infrastructure Drift Looks Like

Infrastructure drift can take many forms, often beginning with minor changes that snowball into significant discrepancies.

For instance, consider an AWS Identity and Access Management (IAM) policy managed through Terraform. Drift occurs when someone adds something as simple as an asterisk (*) to the policy, expanding permissions from read-only to full access. Similarly, in a Kubernetes environment, a role with read-only permissions in IaC might be modified to include write and delete permissions in the actual cluster, which can cause significant damage in production. These seemingly small adjustments can compromise security and lead to unintended access.
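
To make the asterisk example concrete, here is a minimal sketch that compares the actions allowed by a policy as defined in IaC with those allowed by the live policy. The two policy documents and the helper names are hypothetical. The comparison is a literal string match on Allow statements only, so it does not expand wildcards or account for NotAction, resources or conditions.

```python
from typing import Any, Dict, List, Set


def allowed_actions(policy: Dict[str, Any]) -> Set[str]:
    """Collect the Action entries from every Allow statement in a policy document."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]
    actions: Set[str] = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        value = stmt.get("Action", [])
        actions.update([value] if isinstance(value, str) else value)
    return actions


def widened_actions(desired: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Actions allowed by the live policy that the IaC-defined policy does not list."""
    return sorted(allowed_actions(actual) - allowed_actions(desired))


# Hypothetical read-only policy as written in IaC.
desired_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": "*"}
    ],
}
# Hypothetical live policy after someone swapped the action list for a wildcard.
actual_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

print(widened_actions(desired_policy, actual_policy))
# ['s3:*']
```

Any entry containing an asterisk in that result is a strong hint that permissions were broadened outside the IaC workflow.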

When drift goes unchecked, it can pose risks beyond minor inconveniences.

Data from our 2024 State of Infrastructure as Code Report shows that drift often does go unchecked. Not only does it frequently go undetected, but even when it is detected, it isn’t remediated right away. Worryingly, 13% of the time, infrastructure drift isn’t fixed at all.

Figure: Data from the "2024 State of Infrastructure as Code Report" shows more than one-third are spending days or weeks remediating drift. Fewer than half can do it in less than a day.

Beyond just the major risk of downtime, unaddressed drift can impact the stability and security of your infrastructure. For example, when permissions or configurations change outside IaC, it can open vulnerabilities that attackers might exploit. Drift can also affect service reliability if the infrastructure’s actual state doesn’t match the desired configurations tested in staging. All in all, drift is more than just a technical nuisance, and it can compromise your organization as a whole.

First: Practical Approaches to Proactive Drift Detection

Managing drift effectively requires robust monitoring and detection, as well as tried-and-true methods to mitigate it as quickly as possible.

Below are some handy tips for detecting and managing drift:

  • Drift monitoring: Terraform’s plan command, Pulumi’s preview command or AWS CloudFormation’s drift detection commands in the command-line interface (CLI) can be used to detect drift. By scheduling regular checks, teams can compare the current infrastructure state with the desired configuration. With the right flags (for example, terraform plan -detailed-exitcode, which exits with code 2 when the plan contains changes), the exit code indicates a discrepancy, enabling teams to respond accordingly.
  • GitOps for Kubernetes: For Kubernetes environments, GitOps tools like Argo CD and Flux continuously reconcile the cluster state with the configuration stored in Git. These tools help ensure that any unauthorized changes are quickly reverted, maintaining alignment with the source of truth in Git.
  • Drift detection tools: Open source tools like driftctl and KubeDiff provide targeted drift detection capabilities. Of the two, driftctl works well with IaC tools like Terraform, while KubeDiff is optimized for Kubernetes configurations.
  • Real-time alerts and routing: Establishing alerting mechanisms is crucial for effective drift management. By integrating IaC tools with Slack or PagerDuty, teams can receive real-time notifications of drift, enabling prompt resolution.
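
As one possible way to connect detection to alerting, the sketch below reads the JSON form of a Terraform plan (as produced by terraform show -json on a saved plan file), lists the resources with pending changes and builds a message body for a Slack incoming webhook. The function names, the workspace name and the sample plan are hypothetical. It also assumes the configuration has already been fully applied, so that any pending change reflects drift rather than a new, unapplied edit.

```python
import json
from typing import Any, Dict, List


def changed_resources(plan: Dict[str, Any]) -> List[str]:
    """Addresses of resources whose planned actions are anything other than no-op."""
    changed = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            changed.append(f"{rc['address']} ({'/'.join(actions)})")
    return changed


def slack_payload(workspace: str, resources: List[str]) -> Dict[str, str]:
    """Build a minimal incoming-webhook body: a JSON object with a "text" field."""
    lines = [f"Drift detected in {workspace}: {len(resources)} resource(s)"]
    lines += [f"- {r}" for r in resources]
    return {"text": "\n".join(lines)}


# Trimmed, hypothetical excerpt of a plan in JSON form.
sample_plan = json.loads("""
{"resource_changes": [
  {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
  {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}}
]}
""")

payload = slack_payload("prod-network", changed_resources(sample_plan))
print(json.dumps(payload, indent=2))
```

Sending the payload is left out on purpose. In practice it would be posted as JSON to a webhook URL kept in a secret store, or translated into whatever event format the paging tool expects.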

These are good ways to detect drift, but the goal must be remediating the drift.

Next: Strategies for Drift Remediation

Remediating drift can take two main forms: aligning the cloud environment with IaC or updating IaC to reflect the actual state. In cases where manual changes are temporary fixes, reapplying IaC configurations can restore the desired state. However, if manual changes represent necessary adjustments, it’s best to update the IaC templates to align with the actual state, preventing recurring drift.
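
The two directions can be sketched as a single decision: for each drifted attribute, either the cloud is changed back to the desired value, or the IaC definition is changed to adopt the actual value. The function, the drift record and the instance types below are hypothetical, and the sketch only computes what would be written. It does not apply anything.

```python
from typing import Any, Dict, Tuple

# attribute -> (desired value from IaC, actual value in the cloud)
Drift = Dict[str, Tuple[Any, Any]]


def remediation_patch(drift: Drift, keep_manual_change: bool) -> Dict[str, Dict[str, Any]]:
    """Return which side to change and the values to write there."""
    if keep_manual_change:
        # The manual change was a necessary adjustment: codify the actual values in IaC.
        return {"update_iac": {attr: actual for attr, (_, actual) in drift.items()}}
    # The manual change was a temporary fix: push the desired values back to the cloud.
    return {"update_cloud": {attr: desired for attr, (desired, _) in drift.items()}}


drift = {"instance_type": ("t3.medium", "t3.large")}

print(remediation_patch(drift, keep_manual_change=False))
# {'update_cloud': {'instance_type': 't3.medium'}}
print(remediation_patch(drift, keep_manual_change=True))
# {'update_iac': {'instance_type': 't3.large'}}
```

With Terraform, the first case typically corresponds to reapplying the existing configuration, and the second to editing the configuration so that the next plan comes back empty.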

If you’re just starting out with drift detection, a simple monitoring script using Terraform can provide valuable insights into discrepancies. Although this basic approach may not scale for large deployments, it can be effective for smaller setups or as a proof of concept. For larger environments, tools like Firefly, driftctl or GitOps frameworks provide a more robust solution for handling the complexity of enterprise-scale infrastructures.
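
A minimal sketch of such a script follows. It relies on terraform plan -detailed-exitcode, which exits with 0 when the plan is empty, 1 on error and 2 when the plan contains changes. The function names and the injectable runner are illustrative choices, and a non-empty plan can also mean the configuration has edits that were never applied, so a result of "drift" here is a prompt to investigate rather than proof.

```python
import subprocess
from typing import Callable

PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]


def classify(exit_code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` result to a status string."""
    if exit_code == 0:
        return "in_sync"  # plan succeeded with an empty diff
    if exit_code == 2:
        return "drift"    # plan succeeded and found changes
    return "error"        # 1 (or anything else): the plan itself failed


def run_plan(workdir: str) -> int:
    """Run the plan in an initialized Terraform directory and return its exit code."""
    result = subprocess.run(PLAN_CMD, cwd=workdir, capture_output=True, text=True)
    return result.returncode


def check_drift(workdir: str, runner: Callable[[str], int] = run_plan) -> str:
    """Classify the state of one directory; the runner is injectable for testing."""
    return classify(runner(workdir))


# Demonstration with a stand-in runner so the example works without Terraform installed.
print(check_drift("envs/prod", runner=lambda _workdir: 2))
# drift
```

Calling check_drift("path/to/config") without a runner executes the real command. Scheduling that call from cron or a CI job and alerting on anything other than "in_sync" is enough for a proof of concept.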

Getting Infrastructure Drift Under Control

Infrastructure drift is an ongoing challenge in cloud environments, but with the right tools and practices, organizations can maintain control over their infrastructure.

By leveraging IaC, monitoring drift proactively and implementing strategies like GitOps, teams can minimize the impact of drift, ensuring infrastructure remains consistent and aligned with organizational needs. Regular drift detection and timely remediation ultimately improve the security, reliability and efficiency of cloud operations, empowering teams to deliver with confidence at the velocity modern companies require.

Firefly is a Cloud Control Plane that enables DevOps and Platform Engineering teams to scan and discover their entire cloud footprint, detect cloud configuration drifts, classify assets using Policy-as-Code, and manage a single inventory of cloud resources across Multi-Cloud and Kubernetes clusters.
Eran Bibi is co-founder and chief product officer at Firefly. With years of experience in anything DevOps/SRE and security, he has earned a reputation as a CI/CD and SRE expert and an avid admin of cloud platforms and containerized environments....
AWS and PagerDuty are also sponsors of The New Stack.
TNS owner Insight Partners is an investor in: Real.