IT systems are constantly under threat, malicious or not, so much so that breaches have become almost commonplace. Case in point: as we started writing this article, news broke that 4.5 million people were affected by the recent HealthEquity data breach.
As our cloud native systems continue to scale, their distributed nature also makes them more complex. This complexity affords us flexibility and velocity; it also exposes more points of failure and intrusion.
Falling prey to human error, poorly written code or an intentional breach isn’t just about the immediate business impact. Companies are at risk of government scrutiny, billions of dollars in fines or even legal action if they can’t recover quickly.
So while the recent CrowdStrike fiasco certainly made headlines, it’s the aftermath that matters. It also has us giving mean time to recovery (MTTR) a second look, specifically how you can reduce the amount of time it takes to recover from an outage or malicious attack. As the DevOps Research and Assessment (DORA) team defines it, MTTR is “the average amount of time it takes your team to restore service when there’s a service disruption, like an outage.”
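To make the DORA definition concrete, here is a minimal sketch of how a team might compute MTTR from its own incident records. The `mean_time_to_recovery` helper and the sample timestamps are hypothetical and for illustration only; real incident data would come from your incident-tracking or monitoring system.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average the time from disruption start to service restoration.

    `incidents` is a list of (started, restored) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    total = sum((restored - started for started, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: one 30-minute outage and one 90-minute outage.
incidents = [
    (datetime(2024, 7, 1, 9, 0), datetime(2024, 7, 1, 9, 30)),
    (datetime(2024, 7, 8, 14, 0), datetime(2024, 7, 8, 15, 30)),
]

mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:00:00
```

Tracking this number over time, rather than as a one-off, is what shows whether the practices below are actually shortening recovery.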
Before you change your technology approach, you must change your organization’s mindset. Start by making security inherent to your software development life cycle (SDLC), from code to production to management. It’s much harder to change behaviors than adopt a new tool or platform, and without this culture shift, it won’t matter what technology choices you make.
Your platform choice matters to your security posture. Look for security-enhancing features and capabilities that support a DevSecOps-based working model. Increase the skills of your current employees on newer disciplines like platform engineering and architecting for compliance.
Blue-green deployment is a technique that can reduce app downtime and risk by running two identical production environments, one “blue” and one “green,” where only one of the environments is live and serving production traffic, and the other is idle. Only after proper testing can the idle environment start serving production workloads.
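The switch-over logic can be sketched in a few lines. This is a simplified model under stated assumptions, not any platform's actual API: the `BlueGreenRouter` class, its method names and the `health_check` callback are all hypothetical, and a real setup would flip a load balancer or router rule rather than an in-memory attribute.

```python
class BlueGreenRouter:
    """Toy model of two identical environments with one serving traffic."""

    def __init__(self, live="blue"):
        # Version currently deployed in each environment (None = nothing yet).
        self.environments = {"blue": None, "green": None}
        self.live = live

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version):
        # New releases only ever land on the environment not serving traffic.
        self.environments[self.idle] = version

    def promote(self, health_check):
        # Switch traffic only if the idle environment passes testing.
        candidate = self.idle
        if not health_check(self.environments[candidate]):
            return False
        self.live = candidate
        return True

    def rollback(self):
        # The previous environment is untouched, so rollback is one switch.
        self.live = self.idle


router = BlueGreenRouter(live="blue")
router.environments["blue"] = "v1"
router.deploy_to_idle("v2")
promoted = router.promote(lambda version: version is not None)
print(router.live, promoted)  # green True
router.rollback()
print(router.live)  # blue
```

Because the previously live environment keeps running the known-good version, recovery is a routing change rather than a redeploy.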
Canary deployment is another way to test the viability of new software or updates in production. You release the new software or update to a small slice of production and see how it runs. If things go smoothly, you widen the release. It’s part of another modern app delivery paradigm called progressive delivery, a term coined by RedMonk’s James Governor several years ago. What blue-green and canary deployments have in common is that they let you easily roll back to a known-good version of the software with minimal disruption if something breaks.
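As a rough illustration, the sketch below models one common form of canary release: exposing the new version to a growing percentage of traffic and rolling back at the first unhealthy step. The function name, the step sizes and the 1% error threshold are assumptions chosen for the example, and `observe_error_rate` stands in for whatever monitoring you actually use.

```python
def canary_rollout(observe_error_rate, steps=(5, 25, 50, 100), max_error_rate=0.01):
    """Widen exposure step by step; roll back on the first unhealthy step.

    Returns the percentage of traffic left on the new version.
    """
    for percent in steps:
        if observe_error_rate(percent) > max_error_rate:
            # Roll back: all traffic returns to the known-good version.
            return 0
    return steps[-1]


# Hypothetical monitors: one healthy release, one that degrades under load.
healthy = canary_rollout(lambda percent: 0.001)
regressed = canary_rollout(lambda percent: 0.05 if percent >= 25 else 0.001)
print(healthy, regressed)  # 100 0
```

In the second case only the early slice of traffic ever saw the faulty release, which is the point of the technique.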
Test-driven development (TDD) is critical to continuously releasing stable and resilient applications. To get the most from TDD, don’t just do functional tests on what you added. You need to test the new piece in context of everything else, so be sure to include regular fuzz, chaos or fault testing in your approach.
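For a sense of what fuzz testing adds beyond functional tests, here is a minimal sketch. Both `parse_port` (the function under test) and the `fuzz` harness are hypothetical examples, not part of any particular framework; dedicated fuzzing and chaos tools go much further than this.

```python
import random
import string


def parse_port(text):
    """Hypothetical function under test: parse a TCP port number."""
    value = int(text.strip())  # int() raises ValueError on non-numeric input
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value


def fuzz(target, runs=1000, seed=0):
    """Feed random strings to `target` and collect unexpected exceptions."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    crashes = []
    for _ in range(runs):
        length = rng.randint(0, 8)
        text = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            target(text)
        except ValueError:
            pass  # a clean rejection of bad input is the expected behavior
        except Exception as exc:
            crashes.append((text, exc))
    return crashes


crashes = fuzz(parse_port)
print(len(crashes))  # 0
```

A functional test would confirm that `"8080"` parses correctly; the fuzz run checks that garbage input is rejected cleanly instead of failing in an unplanned way.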
Error handling and monitoring, combined with robust log monitoring and observability, can capture problems as they happen and limit the scope of a failure.
Policy-based automation can improve multiple aspects of your software delivery and maintenance processes. To safely automate the multiple layers of your security, get input from various teams, including platform engineering, security, compliance, and infrastructure and operations (I&O) teams. Their input makes the policies that define your automation process more holistic, which helps prevent a disastrous outage or lessen the damage.
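One way to picture policy-based automation is as a set of named checks that every deployment must pass before it proceeds. The sketch below is illustrative only: the `Deployment` fields and the three policies are invented for the example, and in practice each rule would be contributed by the relevant team and enforced by your platform or pipeline.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    image: str
    runs_as_root: bool
    has_rollback_plan: bool


# Hypothetical policies; each maps a readable name to a check function.
POLICIES = {
    "image must be pinned to a digest": lambda d: "@sha256:" in d.image,
    "containers must not run as root": lambda d: not d.runs_as_root,
    "rollback plan is required": lambda d: d.has_rollback_plan,
}


def violations(deployment):
    """Return the names of all policies the deployment fails."""
    return [name for name, check in POLICIES.items() if not check(deployment)]


candidate = Deployment(image="registry.example/app:latest",
                       runs_as_root=True,
                       has_rollback_plan=True)
print(violations(candidate))
# ['image must be pinned to a digest', 'containers must not run as root']
```

Keeping policies as data makes it straightforward for different teams to add rules without touching the enforcement logic.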
Before VMware Tanzu introduced the four golden commands (build, bind, deploy and scale), there were the 3Rs (rotate, repave and repair). They provided a simple way of looking at a cloud native platform’s security attributes. The idea behind the 3Rs is that by being fast, you are safer.
The 3Rs continue to be core tenets of the Tanzu Platform, and you can follow our blog to learn more about Tanzu and security.
Many factors determine whether recovering from an outage or security breach devastates your application development and delivery processes, including platform choice, development styles (e.g., agile, extreme, test-driven) and organizational or cultural factors.
Rather than treating security as a single outcome, focus on delivering secure software supply chains; support a security-focused culture; automate patches, upgrades and policy enforcement; stay on top of policy drift and monitoring; and employ other security-enabling outcomes.