Cloud native has quickly become the preferred path to digital transformation, but it doesn’t come without added complexity and cost. Unlike virtual machine (VM)-based infrastructure, cloud native environments with Kubernetes are always changing; they encompass thousands of containers and microservices, which emit greater amounts of data and have more interdependencies.
To address such challenges, engineering teams can use cloud native observability platforms with Kubernetes management and orchestration layers to accomplish digital business goals faster, protect revenue and support innovation.
If your organization wants to run cloud native observability alongside a managed Kubernetes solution, here are 10 best practices to follow.
Following these 10 steps can help you gain, or regain, control of your observability data:
First, establish your digital transformation initiative vision and set goals to achieve it. If it’s an application, for example, that allows a hybrid workforce to connect with customers, what should your service-level objectives (SLOs) be?
On the flip side, what is the target mean time to remediate (MTTR), or how much downtime can the organization afford? Define from the start what kind of resource spike is acceptable and how much funding can be invested. Working backward from these targets, you can determine what your mean time to detect (MTTD) needs to be in order to meet your goals.
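As a rough illustration of working backward from a downtime budget, the sketch below turns an availability SLO into allowed downtime over a 30-day window, then derives how much detection time is left per incident once repair time is reserved. The SLO value, incident count and repair time are hypothetical, and treating each incident as costing detection time plus repair time is a simplifying assumption.

```python
def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO permits over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


def max_mttd_minutes(slo: float, incidents: int, repair_minutes: float,
                     window_days: int = 30) -> float:
    """Detection time left per incident after reserving time for repair.

    Assumes each incident costs (detection time + repair time) of downtime
    and that the budget is split evenly across the expected incidents.
    """
    per_incident = allowed_downtime_minutes(slo, window_days) / incidents
    return max(per_incident - repair_minutes, 0.0)


# Hypothetical targets: 99.9% availability, 2 incidents per window,
# 15 minutes to repair once an incident has been detected.
budget = allowed_downtime_minutes(0.999)
mttd = max_mttd_minutes(0.999, incidents=2, repair_minutes=15)
print(f"downtime budget: {budget:.1f} min, max MTTD: {mttd:.1f} min")
```

With these example numbers, the budget is about 43 minutes per window, which leaves only a few minutes to detect each incident. That is the kind of constraint that should shape the choice of tooling.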
As with every digital transformation project, your team needs the best solution or combination of solutions possible, which will depend on your use case(s) and goals. There is not one way to monitor Kubernetes or to be cloud native. It depends on your people, your organization, your business goals and your existing technology stack.
The types of tools to consider when choosing the optimal observability solution for your organization are:
Once you determine the solution(s) you need, it’s time to decide how to take advantage of them. Open source is a critical characteristic of any cloud native ecosystem collector, especially when you rely on Kubernetes.
These are some of the key ways to deploy and access observability:
The final step in selecting a solution is choosing a cloud provider tool. For a single cloud, using the cloud provider’s analytics and monitoring is smart because you gain a price advantage and visibility from deep integration with existing cloud infrastructure. Whether you use a single cloud or multiple clouds, you are responsible for the customer experience.
You need to instrument your code to get the most out of the tool(s) you’re using and to enable distributed tracing (see No. 7). In practice, instrumenting your code means collecting data and then sending it wherever you want, with no lock-in to a particular application performance monitoring (APM) or infrastructure monitoring vendor. Many solutions work out of the box with little effort, but instrumenting your code gives you the most complete data on which to base your decisions.
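To make "collect data, then send it wherever you want" concrete, here is a minimal sketch that uses only the Python standard library. The `instrumented` decorator, the `Exporter` type and `in_memory_exporter` are hypothetical names invented for this illustration. In a real system, a vendor-neutral instrumentation library would normally fill this role.

```python
import time
from functools import wraps
from typing import Callable, Dict, List

# Hypothetical exporter interface: any callable that accepts a measurement.
# Swapping the exporter changes where data goes without touching app code.
Exporter = Callable[[Dict[str, object]], None]

collected: List[Dict[str, object]] = []


def in_memory_exporter(measurement: Dict[str, object]) -> None:
    """Stand-in destination; a real one might send to a backend."""
    collected.append(measurement)


def instrumented(name: str, export: Exporter):
    """Decorator that records the duration and outcome of each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = True
            try:
                return func(*args, **kwargs)
            except Exception:
                ok = False
                raise
            finally:
                export({
                    "name": name,
                    "duration_s": time.perf_counter() - start,
                    "ok": ok,
                })
        return wrapper
    return decorator


@instrumented("checkout.total", in_memory_exporter)
def order_total(prices: List[float]) -> float:
    return sum(prices)


order_total([9.99, 5.01])
print(collected[0]["name"], collected[0]["ok"])
```

Because the destination is just a parameter, the same instrumented function can report to a different backend by passing a different exporter.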
In the open source world, Prometheus is the standard for understanding Kubernetes cluster health. However, be cautious because you might not actually need all of the data that is emitted. Data that isn’t useful to you and your organization still costs money to collect and store. Adapting for a specific use case and business need is always better than a one-size-fits-all monitoring experience. Keep this in mind if you are learning with Prometheus dashboards.
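The idea of keeping only the data you actually use can be sketched as a simple allowlist filter. The metric names and the `USED_METRICS` set below are illustrative assumptions. In a real Prometheus setup, this kind of filtering is usually expressed in scrape configuration rather than in application code.

```python
from typing import Dict, Iterable, List, Set

Sample = Dict[str, object]

# Hypothetical allowlist: metrics a team actually charts or alerts on.
USED_METRICS: Set[str] = {"http_requests_total", "container_memory_bytes"}


def keep_useful(samples: Iterable[Sample], allow: Set[str]) -> List[Sample]:
    """Keep only samples whose metric name is on the allowlist."""
    return [s for s in samples if s["name"] in allow]


emitted = [
    {"name": "http_requests_total", "value": 1200},
    {"name": "go_gc_duration_seconds", "value": 0.002},
    {"name": "container_memory_bytes", "value": 5.2e8},
    {"name": "process_open_fds", "value": 41},
]

kept = keep_useful(emitted, USED_METRICS)
dropped = len(emitted) - len(kept)
print(f"kept {len(kept)} of {len(emitted)} samples, dropped {dropped}")
```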
Your engineers will be tasked with creating dashboards that keep data visualizations at the ready. That way, you can glance at them and quickly understand exactly what’s happening in your system. Many solutions include a dashboard system. For example, Chronosphere speeds up dashboards with its Query Accelerator technology, which is designed to stay fast across your fleet without manual optimizations.
This approach is simpler because your engineers don’t need to be deep experts in a query language (such as PromQL), the architecture and scale of the environment, the observability solution’s underlying data model or how a query in testing will perform in production.
Significant resource changes can mean good news or bad — your customer base has suddenly spiked or something has stopped working. Either way, it can be challenging with existing APM or infrastructure monitoring tools to understand how much of a resource you’re using, what resource it is, for what application and whether it’s using too much of your resources.
The Observability Data Optimization Cycle from Chronosphere helps your organization overcome these challenges by better understanding and taking action on the cost of your observability data through a process consisting of analyzing, refining and operating.
Logging is important in the cloud native world because it helps your team capture, aggregate and understand system events. In a cloud native architecture, the number of incidents increases, and so does the volume of logs that are not correlated in a single system. This makes it difficult to find the data you need and troubleshoot the problem. Metrics help you diagnose the symptom of an issue, traces help you locate the problem, and logs are best suited to uncovering the root cause.
To keep logs under control in a Kubernetes environment, you need to be able to aggregate and filter data to reduce waste, save money and make it easier to locate the data you need in a timely manner.
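A minimal sketch of the filter-then-aggregate idea follows. The record shape (`service`, `level`, `message`) and the severity scale are assumptions made for this example, not a format any particular log collector requires.

```python
from collections import Counter
from typing import Dict, List

LogRecord = Dict[str, str]

# Numeric severity so records can be compared against a minimum level.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def filter_logs(records: List[LogRecord], min_level: str = "WARN") -> List[LogRecord]:
    """Drop records below the minimum severity to cut noise and volume."""
    floor = SEVERITY[min_level]
    return [r for r in records if SEVERITY[r["level"]] >= floor]


def aggregate(records: List[LogRecord]) -> Counter:
    """Count identical (service, message) pairs instead of keeping every line."""
    return Counter((r["service"], r["message"]) for r in records)


logs = [
    {"service": "cart", "level": "DEBUG", "message": "cache hit"},
    {"service": "cart", "level": "ERROR", "message": "db timeout"},
    {"service": "cart", "level": "ERROR", "message": "db timeout"},
    {"service": "auth", "level": "WARN", "message": "token near expiry"},
    {"service": "auth", "level": "INFO", "message": "login ok"},
]

important = filter_logs(logs)
summary = aggregate(important)
for (service, message), count in summary.most_common():
    print(f"{count}x [{service}] {message}")
```

Filtering before aggregation means the low-value records never reach storage, which is where most of the cost savings would come from.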
If you don’t instrument code properly (see No. 2), you can’t support distributed tracing, which lets you see what a request did throughout the system. It is how you determine that a single function is taking a very long time so you can dive deeper into why, preferably before it hurts customer experience.
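To show how a trace points at the slow function, the sketch below models a request as a list of spans and computes each span’s self time, meaning its duration minus the time spent in its direct children. The `Span` fields and the sample trace are hypothetical, and the calculation assumes that the children of a span do not overlap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Time spent in each span, excluding its direct children.

    Assumes the children of a span do not overlap one another.
    """
    result = {s.span_id: s.duration_ms for s in spans}
    for s in spans:
        if s.parent_id in result:
            result[s.parent_id] -= s.duration_ms
    return result


# Hypothetical trace of one request passing through several functions.
trace = [
    Span("a", None, "GET /checkout", 0, 480),
    Span("b", "a", "auth.verify", 5, 25),
    Span("c", "a", "cart.price", 30, 450),
    Span("d", "c", "db.query", 40, 90),
]

times = self_times(trace)
slowest = max(trace, key=lambda s: times[s.span_id])
print(f"slowest by self time: {slowest.name} ({times[slowest.span_id]} ms)")
```

In this example the root span has the longest total duration, but most of that time is spent inside `cart.price` itself rather than in its database call, so that is where to look first.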
After completing steps 1–7, a best practice is setting up alerts and notifications to send to yourself or your team. That way, if and when something goes wrong, someone can triage and fix the issue.
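A simple alert rule can be sketched as a threshold that must be exceeded for several consecutive samples before anyone is notified, which avoids paging on a single blip. The function names, the 5% threshold and the `notify` stand-in are assumptions for illustration only.

```python
from typing import List, Sequence


def should_alert(values: Sequence[float], threshold: float,
                 consecutive: int) -> bool:
    """True when the last `consecutive` samples all exceed the threshold."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    if len(values) < consecutive:
        return False
    return all(v > threshold for v in values[-consecutive:])


notifications: List[str] = []


def notify(message: str) -> None:
    """Stand-in for a pager, chat or email integration."""
    notifications.append(message)


# Hypothetical error-rate samples, most recent last.
error_rate = [0.01, 0.02, 0.07, 0.09, 0.08]

if should_alert(error_rate, threshold=0.05, consecutive=3):
    notify("error rate above 5% for 3 consecutive samples")

print(notifications)
```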
This step is common sense. Future-proofing is hard with new updates coming out almost daily. Keep up with solution patching and observability best practices. Add automation when you can to eliminate time-consuming and error-prone manual processes.
The best observability platforms will help you control your cloud costs and observability spend. Choosing a solution such as Chronosphere with its Control Plane that gives you different tools along the observability pipeline allows your organization to:
This type of transparency allows valuable, talented engineers to work on projects that are more impactful to the business. With cost controls in place, you can then begin to fine-tune: see how useful your data is, set quotas for teams based on observability spend and perform cost accounting trend analyses across teams running independent microservices.
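Per-team quotas and cost accounting can be sketched in a few lines. The team names, dollar figures and function names below are invented for the example and do not reflect how any particular platform implements quotas.

```python
from typing import Dict

# Hypothetical monthly observability spend and quotas per team, in dollars.
spend: Dict[str, int] = {"payments": 4200, "search": 1800, "platform": 2500}
quotas: Dict[str, int] = {"payments": 3000, "search": 2000, "platform": 2500}


def overages(spend: Dict[str, int], quotas: Dict[str, int]) -> Dict[str, int]:
    """Amount by which each team exceeds its quota.

    Teams without a quota are treated as unlimited.
    """
    return {
        team: used - quotas[team]
        for team, used in spend.items()
        if team in quotas and used > quotas[team]
    }


def spend_share(spend: Dict[str, int]) -> Dict[str, float]:
    """Each team's fraction of total observability spend."""
    total = sum(spend.values())
    return {team: used / total for team, used in spend.items()}


over = overages(spend, quotas)
shares = spend_share(spend)
for team, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    flag = f" (over quota by ${over[team]})" if team in over else ""
    print(f"{team}: {share:.0%} of spend{flag}")
```

Running the same calculation month over month would give the trend analysis mentioned above.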
Cloud native environments are essential for businesses that want to harness the power of digital transformation, but it’s necessary to have the right tools that can work together and best practices in place. Chronosphere and its partners are built from the ground up to abstract away the complexities of cloud native environments, optimize data and reduce engineering burnout.