
URL: https://thenewstack.io/network-observability-in-k8s-clusters-for-better-troubleshooting/


Network Observability in K8s Clusters for Better Troubleshooting

Calico empowers DevOps and platform teams to achieve observability and efficient debugging for their container and Kubernetes environments.
May 17th, 2024 7:00am by Dhiraj Sehgal
Featured image by Vadim Sadovski on Shutterstock.
Tigera sponsored this post. Insight Partners is an investor in Tigera and TNS.

For DevOps and platform teams working with containers and Kubernetes, reducing downtime and improving security posture are crucial. Cloud native applications require a clear understanding of network topology, service interactions and workload dependencies. This understanding is essential for securing and optimizing a Kubernetes deployment and for minimizing response time when a failure occurs.

Network observability can highlight gaps in network policies for applications that require network policy controls, thus reducing the risk of attack from unsecured egress access or lateral movement of threats within the Kubernetes cluster. However, visualizing workload communication, service dependencies, and active and inactive network security policies presents significant challenges due to the distributed and dynamic nature of Kubernetes workloads.

Network Observability Is Difficult With K8s Workloads

Kubernetes scales pods up and out, and creates and destroys services, depending on real-time business requirements. The result is a dynamic set of network connections for each workload instance. Network access policies defined for each workload further affect these connections.

In such scenarios, capturing an accurate and up-to-date representation of network traffic, service dependencies and network policies is difficult. The default Kubernetes implementation provides limited network traffic visibility and policy information, making it challenging for teams to troubleshoot connectivity issues, improve security and demonstrate compliance.

Limitations of General-Purpose Observability Tools

DevOps and platform teams often rely on general-purpose observability tools to gain visibility into workload communication and network policies.

Network Observability for Secure Communication

In terms of security, DevOps and platform teams often report that general-purpose observability solutions don’t effectively monitor communications between workloads and into or out of the cluster.

Kubernetes network and security policies determine access in the cluster. Real-time mapping of these policies to traffic flow in the Kubernetes cluster is critical to understanding a deployment’s behavior.

Because Kubernetes workloads are dynamic and ephemeral, traditional monitoring tools cannot map policies to traffic flows in a way that scales with the application. This makes it difficult to develop, implement and validate effective network policies at runtime.
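
To make the idea of mapping policies to traffic flows concrete, here is a minimal sketch in Python. It models only a small subset of Kubernetes NetworkPolicy ingress behavior: label-based pod selectors within a single namespace, plus destination ports. The `Policy` and `Flow` classes and the `evaluate_ingress` function are hypothetical names chosen for illustration; they are not part of any Kubernetes or Calico API.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Simplified ingress-only policy (hypothetical model, not the real API object)."""
    name: str
    pod_selector: dict   # labels the destination pod must carry
    allowed_from: dict   # labels the source pod must carry
    ports: set = field(default_factory=set)  # empty set means "any port"


@dataclass
class Flow:
    src_labels: dict
    dst_labels: dict
    dst_port: int


def selects(selector: dict, labels: dict) -> bool:
    """True if every selector key/value is present in labels (empty selects all)."""
    return all(labels.get(k) == v for k, v in selector.items())


def evaluate_ingress(flow: Flow, policies: list) -> tuple:
    """Return (verdict, policy_names) for a flow.

    Mirrors the Kubernetes rule that a pod not selected by any policy
    accepts all traffic, while a selected pod accepts only traffic that
    at least one selecting policy allows.
    """
    selecting = [p for p in policies if selects(p.pod_selector, flow.dst_labels)]
    if not selecting:
        return ("allow", [])  # no policy applies: default allow
    allowing = [
        p.name for p in selecting
        if selects(p.allowed_from, flow.src_labels)
        and (not p.ports or flow.dst_port in p.ports)
    ]
    if allowing:
        return ("allow", allowing)
    return ("deny", [p.name for p in selecting])


policies = [Policy("db-from-api", {"app": "db"}, {"app": "api"}, {5432})]
print(evaluate_ingress(Flow({"app": "web"}, {"app": "db"}, 5432), policies))
# prints ('deny', ['db-from-api'])
```

Returning the policy names alongside the verdict is the point of the exercise: a flow is only useful for troubleshooting when it can be traced back to the policy that allowed or blocked it.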

Data Aggregation and Correlation

Kubernetes creates a large number of ephemeral objects that generate data across a distributed environment. This data needs to be aggregated and correlated to visualize the interactions and activities in the environment. Furthermore, Kubernetes context such as pods, services and namespaces must be added to the data, which requires time as well as resources such as extra compute, memory and storage.
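
As a rough illustration of what adding Kubernetes context involves, the sketch below enriches raw IP-level flow records with pod, namespace and service names. The record keys (`src_ip`, `dst_ip`), the `PodInfo` class and the `enrich_flows` function are assumptions made for this example, not the schema of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PodInfo:
    name: str
    namespace: str
    service: Optional[str] = None


def enrich_flows(flows: list, pods_by_ip: dict) -> list:
    """Attach pod, namespace and service names to raw IP-level flow records.

    Endpoints whose IP is not a known pod are labelled as external.
    """
    enriched = []
    for flow in flows:
        record = dict(flow)  # copy so the raw record is left untouched
        for side in ("src", "dst"):
            pod = pods_by_ip.get(flow[f"{side}_ip"])
            record[f"{side}_pod"] = pod.name if pod else None
            record[f"{side}_namespace"] = pod.namespace if pod else "external"
            record[f"{side}_service"] = pod.service if pod else None
        enriched.append(record)
    return enriched
```

The lookup itself is trivial. The cost comes from keeping `pods_by_ip` accurate while pods are created and destroyed and their IP addresses are reused, which is one reason enrichment done long after collection can attribute a flow to the wrong workload.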

Kubernetes Context

Kubernetes adds a layer of abstraction on top of hosts and VMs. While collecting and aggregating data from individual containers and hosts is important, the data must be correlated and aggregated at different levels of Kubernetes abstractions.


Most general-purpose observability tools export data from Kubernetes clusters and use extensive computing resources to aggregate and correlate it. This approach is costly and limited in functionality. For Kubernetes network observability, it’s critical that the tooling be native to Kubernetes and operate inside the cluster.

Kubernetes-Native Network Observability

The default Kubernetes setup provides limited visibility into traffic and policy information, often requiring users to compile data from multiple sources to obtain a comprehensive view.

Commonly, one would run various kubectl commands to gather siloed information across the Kubernetes stack. For instance, `kubectl get pods` lists the pods in the current namespace (add `--all-namespaces` to cover the whole cluster), whereas `kubectl get networkpolicies` lists the NetworkPolicy resources defined in that namespace. Gaining visibility into traffic and policies this way is cumbersome and inefficient in a distributed Kubernetes environment.
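
The following sketch shows the kind of manual correlation this implies. It assumes you have saved the output of `kubectl get pods --all-namespaces -o json` and `kubectl get networkpolicies --all-namespaces -o json` as strings, and it reports pods that no NetworkPolicy selects. The function names are hypothetical, and only the `matchLabels` form of a selector is supported.

```python
import json


def matches(selector: dict, labels: dict) -> bool:
    """Evaluate the matchLabels part of a label selector.

    An empty selector matches every pod in the namespace.
    matchExpressions is out of scope for this sketch.
    """
    if selector.get("matchExpressions"):
        raise NotImplementedError("matchExpressions is not handled in this sketch")
    wanted = selector.get("matchLabels") or {}
    return all(labels.get(k) == v for k, v in wanted.items())


def pods_without_policy(pods_json: str, policies_json: str) -> list:
    """Return sorted "namespace/name" strings for pods that no NetworkPolicy selects."""
    pods = json.loads(pods_json)["items"]
    policies = json.loads(policies_json)["items"]
    unprotected = []
    for pod in pods:
        meta = pod["metadata"]
        labels = meta.get("labels") or {}
        covered = any(
            pol["metadata"]["namespace"] == meta["namespace"]
            and matches(pol["spec"].get("podSelector") or {}, labels)
            for pol in policies
        )
        if not covered:
            unprotected.append(f'{meta["namespace"]}/{meta["name"]}')
    return sorted(unprotected)
```

Even this simple check needs two separate queries and a join on namespace and labels, and it says nothing about the traffic that actually flows between the pods.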

Additionally, open source monitoring tools such as Prometheus and Grafana can provide visibility into infrastructure-level data such as network flow metrics and DNS logs, for both encrypted and unencrypted traffic.

General-purpose monitoring solutions typically gather metrics at the node, container or pod levels, which leads to isolated data silos. These silos then require complex aggregation and correlation at the application and microservices levels to effectively monitor and troubleshoot issues like application behavior, performance bottlenecks and communication problems. Teams using this approach struggle to scale because of the vast amount of granular data generated and the transient nature of interactions within the dynamic infrastructure of Kubernetes.
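
A small sketch of the roll-up step described above: pod-level flow records are summed into traffic totals at a coarser level such as service or namespace. The record keys (`src_namespace`, `dst_namespace`, `bytes` and so on) and the `rollup_bytes` function are assumptions for illustration only.

```python
from collections import defaultdict


def rollup_bytes(flows: list, level: str = "namespace") -> dict:
    """Sum bytes per (source, destination) pair at the chosen abstraction level.

    `level` is the suffix of the keys to group on, e.g. "pod", "service"
    or "namespace" for keys like "src_namespace" / "dst_namespace".
    """
    totals = defaultdict(int)
    for flow in flows:
        key = (flow[f"src_{level}"], flow[f"dst_{level}"])
        totals[key] += flow["bytes"]
    return dict(totals)
```

The aggregation is cheap once the records carry Kubernetes metadata. The scaling problem comes from shipping and storing every granular record before a roll-up like this can run.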

For more detailed analysis, third-party monitoring tools like Datadog, Dynatrace and Splunk are often used to collect logs and metrics and to build comprehensive dashboards. Moreover, using prebuilt dashboards provided by managed service providers can offer a streamlined way to track and analyze statistical data, facilitating better operational oversight and strategic planning within the Kubernetes environment.

Kubernetes Network Observability With Calico

Calico Cloud provides Kubernetes-native, purpose-built observability and troubleshooting for Kubernetes environments, enhancing the ability to quickly resolve connectivity issues, strengthen security postures and understand network topologies in real time.

Network Metrics

Calico automatically gathers logs from various activities within the Kubernetes cluster across the stack, such as DNS flows, application flows, microservice information, Kubernetes activity, audit logs, network flows, TCP/UDP status, socket stats and process information. It also records data on the network policies applied within the clusters, such as application-level, network-level and DNS policies. Calico combines these data points at the source, so the data is enriched with Kubernetes-specific metadata without any additional configuration. This saves time and effort, as well as resources such as memory, compute and network bandwidth.


Visualizations

Calico Cloud offers a detailed dashboard for monitoring traffic flows and network policies, and its Dynamic Service and Threat Graph helps troubleshoot networking and network security issues. It also provides custom dashboards, such as the DNS Dashboard, for in-depth insights into application networking and security. Additionally, Calico features advanced log management with automated filtering and prebuilt tabs to streamline troubleshooting and speed up root-cause analysis. Together, these make it straightforward to identify problematic workloads and quickly access the relevant logs.


For users seeking deeper analysis such as DNS analysis, Calico’s built-in integration with Kibana allows for the creation of detailed and custom queries, catering to more advanced needs.


Troubleshooting Tools

Calico provides tools to troubleshoot network connectivity issues. Consider a scenario where dashboard alerts identify a communication breakdown or a policy denying traffic, for example the “default” pod failing to communicate with kube-system. DevOps and platform engineers can investigate in just a few clicks: navigate to the service graph, right-click the pod, enable packet capture for specific timestamps and protocols, and capture the traffic for root-cause analysis. The captured data is already aggregated and correlated, and points to the specific configurations, dependencies or policies behind the breakdown. By selecting the affected workloads, the user can immediately see what is causing the problem, including any network policies involved.
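
To illustrate why correlated data shortens this kind of investigation, here is a minimal sketch that summarizes denied flows by source, destination and policy. The record shape (`source`, `dest`, `action`, `policy`) is hypothetical and is not Calico's actual flow log schema; `summarize_denied` is likewise an illustrative name.

```python
from collections import Counter


def summarize_denied(flow_logs: list) -> list:
    """Count denied flows per (source, destination, policy), most frequent first."""
    counts = Counter(
        (log["source"], log["dest"], log.get("policy", "unknown"))
        for log in flow_logs
        if log["action"] == "deny"
    )
    return counts.most_common()
```

When every flow record already names the workloads and the policy that produced the verdict, finding the cause of a breakdown reduces to a filter and a count instead of a cross-reference between several data sources.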

Benefits of Using Calico

  • Faster troubleshooting: By offering a real-time view of application traffic and correlated data, Calico enables DevOps teams to quickly narrow down troubleshooting efforts, from misconfigured network policies to networking performance issues. This streamlined approach allows teams to efficiently address security gaps and workload communication issues, thereby reducing downtime and boosting operational efficiency.
  • Improved security posture: DevOps teams can now pinpoint security gaps and address the lack of granular workload access controls using Calico. With activity-based visualizations and detailed traffic metadata, Calico enables teams to preview and recommend policies before enforcement. This enhances an application’s security posture and effectively mitigates risks.

Conclusion

Calico empowers DevOps and platform teams to achieve observability and efficient troubleshooting for their container and Kubernetes environments. By providing a purpose-built solution that addresses the limitations of current approaches, Calico enables teams to reduce downtime, improve security posture and enhance operational efficiency. With Calico, DevOps and platform teams can confidently navigate the complexities of container and Kubernetes environments, and drive innovation with peace of mind.

Tigera provides Calico, a unified network security and observability platform to prevent, detect and mitigate security breaches in Kubernetes clusters. Tigera’s open-source offering, Calico Open Source, is the most widely adopted container networking and security solution.
Dhiraj Sehgal is director of Product, Technical and Partner Marketing at Tigera. His expertise lies in effectively communicating cloud native and SaaS technology to customers, and he is knowledgeable on a wide range of topics including security, networking, storage, the...