Kubernetes is an infrastructure platform commonly used in modern IT environments. It supports cloud-native applications and offers many benefits but also introduces complexities and operational challenges. Operating Kubernetes reliably requires detecting errors and fixing them quickly. You can only achieve this by establishing visibility into your containerized environments.
Monitoring Kubernetes provides access to operational metrics, helping you gain insight into the state of your workloads. You can monitor specific performance metrics, overall cluster health, and resource utilization. These metrics help you discover and troubleshoot issues, detect threats, and protect your workloads in a timely manner.
The Importance of Kubernetes Metrics
Kubernetes applications are highly complex, especially those based on a microservices architecture. As a result, it is often difficult to identify the root cause of an issue. Kubernetes metrics give you the visibility you need to identify the root cause and remediate problems effectively.
You can leverage Kubernetes metrics to minimize security blind spots that allow threat actors to attack container vulnerabilities and misconfigurations. Threat actors can use these misconfigurations to discover vulnerable spots in systems, files, process controls, and networks. You can also apply Layer 7 network filtering to detect attempts to use compromised containers to access other pods.
Kubernetes Monitoring Challenges
Kubernetes has so many moving parts that it can be difficult to find the source of a problem. Kubernetes clusters tend to have more servers and services than traditional environments, so there are more logs and other areas to investigate when something goes wrong.
In a traditional monolithic environment, you might have one or two logs to search, but a microservices application might have many more (one or more logs for each microservice related to the problem you are solving). Sifting through logs from multiple services is time-consuming and often doesn't help you find the real root cause of a problem.
Also, where a single transaction once involved only a handful of servers and services, a transaction in Kubernetes often passes through many more components. A trace header is usually added to each transaction so you can determine which microservice to investigate. This makes it easier to discover which microservices were involved and which one ultimately failed. Unfortunately, adding these headers requires a code change, and even if you know which service failed, you still have to use the logs to find out why.
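The header propagation described above can be sketched roughly as follows. This is a minimal illustration, not a tracing library: the header name `X-Trace-Id` and both function names are hypothetical (real tracing systems define their own headers, such as the W3C Trace Context `traceparent` header). It shows the kind of code change each service needs: reuse the incoming ID or start a new one, then forward it on every downstream call.

```python
import uuid

# Hypothetical header name chosen for illustration; real tracing systems
# define their own (for example, the W3C Trace Context "traceparent" header).
TRACE_HEADER = "X-Trace-Id"


def get_or_create_trace_id(incoming_headers):
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in incoming_headers.items():
        if name.lower() == TRACE_HEADER.lower() and value:
            return value
    return uuid.uuid4().hex


def outgoing_headers(incoming_headers, extra=None):
    """Build headers for a downstream call that carry the same trace ID."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = get_or_create_trace_id(incoming_headers)
    return headers
```

If every service logs the trace ID alongside its messages, you can search all logs for one ID and see which services handled the transaction.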
Tools and Technologies That Can Help Analyze Kubernetes Data
Kubernetes Dashboard
The Kubernetes Dashboard is a web-based user interface for Kubernetes. You can use it to deploy containerized applications to Kubernetes clusters, troubleshoot containerized applications, and manage cluster resources.
You can use the Kubernetes Dashboard to get an overview of the applications running on your cluster and create or modify individual Kubernetes resources, including Deployments, Jobs, and DaemonSets. For example, you can scale a Deployment, initiate a rolling update, restart a pod, or deploy new applications using the deploy wizard.
The dashboard also provides information about the health of your cluster’s Kubernetes resources and any errors that may have occurred.
SIEM
Visibility is critical to keeping your production environment secure. A security information and event management (SIEM) system centrally manages Kubernetes audit logs, helping you identify important security events while reducing noise.
You can enhance security using Kubernetes audit events by:
- Making sure you are building approved container images
- Making sure APIs are not exposed externally
- Monitoring outbound and inbound traffic to clusters and pods
- Using container log data for tracking and visualization
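To make the idea of reducing noise concrete, here is a minimal sketch of pre-filtering audit events before they reach a SIEM. It assumes the API server writes audit events as one JSON object per line, and it reads fields from the `audit.k8s.io/v1` Event schema (`verb`, `user.username`, `objectRef.resource`, `responseStatus.code`). The filter rules, function names, and sample events are made up for illustration and are not a recommended policy.

```python
import json

# Illustrative rules (assumptions, not a recommended policy): keep events
# that touch Secrets or that the API server rejected as forbidden.
SENSITIVE_RESOURCES = {"secrets"}


def is_interesting(event):
    """Return True if an audit event is worth forwarding to the SIEM."""
    resource = (event.get("objectRef") or {}).get("resource")
    status_code = (event.get("responseStatus") or {}).get("code")
    return resource in SENSITIVE_RESOURCES or status_code == 403


def filter_audit_lines(lines):
    """Parse JSON-lines audit output and keep only interesting events."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if is_interesting(event):
            kept.append(event)
    return kept


# Hand-written sample events; the values are invented for this example.
sample = [
    '{"verb": "get", "user": {"username": "alice"}, "objectRef": {"resource": "pods", "namespace": "default"}, "responseStatus": {"code": 200}}',
    '{"verb": "list", "user": {"username": "bob"}, "objectRef": {"resource": "secrets", "namespace": "prod"}, "responseStatus": {"code": 200}}',
    '{"verb": "delete", "user": {"username": "eve"}, "objectRef": {"resource": "pods", "namespace": "prod"}, "responseStatus": {"code": 403}}',
]

for event in filter_audit_lines(sample):
    print(event["user"]["username"], event["verb"], event["objectRef"]["resource"])
```

In practice this kind of filtering is usually expressed as SIEM rules or as an audit policy on the API server rather than in a standalone script; the sketch only shows the shape of the data involved.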
Istio
Istio is an independent, open-source service mesh technology that enables developers to connect, secure, control, monitor, and run distributed microservices architectures, most commonly deployed on Kubernetes, regardless of platform, origin, or vendor.
Istio generates detailed telemetry data for every service communication within the mesh. This telemetry provides observability of service behavior and allows operators to troubleshoot, maintain, and optimize applications without placing an additional burden on service developers. Istio gives operators full visibility into how the monitored service interacts with other services, not just the Istio component itself.
To monitor service behavior, Istio generates metrics for all service traffic in and out of the Istio service mesh. These metrics provide behavioral information such as overall traffic, traffic error rate, and request response time.
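As a rough illustration of how these metrics turn into a signal, the sketch below computes a per-service error rate from request counts. It assumes samples shaped like Istio's standard `istio_requests_total` counter with its `destination_service` and `response_code` labels. The function name and sample values are made up, and the counts are assumed to cover a single time window; you would normally compute this with a query in your metrics backend rather than in application code.

```python
from collections import defaultdict


def error_rate_by_service(samples):
    """Return the fraction of 5xx responses per destination service.

    Each sample is (labels, request_count), where labels is a dict holding
    at least "destination_service" and "response_code".
    """
    totals = defaultdict(float)
    errors = defaultdict(float)
    for labels, count in samples:
        service = labels["destination_service"]
        totals[service] += count
        if labels["response_code"].startswith("5"):
            errors[service] += count
    return {svc: errors[svc] / totals[svc] for svc in totals if totals[svc] > 0}


# Made-up request counts for one time window.
samples = [
    ({"destination_service": "cart.default.svc.cluster.local", "response_code": "200"}, 95),
    ({"destination_service": "cart.default.svc.cluster.local", "response_code": "503"}, 5),
    ({"destination_service": "pay.default.svc.cluster.local", "response_code": "200"}, 40),
]

rates = error_rate_by_service(samples)
```

Grouping by label is what lets one metric answer questions about any individual service, which is why the labels matter as much as the counts.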
In addition to monitoring the behavior of services within the mesh, it is also important to monitor the behavior of the mesh itself. Istio components export metrics on their own internal behavior, providing insight into the health and functioning of the mesh control plane.
Prometheus
Prometheus is an open-source monitoring and alerting toolkit for microservices and containers that provides flexible querying and real-time notifications. Prometheus helps IT departments monitor and recognize problems in application programming interfaces (APIs) and other connected applications and services.
These four features make Prometheus the de facto standard for Kubernetes monitoring:
- Multidimensional data model: Based on key-value pairs, similar to how Kubernetes uses labels to organize infrastructure metadata. This model supports flexible and accurate time series data and underpins the Prometheus query language (PromQL).
- Accessible formats and protocols: Publishing Prometheus metrics is a very simple task. It uses a standard HTTP transport to expose metrics in a human-readable and descriptive format.
- Service discovery: Prometheus servers scrape targets periodically (metrics are pulled instead of pushed), so applications and services don't have to worry about sending data. Prometheus servers have several ways to automatically detect their targets.
- Modular high-availability components: Metric collection, alerting, graph visualization, etc. are performed by various configurable services. All of these services are designed to support redundancy and sharding.
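To show what the human-readable format and key-value labels look like in practice, here is a minimal sketch that renders one counter in the Prometheus text exposition format. The function names and sample metric are illustrative assumptions; a real application would normally use an official Prometheus client library and serve this text over HTTP, conventionally at `/metrics`, for the server to scrape. The sketch escapes label values but assumes the help text contains no backslashes or newlines.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Made-up sample values for illustration.
text = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [
        ({"method": "get", "code": "200"}, 1027),
        ({"method": "post", "code": "500"}, 3),
    ],
)
print(text)
```

Each distinct combination of label values forms its own time series, which is what makes the data model multidimensional.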
Conclusion
In this article, I explained the challenge of Kubernetes observability and covered four technologies that can help make sense of Kubernetes operational data:
- Kubernetes Dashboard: A web-based UI, installed as a separate add-on, that provides insights about activities in a Kubernetes cluster
- SIEM: Centrally manages Kubernetes audit logs, helping you identify important security events while reducing noise
- Prometheus: The de facto standard for Kubernetes monitoring
- Istio: Generates detailed telemetry data for every service communication within the service mesh
I hope this will be useful as you help your organization derive insights from Kubernetes operational data.