How to Learn Kubernetes – The Best Tutorials, Comics, and Guides

Want to learn Kubernetes but don’t know where to start? I was in the same boat, and after reading numerous Kubernetes tutorials, I realized there had to be an easier way for beginners to learn.

Having recently joined LogDNA, I’m going to share what I learned in the past two weeks, as well as the best resources I found – comics, guides, tutorials – all so you can go from zero to Kubernetes expert in no time.

Full disclosure, I started out building php and wordpress sites long ago, and I am mostly familiar with “ancient” self hosted monolithic applications. I’m a newbie when it comes to containers and container orchestration.

Surprisingly, of all resources, it was through a series of comics that I found my bearings of what Docker containers and Kubernetes is. The first and best was created by Google: Smooth Sailing with Kubernetes. I highly recommend reading this first.

What Are Containers?

The most familiar reference that I have to compare containers to are Virtual Machines. Virtual Machines offer isolation at the hardware level, where a server offers guest operating systems to run on top of it. Not only VMWare but Microsoft Hyper-V, Citrix XenServer, and Oracle VirtualBox all with Hypervisors that enabled hardware virtualization and let each guest OS have access to CPU, memory and other resources. The limitations with VM was that each guest operating system had a large footprint of its own full OS image, memory and storage requirements. So scaling them was costly and the applications weren’t portable between hosting providers, and private/public clouds.

Containers sit on top of the physical hardware and its host operating system to share the host OS’ kernel, binaries, and libraries. This way they’re much more light weight with image sizes in the megabytes instead of gigabytes and can start up in seconds rather than minutes. Along with obvious performance benefits, they also reduce management overhead of updating and patching full operating systems and they’re much more portable to different hosting providers. Multiple containers can run on a single host OS which saves operating costs.

Consolia’s comic offers a great visualization of the difference between VM and Containers: https://consolia-comic.com/comics/containers-and-docker

And xkcd offers offers some humor on containers.

Docker wanted to allow developers to create, deploy and run applications through the use of containers. With Docker containers, they focused on faster deployment speeds, application portability, and reuse. Docker does not create an entire virtual operating system and will require that the components that are not already on the host OS to be packaged inside the container. Applications will be packaged up with exactly what they need to run, no more and no less.

Some interesting information on Docker containers and how incredibly quickly they’ve been adopted around the world:

  • It was only released in 2014
  • Over 3.5M applications have been placed in containers using Docker
  • 37B containerized apps have been downloaded so far

What is Kubernetes?

This Illustrated Children’s Guide to Kubernetes is really good at explaining the need for Kubernetes.

Google created and open sourced the Kubernetes orchestration system for docker containers. It is used to automate the deployment, scaling and management of containerized applications.

Google runs over 2 billions containers per week and built Kubernetes to be able to do so at worldwide scale.The idea is to provide tools for running distributed systems in production like load balancing, rolling updates, resource monitoring, naming and discovery, auto-scaling, mounting storage systems, replicating application instances, checking the health of applications to log access, and support for introspection.

Kubernetes allows developers to create and manage clusters and scale them. Brian Dorsey’s talk at GOTO 2015 helped get from concept to seeing a real life example of deployment, replication and updates within an hour.

Side note, I’m currently reading Google’s SRE book and awe-struck by their story of scaling up.

How to Get Started

I started with the official docs of Kubernetes Basics And then followed this Hello World Mini Kubes tutorial. I’m just making my way through this excellent tutorial now called Kubernetes the Hard Way.

So far, now that I have a basic understanding of Containers and Kubernetes, I’m really excited at all its possibilities. It’s incredible to see that such powerful tools are available to all, both for developers to do CI/CD and especially for devops to scale, maintain 24/7 availability regardless of what happens and get a good night’s rest without alerts of catastrophic failures.

I’m also realizing that using Kubernetes also requires the use of other additional tools like Helm for managing Kubernetes packages and expedites the need for centralized logging and monitoring tools. It’s not as simple as logging into a server to look at a log file anymore when you’re dealing with many replicas and nodes that start and stop on the fly. As you come up with your Kubernetes logging strategy, here are some key Kubernetes metrics to log. It’s cool to see how LogDNA solves this integration with Kubernetes better than any other log management solution in the market. You have a choice of doing this yourself or get up and running with just two lines.


The Impact of Containerization on DevOps

Ever since its formal introduction in 2008, DevOps has helped organizations shorten the distance from development to production. Software teams are delivering changes faster without sacrificing quality, stability, or availability. To do this, developers and IT operations staff have started working closely together to create a faster, more efficient pipeline to production.

Around the same time, containers started transforming the way we develop and deploy applications. 20% of global organizations today run containerized applications in production, and more than 50% will do so by 2020. But how has the growth of containers impacted DevOps, and where will that lead us in the future?

Thinking Differently About Applications

Not too long ago, applications were deployed as large, bulky, monolithic packages. A single build contained everything needed to run the entire application, which meant that changing even a single function required a completely new build. This had a huge impact on operations, since upgrading an application meant having to stop the application, replace it with the new version, and start it up again. Virtual machines and load balancing services removed some of the pain, but the inefficiency of managing multiple virtual machines and dedicated servers made it difficult to push new features quickly.

Containers allowed us to split up these monolithic applications into discrete and independent units. Like VMs, containers provide a complete environment for software to run independently of the host environment. But unlike VMs, containers are short-lived and immutable. Instead of thinking of containers as lightweight VMs, we need to think of them more as heavy processes. For developers, this means breaking down monolithic applications into modular units. Functions in the original monolith are now treated as services to be called upon by other services. Instead of a single massive codebase, you now have multiple smaller codebases specific to each service. This allows for much greater flexibility in building, testing, and deploying code changes.

This idea of breaking up applications into discrete services has its own name: microservices. In a microservice architecture, multiple specialized services work together to perform a bigger service. The kicker is that any one service can be stopped, swapped out, restarted, and upgraded without affecting any of the other services. One developer can fix a bug in one service while another developer adds a feature to another service, and both can occur simultaneously with no perceivable downtime. Containers are the perfect vessel for microservices, as they provide the framework to deploy, manage, and network these individual services.

Giving Power to Developers

Containers also empower developers to choose their own tools. In the old days, decisions about an application’s language, dependencies, and libraries had far-reaching effects on both development and operations. Test and production environments would need to have the required software installed, configured, and fully tested before the application could be deployed. With containers, the required software travels with the application itself, giving developers more power to choose how they run their applications.

Of course, this doesn’t mean developers can just pick any language or platform for their containers. They still need to consider what the container is being used for, what libraries it requires, and how long it will take to onboard other developers. But with the flexibility and ease of replacing containers, the impact of this decision is much less significant than it would be for a full-scale application.

Streamlining Deployments

Containers don’t just make developers’ lives easier. With containers providing the software needed to run applications, operators can focus on providing a platform for the containers to run on. Orchestration tools like Docker Swarm and Kubernetes have made this process even easier by helping operators manage their container infrastructure as code. Operators can simply declare what the final deployment should look like, and the orchestrator automatically handles deploying, networking, scaling, and mirroring the containers.

Orchestration tools have also become a significant part of continuous integration and continuous deployment (CI/CD). Once a new base image has been built, a CI/CD tool like Jenkins can easily call an orchestration tool to deploy the container or replace existing containers with new versions generated from the image. The lightweight and reproducible nature of containers makes them much faster to build, test and deploy in an automated way than even virtual machines.

When combined, CI/CD and container orchestration means fast, automated deployments to distributed environments with almost no need for manual input. Developers check in code, CI/CD tools compile the code into a new image and perform automated tests, and orchestration tools seamlessly deploy containers based on the new image. Except for environment-specific details such as database connection strings and service credentials, the code running in production is identical to the code running on the developer’s machine. This is how companies like Etsy manage to deploy to production up to 50 times per day.

Security From the Start

Despite the recent publicity, volume, and scale of data breaches in IT systems, security is often treated as an afterthought. Only 18% of IT professionals treat security as a top priority for development, and 36% of IT organizations don’t budget enough for security. Surprisingly though, most feel security would be beneficial development: 43% believe fixing flaws during development is easier than patching in changes later, and only 17% believe that adding security to the development process will slow down DevOps.

Combining security and DevOps is what led to DevSecOps. The goal of DevSecOps is to shift security left in the software development lifecycle from the deployment phase to the development phase, making it a priority from the start. Of course, this means getting developers on-board with developing for security in addition to functionality and speed. In 2017, only 10 to 15% of organizations using DevOps managed to accomplish this.

“Security is seen as the traditional firewall to innovation and often has a negative connotation. With shifting security left, it’s about helping build stuff that’s innovative and also secure.”
– Daniel Cuthbert, Global Head of Cyber Security Research at Banco Santander (Source)

As DevSecOps becomes more ingrained in development culture, developers won’t have a choice but to embrace security. The immutable nature of containers make them impractical to patch after deployment, and while operations should continue monitoring for vulnerabilities, the responsibility for fixing for these vulnerabilities will still fall to developers. The good news is that with containers, vulnerabilities can be fixed, built, tested, and re-deployed up to 50% faster than with traditional application development methods.

Moving Forward

Although DevOps and containerization are still fairly new concepts, they’ve already sparked a revolution in software development. As the tools and technologies continue to mature, we’ll start to see more companies using containers to build, change, test, and push new software faster.


The Impact Of Kubernetes On The CI/CD Ecosystem

Continuous integration (CI) and continuous delivery (CD) are two sides of the same coin we call DevOps. Continuous deployment is the last step of the CD phase where the application is deployed into production. Emerging in the early part of this decade, CI and CD are not new terms to anyone who’s been managing application delivery for a while. Tools like Jenkins have done much to define what a CI/CD pipeline should look like. The key word when talking about CI/CD is ‘automation’. By automating various steps in the development cycle, we start to see improvements in speed, precision, repeatability, and quality. This is done by build automation, test automation, and release automation. Much can be said about each of these phases, as they involve new approaches that are contrary to the old waterfall approach to software delivery. To make this automation possible, it takes multiple tools working together in a deeply integrated manner.

Container ShipSource: Wikimedia.org

How Kubernetes Affects CI/CD Pipelines

While CI/CD is not new, the advent of Docker containers has left no stone unturned in the world of software. More recently, the rise of Kubernetes within the container ecosystem has impacted the CI/CD process. DevOps requires a shift from the traditional waterfall model of development to a more modern and agile development methodology.

Rather than moving code between various VMs in different environments, the same code is now moved across containers, or container clusters as is the case with Kubernetes. Unlike static VMs that are suitable to a more monolithic style of application architecture, containers require a distributed microservices model. This brings new opportunities in terms of elasticity, high availability, and resource utilization. However, rather than relying on old approaches and tools to achieve these advantages, they call for change.  

Jenkins – Build & Test Automation

Mention continuous integration, and Jenkins is the first tool that comes to mind. In recent years, Jenkins has focused on going beyond CI and handling the end-to-end development pipeline including the CD phases. Kubernetes has been the solution to this effort. With Kubernetes’ mature handling of resources in production, it’s just the partner Jenkins needs to extend its reach beyond CI.

Running Jenkins on Kubernetes brings many benefits. To start, Jenkins can take advantage of the scalability and high availability of Kubernetes. With the numerous worker nodes in Jenkins, handling infrastructure to run Jenkins can become a nightmare. Kubernetes makes this easier with its automatic pod management features.

Further, Kubernetes enables zero-downtime updates with Jenkins. This is made possible by the rolling updates feature of Kubernetes where it gradually phases out pods with an older version of the application and replaces them with new ones. It does this by keeping a watch on the number of ‘maxUnavailable’ pods and ensuring they’re enough to run the application at all times during the update. In this way, Kubernetes brings the ability to do canary releases and blue-green deployments to Jenkins.

Apart from Jenkins, there are also many new CI tools that are built from a container-first standpoint. These include CircleCI, Travis, CodeFresh, Drone, and Wercker. Many of these tools provide a simpler user experience than Jenkins and are fully managed SaaS solutions. Almost all of them encourage a ‘pipeline’ model to deploying software, and in doing so bring greater control and flexibility in how you manage the CI process. They also feature integrations with all major cloud providers, making them a great alternative to the industry-leading Jenkins.  

Spinnaker – Multi-Cloud Deployment

While Jenkins is perfect for the build stages of the pipeline, perhaps an even more complex problem to solve is deployments, especially when it involves multiple cloud platforms and mature deployment practices. Kubernetes has a deployment API which has support for rollout, rollback, and other core deployment functionality. However, another open source tool, Spinnaker, create by Netflix, has been in the spotlight for its advanced deployment controls.

Spinnaker focuses on the last mile of the delivery pipeline – deployment in the cloud. It automates deployment processes and cloud resources and acts as a bridge between the source code on say, Github, and the deployment target like a cloud platform. The best part is that Spinnaker supports multiple cloud platforms, and enables a multi-cloud approach to infrastructure. This is one of the original promises of Kubernetes, and is being made accessible to all by Spinnaker.

Spinnaker uses pipelines to allow users to control a deployment. These pipelines are deeply customizable. It automatically replaces unhealthy VMs in a cloud platform so you can focus more on defining the required resources for your applications than on maintaining those resources.

Spinnaker isn’t a one-stop-solution. In fact, it leverages Jenkins behind the scenes to manage builds, and is built on top of the Kubernetes deployment API adding advanced functionality of its own at every stage. For example, while rollbacks are possible with the Kubernetes API, they’re much faster and easier to execute with Spinnaker. Considering its focus is deployment, it’s no surprise that Spinnaker has first class support for operations like canary releases and blue-green deployments.

While Jenkins excels at build automation and initiating automated tests, Spinnaker complements it well by enabling complex deployment strategies. Which tool you choose will depend on your circumstances. Teams that are deeply invested in Jenkins may find it easier to simply better manage Jenkins using Kubernetes. Teams that are looking for a better and easier way to handle deployments than Jenkins would want to give Spinnaker a spin. Either way, Kubernetes will play a role in ensuring that CI/CD pipelines function seamlessly.

A CI/CD pipeline management tool is essential as it acts as the control pane for your operations. However, it’s not the only tool you’ll use.

Helm – Package Management

Helm is a package manager for Kubernetes that makes it easy to install applications in Kubernetes. With automation being key to successful CI/CD pipelines, it’s essential to be able to quickly package, share and install application code and its dependencies. Helm has a collection of ‘charts’ with each chart being a package that you can install in Kubernetes. Helm places an agent called Tiller within the Kubernetes cluster which interacts with the Kubernetes API and handles installing and managing of packages.

The biggest advantage of Helm is that it brings predictability and repeatability to the CI/CD pipeline. It lets you define and add extensive configurations and metadata for every deployment. Further, it gives you complete control over rollbacks and brings deep visibility into every stage of a deployment.

Trends in CI/CD

Kubernetes is changing CI/CD for the better. By enabling and transforming tools like Jenkins, Spinnaker, and Helm, Kubernetes is ushering in a new way to deploy applications. While the ideas for doing canary releases and blue-green deployments have been around for over a decade, they’re made truly possible with the advances that Kubernetes brings. Here are some of the trends that are emerging because of the influence of Kubernetes.


All CI/CD tools today look at the software delivery cycle as a pipeline with linear steps from start to end. However, pipelines aren’t straightforward and allow for complex changes at every step. The biggest benefit is the ability to abstract the entire process and make it easier to manage. The pipeline model allows a view into how each component depends on others, and view every step in context of the other steps. Previously, each step was disconnected from the other and silos were the norm. Today, with CI/CD tools and Kubernetes, pipelines aren’t just on paper; rather, they are how software delivery happens in practice.

Configuration As Code

Infrastructure previously was controlled by its parent platform. VMware dictated how you interact with VMs and every change has to be made manually, separate from other changes. Today, with tools like Spinnaker and Helm, infrastructure is configured and managed via YAML files. This doesn’t just ease creation of resources but enables better troubleshooting.

Visibility & Control

Previously, version control was restricted to certain parts of the pipeline, but with Kubernetes, version control is built-in as every change is recorded and versioned and can be retrieved or rolled back to if needed.

Speaking about visibility, monitoring becomes more comprehensive with capable tools that are built to easily handle the scale and nuances of a Kubernetes-driven process. Tools like Prometheus and Heapster are great at delivering a stream of real-time metrics. Additionally, logging tools like LogDNA help to capture the minute details about every deployment. This includes logging exceptions, errors, states, and events.

Multi-Cloud Support

CI/CD tools today need to support multiple cloud platforms. Not that the same app would be run on multiple cloud platforms, but in a large organization different teams and different apps would use various platforms to meet specific needs. A modern CI/CD tool needs to cater to the needs of diverse teams and applications and this means supporting all major cloud platforms and private data centers as well.


Kubernetes has changed how software is built and shipped. What began with the cloud computing movement and CI tools like Jenkins about a decade ago is now coming of age with Kubernetes. What’s amazing is that these technologies are not being adopted by startups or fringe organizations, but rather by mainstream large enterprises who are looking for a way to modernize their application stack and infrastructure stack. They’re looking for solutions to real-world problems they face. If these CI/CD solutions tell us anything, it’s that Kubernetes is delivering where it really matters, and this is making CI/CD become a reality in many organizations. It’s about time, after all.

kubernetes, Technical

Top Kubernetes Metrics & Logs for End-to-End Monitoring

Kubernetes makes life as a DevOps professional easier by creating levels of abstractions like pods and services that are self sufficient. Now, though this means we no longer have to worry about infrastructure and dependencies, what doesn’t change is the fact that we still need to monitor our apps, the containers they’re running on, and the orchestrators themselves. What makes things more interesting, however, is that the more Kubernetes piles on levels of abstraction to “simplify” our lives, the more levels we have to see through to effectively monitor the stack.

Across the various levels you need to monitor resource sharing, communication, application deployment and management, and discovery. Pods are the smallest deployable units created by Kubernetes that run on nodes which are grouped into clusters. This means that when we say “monitoring” in Kubernetes, it could be at a number of levels — the containers themselves, the pods they’re running on, the services they make up, or the entire cluster. Let’s look at the key metrics and log data that we need to analyze to achieve end-to-end visibility in a Kubernetes stack.

Usage Metrics

Performance issues generally arise from CPU and memory usage and are likely the first resource metrics users would want to review. This brings us to cAdvisor, an open source tool that automatically discovers every container and collects CPU, memory, filesystem, and network usage statistics. Additionally, cAdvisor also provides the overall machine usage by analyzing the ‘root’ container on the machine. Sounds too good to be true, doesn’t it? Well it is, and the catch is that cAdvisor is limited in a sense that It only collects basic resource utilization and doesn’t offer any long term storage or analysis capabilities.

CPU, Memory and Disk I/O

Why is this important? With traditional monitoring, we’re all used to monitoring actual resource consumption at the node level. With Kubernetes, we’re looking for the sum of the resources consumed by all the containers across nodes and clusters (which keeps changing dynamically). Now, if this sum is less than your node’s capacity, your containers have all the resources they need, and there’s always room for Kubernetes to schedule another container if load increases. However, If it goes the other way around and you have too few nodes, your containers might not have enough resources to meet requests. This is why making sure that requests never exceed your collective node capacity is more important than monitoring simple CPU or memory usage.

With regards to disk usage and I/O, with Kubernetes we’re more interested in the percentage of disk in use as opposed to the size of our clusters, so graphs are wired to trigger alerts based on the percentage of disk size being used. I/O is also monitored in terms of Disk I/O per node, so you can easily tell if increased I/O activity is the cause for issues like latency spikes in particular locations.

Kube Metrics

There are a number of ways to collect metrics from Kubernetes, although Kubernetes doesn’t report metrics and instead relies on tools like Heapster instead of the cgroup file. This is why a lot of experts say that container metrics should usually be preferred to Kubernetes metrics. A good practice however, is to collect Kubernetes data along with Docker container resource metrics and correlate them with the health and performance of the apps they run. That being said, while Heapster focuses on forwarding metrics already generated by Kubernetes, kube-state-metrics is a simple service focused on generating completely new metrics from Kubernetes.

These metrics have really long names which are pretty self explanatory; kube_node_status_capacity_cpu_cores and kube_node_status_capacity_memory_bytes are the metrics used to access your node’s CPU and memory capacity respectively. Similarly, kube_node_status_allocatable_cpu_cores tracks CPU resources currently available and kube_node_status_allocatable_memory_bytes does the same for memory. Once you get the hang of how they’ve been named, it’s pretty easy to make out what the metric keeps track of.

Consuming Metrics

These metrics are designed to be consumed either by Prometheus or a compatible scraper, and you can also open /metrics in a browser to view them raw. Monitoring a Kubernetes cluster with Prometheus is becoming a very popular choice as both Kubernetes & Prometheus have similar origins and are instrumented with the same metrics in the first place. This means less time and effort lost in “translation” and more productivity. Additionally, Prometheus also keeps track of the number of replicas in each deployment, which is an important metric.

Pods typically sit behind services that are scaled by “replica sets” which create or destroy pods as needed and then disappear. ReplicaSets are further controlled by “declaring state” for a number of running ReplicaSets (done during deployment). This is another example of a feature built to improve performance that makes monitoring more difficult. Replica sets need to be monitored and kept track of just like everything else if you want to continue to make your applications perform better and faster.

Network Metrics


Now, like with everything else in Kubernetes, networking is about a lot more than network in, network out and network errors. Instead you have a boatload of metrics to look out for which include request rate, read IOPS, write IOPS, error rate, network traffic per second and network packets per second. This is because we have new issues to deal with as well, like load balancing and service discovery and where you used to have network in and network out, there are thousands of containers. These thousands of containers make up hundreds of microservices which are all communicating with each other, all the time.

A lot of organizations are turning to a virtual network to support their microservices as software-defined networking gives you the level of control you need in this situation. That’s why a lot of solutions like Calico, Weave, Istio and Linkerd are gaining popularity with their tools and offerings. SD-WAN especially is becoming a popular choice to deal with microservice architecture.

Kubernetes Logs

Everything a containerized application writes to stdout and stderr is handled and redirected somewhere by a container engine and, more importantly, is logged somewhere. The functionality of a container engine or runtime, however, is usually not enough for a complete logging solution because when a container crashes, for example, it takes everything with it, including the logs. Therefore, logs need a separate storage, independent of nodes, pods, or containers. To implement this cluster-level, logging is used, which provides a separate backend to store and analyze your logs. Kubernetes provides no native storage solution but you can integrate quite a few existing ones.

Kubectl Logs

Kubectl is the logging command to see logs from the Kubernetes CLI and can be used as follows:

$ kubectl logs

This is the most basic way to view logs on Kubernetes and there are a lot of operators to make your commands even more specific. For example, “$ kubectl logs pod1” will only return logs from pod1. “$ kubectl logs -f my-pod” streams your pod logs, and “kubectl logs job/hello” will give you the logs from the first container of a job named hello.

Logs for Troubleshooting


Logs are particularly useful for debugging problems and troubleshooting cluster activity. Some variations of kubectl logs for troubleshooting are:

  • kubectl logs –tail=20 pod1” which displays only the most recent 20 lines of output in pod1; or
  • kubectl logs –since=1h pod1” which will show you all logs from pod1 written in the last hour.

To get the most out of your log data, you can export your logs to a log analysis service like LogDNA and leverage its advanced logging features. LogDNA’s Live Streaming Tail makes troubleshooting with logs even easier since you can monitor for stack traces and exceptions in real time, in your browser. It also lets you combine data from multiple sources with all related events so you can do a thorough root cause analysis while looking for bugs.

Logging Levels and Verbosity

Additionally, there are different logging levels depending on how deep you want to go; if you don’t see anything useful in the logs and want to dig deeper, you can select a level of verbosity. To enable verbose logging on the Kubernetes component you are trying to debug, you just need to use –v or –vmodule, to at least level 4, though it goes up all the way to level 8. While level 3 gives you a reasonable amount of information with regards to recent changes made, level 4 is considered debug-level verbosity. Level 6 is used to display requested resources while level 7 displays HTTP request headers and 8 HTTP request contents. The level of verbosity you choose will depend on the task at hand, but it’s good to know that Kubernetes gives you deep visibility when you need it.

Kubernetes monitoring is changing and improving every day because at the end of the day, that’s the name of the new game. The reason monitoring is so much more “proactive” now is because everything rests on how well you understand the ins and outs of your containers. The better the understanding, the better the chances of improvement, the better the end user experience. So in conclusion, literally everything depends on how well you monitor your applications.

kubernetes, Technical

Kubernetes Logging 101

Containerization brings predictability and consistency across the development pipeline. A developer can package code in a container and ship the same container into production knowing it will work the same. However, for this consistent experience to happen, there are many cogs and levers working in the underlying layers of the container stack. Containers abstract away the complex internals of infrastructure and deliver a simple consistent user experience. The part of the Docker stack that’s especially important in this aspect is the orchestration layer.

What is Kubernetes?

An orchestration tool like Kubernetes takes care of the complexity of managing numerous containers by providing many smart defaults. It takes care of changes and configuration with groups of containers called pods, and groups of pods called clusters. In doing so, it lets you focus on what matters most to you – the code and data that’s housed in your Kubernetes cluster. Because of these advantages, Kubernetes has become the leading container orchestration tool today.

Source: Pexels.com

Kubernetes makes it easy to manage containers at scale, but it comes with a steep learning curve. This is the reason for the numerous startups offering managed Kubernetes services – Platform9, Kismatic, OpenShift, and CoreOS Tectonic to name a few. However, learning the ins-and-outs of Kubernetes is well worth the effort because of the power and control it gives you.

No matter which route you take to managing your Kubernetes cluster, one fundamental requirement to running a successful system is log analysis. Traditional app infrastructure required log data to troubleshoot performance issues, system failures, bugs, and attacks. With the emergence of modern infrastructure tools like Docker and Kubernetes, the importance of logs has only increased.

The Importance of Log Data in Kubernetes

Log data is essential to Kubernetes management. Kubernetes is a very dynamic platform with tons of changes happening all the time. As containers are started and stopped, and IP addresses and loads change, Kubernetes makes many minute changes to ensure services are available and performance is not impacted. But there is still the odd time when things break, or performance slows down. At those times, you need the detail that only log data can provide. Not both performance and security, you need log data to ensure proper compliance to laws like HIPAA and PCI DSS. Or, if there’s a data breach, you’ll want to go back in time to identify the origin of the attack and its progression across your system. For all these use cases, log data is indispensable.

There are many ways you can access and analyze Kubernetes log data ranging from simple to advanced. Let’s start with the simplest option and move up the chain.

Monitoring A Pod

Pod-level monitoring is the most rudimentary form of viewing Kubernetes logs. You use the kubectl commands to fetch log data for each pod individually. These logs are stored in the pod and when the pod dies, the logs die with them. They are useful when you’re just starting out, and have just a few pods. You can instantly check the health of pods without needing a robust logging setup for a big cluster.

Monitoring A Node

Logs collected for each node are stored in a JSON file. This file can get really large, and in this case, you can use the logrotate function to split the log data in multiple files once a day, or when the data reaches a particular size like 10MB. Node-level logs are more persistent than pod-level ones. Even if a pod is restarted, it’s previous logs are retained in a container. But if a pod is evicted from a node, its log data is deleted.

While pod-level and node-level logging are important concepts in Kubernetes, they aren’t meant to be real logging solutions. Rather, they act as a building block for the real solution – cluster-level logging.

Monitoring the Cluster

Kubernetes doesn’t provide a default logging mechanism for the entire cluster, but leaves this up to the user and third-party tools to figure out. One approach is to build on the node-level logging. This way, you can assign an agent to log every node and combine their output.

The default option is Stackdriver which uses a Fluentd agent and writes log output to a local file. However, you can also set it to send the same data to Google Cloud. From here you can use Google Cloud’s CLI to query the log data. This, however, is not the most powerful way to analyze your log data.

The ELK Stack

The most common way to implement cluster-level logging is to use a Fluentd agent to collect logs from the nodes, and pass them onto an external Elasticsearch cluster. The log data is stored and processed using Elasticsearch, and can be visualized using a tool like Kibana. The ELK stack (Elasticsearch, Logstash, Kibana) is the most popular open source logging solution today, and its components often form the base for many other modern logging solutions, including LogDNA (but that’s a topic for a whole other post). The ELK stack offers more powerful logging, and more extensibility than the Stackdriver / Google Cloud option.

One example of an organization that uses this setup for centralized logging for their Kubernetes cluster is Samsung. They use the Fluentd / ELK stack combination, but add Kafka for an added step of buffering and monitoring. Samsung has even open sourced this configuration of tools and called K2 Charts.

Sidecar Containers

You can stream logs of different formats together, but this would be harder to analyze, considering the scale of Kubernetes and how complicated Kubernetes log collection can get. Instead, the preferred way is to attach a sidecar container for each type of log data. A sidecar container is dedicated to collecting logs, and is very lightweight. Every sidecar container contains a Fluentd agent for collecting and transporting logs to a destination.

Archived Log Storage

Storing logs is critical, especially for security. For example, you may find out about a breach in your system that started two years ago, and want to trace its development. In this case, you need archived log data to go back to that point in time, and see the origin of the breach, and to what extent it has impacted your system.

Kubernetes offers basic local storage for logs, but this is not what you’d want to use for a production cluster. You can either use block storage like AWS S3 or Azure Blog, or you can ask your log analysis vendor to give you extended storage on their platform. For archived data, it’s best to leverage cloud storage than on premise servers as they’re more cost efficient and can be easily accessed when needed.

Dedicated Log Analysis Platforms

The ELK stack is a common way to access and manage Kubernetes logs, but it can be quite complex with the number of tools to setup and manage. Ideally, you want your logging tool to get out of the way and let you focus on your log data and your Kubernetes cluster. In this case, it pays to go with a dedicated log management and analysis platform like LogDNA, which comes with advanced cloud logging features, and is fully managed so you don’t have to worry about availability and scaling your log infrastructure.

You can start collecting Kubernetes logs in LogDNA using just 2 simple kubectl commands:

kubectl create secret generic logdna-agent-key –from-literal=logdna-agent-key=YOUR-INGESTION-KEY-HERE

kubectl create -f https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml

Deeply customized for Kubernetes, LogDNA automatically recognizes all metadata for your Kubernetes cluster including pods, nodes, containers, and namespaces. It lets you analyze your Kubernetes cluster in real-time, and provides powerful natural language search, filters, parsing, shortcuts, and alerts.

LogDNA even mines your data using machine learning algorithms and attempts to predict issues even before they happen. This is the holy grail of log analysis, and it wasn’t possible previously. Thanks to advances in machine learning and the cloud enabling computing at this scale, it’s now a reality.

To summarize, Kubernetes is the leading container orchestration platform available today. Yet, running a production cluster of Kubernetes takes a lot of familiarity with the system and robust tooling. When it comes to log analysis, Kubernetes offers basic log collection for pods, nodes, and clusters, but for a production cluster you want unified logging at the cluster level. The ELK stack comes closest to what a logging solution for Kubernetes should look like. However, it’s a pain to maintain and runs into issues once you hit the limits of your underlying infrastructure.

For unified log analysis for Kubernetes, you need a dedicated log analysis platform like LogDNA. It comes with advanced features like powerful search, filtering, and machine learning to help you get the most out of your log data. Being a fully managed service, you can focus on your Kubernetes cluster and leave the drudge of maintaining log infrastructure to LogDNA. As you run a production Kubernetes cluster, you need a powerful log analysis tool like LogDNA to truly enjoy the promise of Kubernetes – running containers at massive scale.

Learn more about LogDNA for Kubernetes here.

kubernetes, Technical

Logging In The Era Of Containers

Log analysis will always be fundamental to any monitoring strategy. It is the closest you can get to the source of what’s happening with your app or infrastructure. As application development has undergone drastic change over the past few years with the rise of DevOps, containerization, and microservices architecture, logs have not become less important, rather they are now at the forefront of running stable, highly available, and performant applications. However, the practice of logging has changed. From being simple it is now complex, from just a few hundred lines of logs we now often see millions of lines of logs, from all being in one place we are now dealing with distributed log data. Yet as logging has become more challenging, a new breed of tools have arrived on the scene to manage and make sense of all this logging activity. In this post, we’ll look at the sea change that logging has undergone, and how innovative solutions have sprung up to address these challenges.

Complexity of the stack

Traditional client-server applications were simple to build, understand, and manage. The frontend was required to run on a few browsers or operating systems. The backend consisted of a single consolidated database, or at most a couple of databases on a single server. When something goes wrong you can jump into your system logs at /VAR/LOG and easily identify the source of the failure and how to fix it.

With today’s cloud-native apps, the application stack has become tremendously complex. Your apps need to run on numerous combinations of mobile devices, browsers, operating systems, edge devices, and enterprise platforms. Cloud computing has made it possible to deliver apps consistently across the world using the internet, but it comes with its own challenges of management. VMs (virtual machines) brought more flexibility and cost efficiencies over hardware servers, but organizations soon outgrew them and needed a faster way to deliver apps. Enter Docker.

Containers bring consistency to the development pipeline by breaking down complex tasks and code into small manageable chunks. This fragmentation lets organizations ship software faster, but it requires you to manage a completely new set of components. Container registries, the container runtime, an orchestration tool or CaaS service – all make a container stack more complex than VMs.

Volume of data has spiked

Each component generates its own set of logs. Monolithic apps are decomposed into microservices with each service being powered by numerous containers. These containers have short life spans of a few hours compared to VMs which typically run for months or even years. Every request is routed across the service mesh and touches many services before being returned as a response to the end user. As a result, the total volume of logs has multiplied. Correlating the logs in one part of the system with those of another part is difficult, and insights are hard won. Having more log data is an opportunity for better monitoring, but only if you’re able to glean insights out of the data efficiently.

Many logging mechanisms

Each layer has its own logging mechanism. For example, Docker has drivers for many log aggregators. Kubernetes, on the other hand, doesn’t support drivers. Instead it uses a Fluentd agent running on every node. More on Fluentd later in this post. Kubernetes doesn’t have native log storage, so you need to configure log storage externally. If you use a CaaS platform like ECS, they would have their own set of log data. ECS has it’s own logs collector. With log collection so fragmented, it can be dizzying to jump from one tool to another to make sense of errors when troubleshooting. Containers require you to unify logging from all the various components for the logs to be useful.

The rise of open source tools

As log data has become more complex the solutions for logging have matured as well. Today, there are many open source tools available. The most popular open source logging tool is the ELK stack. It’s actually a collection of three different open source tools – Elasticsearch, Logstash, and Kibana. Elasticsearch is a distributed full-text search database, Logstash is a log aggregation tool, and Kibana is a visualization tool for time-series data. It’s easy to get started with the ELK stack when you’re dipping your toes into container logging, and it packs a lot of powerful features like high availability, near-real-time analysis, and great visualizations. However, once your logs reach the limits of your physical nodes that power the ELK stack, it becomes challenging to maintain operations smoothly. Performance lags and resource consumption become an issue. Despite this, the ELK stack has sparked many other container logging solutions like LogDNA. These solutions have found innovative ways to deal with the problems that weigh down the ELK stack.

Fluentd is another tool commonly used along with the ELK stack. It is a log collection tool that manages the flow of log data from the source app to any log analysis platform. Its strength is that it has a wide range of plugins and can integrate with a wide variety of sources. However, in a Kubernetes setup, to send logs to Elasticsearch, Fluentd places an agent in every node, and so becomes a drain on system resources.

Machine learning is the future

While open source tools have led the way in making logging solutions available, they require a lot of maintenance overhead when monitoring real-world applications. Considering the complexity of the stack, volume of data, and various logging mechanisms, what’s needed is a modern log analysis platform that can intelligently analyze log data and derive insights. Analyzing log data by manual methods is a thing of the past. Instead machine learning is opening up possibilities to let algorithms do the heavy lifting of crunching log data and extracting meaningful outcomes. Because algorithms can spot minute anomalies that would be invisible to humans, they can identify threats much before a human would, and in doing so can help prevent outages even before they happen. LogDNA is one of the pioneers in this attempt to use machine learning to analyze log data.

In conclusion, it is an exciting time to build and use log analysis solutions. The challenge is great, and the options are plenty. As you choose a logging solution for your organization, remember the differences between legacy applications and modern cloud-native ones, and choose a tool that supports the latter most comprehensively. And as you think about the future of log management, remember that the key words are ‘machine learning’.