
Logging with Kubernetes should not be this painful

Before reading this article, we recommend having a basic working knowledge of Kubernetes. Check out our previous article, Kubernetes in a Nutshell, for a brief introduction.

Logging Kubernetes with the Elasticsearch stack is free, right?

If the goal of using Kubernetes is to automate management of your production infrastructure, then centralized logging is almost certainly a part of that goal. Because containers managed by Kubernetes are constantly created and destroyed, and each container is an isolated environment in itself, setting up centralized logging on your own can be a challenge. Fortunately, Kubernetes offers an integration script for the free, open-source standard for modern centralized logging: the Elasticsearch stack (ELK). But beware: Elasticsearch and Kibana may be free, but running ELK on Kubernetes is far from cheap.

Easy installation

As outlined in this Kubernetes docs article on Elasticsearch logging, the initial setup required for an Elasticsearch Kubernetes integration is actually fairly trivial. You set an environment variable, run a script, and boom, you're up and running.
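For reference, on a cluster brought up with kube-up.sh the flow looks roughly like this (the exact variable names depend on your Kubernetes version and provider, so treat it as a sketch rather than a recipe):

# enable per-node log collection and point it at Elasticsearch before bringing the cluster up
export KUBE_ENABLE_NODE_LOGGING=true
export KUBE_LOGGING_DESTINATION=elasticsearch
cluster/kube-up.sh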

However, this is where the fun stops.

The Elasticsearch Kubernetes integration works by running per-node Fluentd collection pods that ship log data to Elasticsearch pods; the collected logs can then be viewed through the Kibana pods. In theory, this works just fine; in practice, however, Elasticsearch requires significant effort to scale and maintain.
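If you want to see those pieces for yourself, the stock addon manifests place everything in the kube-system namespace. The label selectors below are the ones those manifests typically use; yours may differ:

# one Fluentd collector pod per node
kubectl get pods --namespace=kube-system -l k8s-app=fluentd-es
# the Elasticsearch and Kibana pods backing the logging pipeline
kubectl get pods --namespace=kube-system -l k8s-app=elasticsearch-logging
kubectl get pods --namespace=kube-system -l k8s-app=kibana-logging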

JVM woes

Since Elasticsearch is written in Java, it runs inside a Java Virtual Machine (JVM), which carries notoriously high resource overhead even with properly configured garbage collection (GC), and not everyone is a JVM tuning expert. Rather than several small services distributed across multiple pods, each Elasticsearch node is one giant service inside a single pod. Scaling individual containers with such large resource requirements defeats much of the purpose of using Kubernetes, since an Elasticsearch pod can easily eat up all of a given node's resources.
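If you do run Elasticsearch on Kubernetes, you will at minimum want to pin down both the JVM heap and the pod's resource limits so a hungry Elasticsearch pod cannot starve its node. A minimal sketch, assuming an elasticsearch-logging Deployment (the workload name and kind vary by manifest, and older Elasticsearch versions use ES_HEAP_SIZE instead of ES_JAVA_OPTS); a common rule of thumb is to keep the heap at roughly half the container's memory limit:

# cap the JVM heap explicitly rather than letting it size itself
kubectl set env deployment/elasticsearch-logging ES_JAVA_OPTS="-Xms2g -Xmx2g" --namespace=kube-system
# give Kubernetes matching requests and limits so the scheduler can plan around it
kubectl set resources deployment/elasticsearch-logging --requests=cpu=500m,memory=4Gi --limits=cpu=1,memory=4Gi --namespace=kube-system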

Elasticsearch cluster overhead

Elasticsearch’s architecture requires multiple master nodes and multiple data nodes just to start accepting logs at any scale beyond deployment testing. Each of these masters and data nodes runs inside its own JVM, and together they consume significant resources. If you are logging at a reasonably high volume, this overhead is inefficient inside a containerized environment, and high-volume logging in general introduces a whole other set of issues. Ask any of our customers who’ve switched to us from running their own ELK.
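To give a concrete sense of the baseline: a minimal production-ish topology is usually three master-eligible nodes (so a quorum survives losing one) plus at least two data nodes, which means five JVMs before a single log line is indexed. You can see what you are actually running via the standard cat APIs; the service name and port below are illustrative:

# list every Elasticsearch node and its role (the node.role column shows master-eligible vs. data)
curl -s 'http://elasticsearch-logging:9200/_cat/nodes?v'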


Free ain’t cheap

While we ourselves are often tantalized by the possibility of using a fire-and-forget open-source solution to solve a specific problem, properly maintaining an Elasticsearch cluster is no easy feat. Even so, we encourage you to learn more about Elasticsearch and what it has to offer, since it is, without a doubt, a very useful piece of software. However, like all software it has its nuances and pitfalls, and it is therefore important to understand how they may affect your use case.

Depending on your logging volume, you may want to configure your Elasticsearch cluster differently to optimize for a particular use case. Too many indices, shards, or documents can each result in crippling and costly performance issues. On top of this, you’ll need to constantly monitor Elasticsearch resource usage within your Kubernetes cluster so your other production pods don’t die because Elasticsearch decides to hog all available memory and disk resources.
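In practice that monitoring boils down to a handful of checks you end up running constantly. The endpoints below are standard Elasticsearch cat and cluster APIs; the service name and port are illustrative, and kubectl top assumes a metrics add-on (Heapster or metrics-server) is installed:

# how many indices and documents you are carrying, and how big they are
curl -s 'http://elasticsearch-logging:9200/_cat/indices?v'
# how shards are spread across your data nodes
curl -s 'http://elasticsearch-logging:9200/_cat/shards?v'
# overall cluster health (green/yellow/red, unassigned shards)
curl -s 'http://elasticsearch-logging:9200/_cluster/health?pretty'
# what the logging pods are costing you on the Kubernetes side
kubectl top pods --namespace=kube-system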

At some point, you have to ask yourself, is all this effort worthwhile?

LogDNA cloud logging for Kubernetes


As big believers in Kubernetes, we spent a good amount of time researching and optimizing our integration. In the end, we were able to get it down to a copy-pastable set of two kubectl commands:

kubectl create secret generic logdna-agent-key --from-literal=logdna-agent-key=<YOUR-API-KEY-HERE>
kubectl create -f https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml

This is all it takes to send your Kubernetes logs to LogDNA: no manual setting of environment variables, no editing of configuration files, no maintaining servers or fiddling with Elasticsearch knobs, just copy and paste. Once executed, you will be able to view your logs inside the LogDNA web app. We also extract all pertinent Kubernetes metadata, such as pod name, container name, namespace, and container ID. No Fluentd parsing pods required (or any other dependencies, for that matter).
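If you want to double-check the agent after running those two commands, it is deployed as a DaemonSet, so you should see one agent pod per node. The names below come from the manifest linked above; verify them against the YAML if yours differ:

# confirm the DaemonSet exists and has one pod scheduled on each node
kubectl get daemonset logdna-agent
kubectl get pods -l app=logdna-agent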

Easy installation is not unique to our Kubernetes integration. In fact, we strive to provide concise and convenient instructions for all of our integrations. But don’t just take our word for it: you can check out all of our integrations. We also support a multitude of useful features, including alerts, JSON field search, archiving, and line-by-line contextual information.

All for $1.25/GB per month. We’re huge believers in paying only for what you use. In many cases, we’re cheaper than running your own Elasticsearch cluster on Kubernetes.

For those of you not yet using LogDNA, we hope our value proposition is convincing enough to at least give us a try.

If you don’t have a LogDNA account, you can create one at https://logdna.com, or, if you’re on macOS with Homebrew installed, run:

brew cask install logdna-cli
logdna register 
# now paste the api key into the kubectl commands above

Thanks for reading and we look forward to hearing your feedback.


Kubernetes in a Nutshell


In addition to being the Greek word for helmsman, Kubernetes is a container orchestration tool that enables container management at scale. Before we explain more about Kubernetes, it is important to understand the larger context of containers.

Broadly, containers isolate individual apps within a consistent environment and can be run on a wide variety of hosting providers and platforms. This enables developers to test their app in a development environment identical to their production environment, without worrying about dependency conflicts or hosting provider idiosyncrasies. When new code is deployed, the currently running containers are systematically destroyed and then replaced by new containers running the new code, effectively providing quick, stateless provisioning. This is especially appealing to those using a microservices architecture, with many moving parts and dependencies.

While containers in themselves ensure a consistent, isolated environment, managing a multitude of individual containers is cumbersome without a purpose-built management tool like Kubernetes. Kubernetes not only lets you deploy and manage thousands of containers, but also automates nearly all aspects of your infrastructure, starting with networking and DNS, which you no longer have to configure manually. Kubernetes also optimizes how machine resources are shared across containers, so that any given machine, or node, is properly utilized, reducing costly inefficiencies. This is particularly powerful at scale, since it effectively eliminates a significant chunk of DevOps overhead.
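As a concrete taste of what that looks like in practice, here is the basic flavor of deploying, scaling, and exposing an app with nothing but kubectl (the image and names are placeholders, and the commands assume a reasonably recent kubectl):

# create a deployment, scale it out, and put a load balancer in front of it
kubectl create deployment my-app --image=example/my-app:1.0
kubectl scale deployment my-app --replicas=3
kubectl expose deployment my-app --type=LoadBalancer --port=80 --target-port=8080
# Kubernetes schedules the pods, wires up networking and DNS, and replaces them if they die
kubectl get pods -l app=my-app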

To learn more about Kubernetes, check out this handy getting started guide. If you’re feeling bullish on Kubernetes and want to learn more about your logging options, read Logging with Kubernetes should not be this painful.