Technical

Scaling Elasticsearch – The Good, The Bad and The Ugly

ElasticSearch bursted on the database scene in 2010 and has now become a staple of many IT teams’ workflow. It has especially revolutionized data intensive tasks like ecommerce product search, and real-time log analysis. ElasticSearch is a distributed, full-text search engine, or database. Unlike traditional relational databases that store data in tables, ElasticSearch uses a key value store for objects, and is a lot more versatile. It can run queries way more complex than traditional databases, and do this at petabyte scale. However, though ElasticSearch is a very capable platform, it requires a fair bit of administration to get it to perform at its best. In this post, we look at the pros and cons of running an ElasticSearch cluster at scale.

The Pros

1. Architected for scale

ElasticSearch has a more nuanced, and robust architecture than relational databases. Here are some of the key parts of ElasticSearch, and what they mean:

  • Cluster: A collection of nodes and is sometimes used to refer to the ElasticSearch instance itself
  • Index: A logical namespace that’s used to organize and identify the data in ElasticSearch
  • Type: A category used to organize data within an index
  • Document: A collection of data objects within an index
  • Shard: A partition of data that is part of an index, and runs on a node
  • Node: The underlying physical server that hosts shards with all their data
  • Replica shard: An exact replica of a primary shard that’s typically placed on a node separate from the primary shard

elas_0205

The levels of abstraction in the architecture makes it easy to manage the system. Whether it’s the physical nodes, the data objects, or shards – they can all be easily controlled at an individual level, or at an aggregate level.

2. Distributed storage and processing

Though ElasticSearch can process data on a single node, it’s built to function across numerous nodes. It allocates primary and replica shards equally across all available nodes, and generates high throughput using parallel processing. This means when it receives a query, it knows which shards have the data required for process the query, and it retrieves data from all those shards simultaneously. This way it can leverage the memory and processing power of many nodes at the same time.

The best part is that this parallelization is all built-in. The user doesn’t have to lift a finger to configure how requests are routed among shards. The strong defaults of the system make it easy to get started with ElasticSearch. ElasticSearch abstracts away the low lying processes, and delivers a great user experience.

3. Automated failover handling

When working at scale, the most critical factor is to ensure high availability. ElasticSearch achieves this in the way it manages its nodes and shards. All nodes are managed by a master node. The master records changes with nodes such as adding and removing of nodes. Every time a node is added or removed the master re-shards the cluster and re-balances how shards are organized on nodes.

The master node doesn’t control data processing, and in this way doesn’t become a single point of failure. In fact, no single node can bring the system down, not even the master node. If the master node fails, the other nodes auto-elect one of the nodes to replace it. This self-organizing approach to infrastructure is what ElasticSearch excels at, and this is why it works great at scale.

4. Powerful ad hoc querying

ElasticSearch breaks the limits of relational databases with its full-text search capabilities. Relational databases are stored in rows and columns and are rigid in how they store and process data. ElasticSearch on the other hand stores data in the form of objects. These objects can be connected to each other in complex structures that can’t be attained with rows and columns.

For example, ElasticSearch can sort its text-based search results based on relevance to the query. ElasticSearch can execute this complex processing at large scale and return results just as quickly. In fact, the results can be returned in near-real time making ElasticSearch a great option for troubleshooting incidents using logs, or powering search suggestions. This speed is what separates ElasticSearch from traditional options.

 

The Cons

For all its strengths, ElasticSearch has a few weaknesses that show up when you start to scale it to hundreds of nodes and petabytes of data. Let’s discuss them.

1. Can be finicky with the underlying infrastructure

ElasticSearch is great for parallel processing, but once you scale up, capacity planning is essential to get it to work at the same speed. ElasticSearch can handle a lot of nodes, however, it requires the right kind of hardware to perform at peak capacity.

If you have too many small servers it could result in too much overhead to manage the system. Similarly, if you have just a few powerful servers, failover could be an issue. ElasticSearch works best on a group of servers with 64GB of RAM each. Less than that and it may run into memory issues. Similarly, queries are much faster on data stored in SSDs than rotating disks. However, SSDs are expensive, and when storing terabytes or petabytes of data, you need to have a mix of both SSD and rotating disks.

These considerations require planning and fine-tuning as the system scales. Much of the effort is in maintaining the infrastructure that powers ElasticSearch than managing the data inside it.

2. Needs to be fine-tuned to get the most out of it

Apart from the infrastructure, you also need to set the right number of replica shards to ensure all nodes are healthy with ‘green’ status, and not ‘yellow’. For querying to run smoothly, you need to have a well-organized hierarchy of Indexes, Types, and IDs – though this is not as difficult as with relational databases.

You can fine-tune the cluster manually when there’s less data, but today’s apps change so fast, that you’ll end up with a lot of management overhead. What you need is a way to organize data and the underlying infrastructure that will work at scale.

3. No geographic distribution

Though it can work in this way, ElasticSearch doesn’t recommend distributing data across multiple locations globally. The reason is that it treats all nodes as if they were in the same data center, and doesn’t take into account network latency. This results in slower processing of queries if the nodes aren’t colocated. However, today’s apps are geographically distributed. With microservices architecture, services can be hosted on servers globally and still need the same level of access to the data stored in ElasticSearch.

 

ElasticSearch is a powerful database for doing full-text analysis, and it is essential to DevOps teams today. However, it takes expertise to successfully scale an ElasticSearch cluster and ensure it functions seamlessly.

Technical

Logging Best Practices – New Elevated Importance in the Dev’s Toolkit

A constantly evolving development environment has completely changed the way we approach the app process. Our command-line forebears would tuck tail and run if they saw the ever-increasing complexities modern developers contend with every day. Scores of data streaming in daily, new frameworks, and technical stacks that are starting to make the biological cell look simple.

But for all of this increased complexity – an often overlooked powerful source of making sense of it all has been neglected; we’re now finally elevating this source to its proper place. We’re talking about the limitless potential that logging can provide app developers. By practicing intelligent server logging from the start, we can utilize server logs for enhanced security, data optimization and overall increased performance.

For a moment think about how important logged information is in a few select non-technical fields. Banks must have records of money transfers, airplanes have to track their flights throughout the process, and so on. If an issue were to occur right now or in the future, this data would be there to overview and help us come to a quick solution.

The best logging practices should no longer be an afterthought, but part of the development process. This line of thinking needs to be at the forefront of any future cloud-based logging strategy.

 

Develop Intelligent Logging Practices From the Start

There is a crucial importance to how your own log messages are constructed in the first place. Proper log construction is integral to making sense of your own data and allowing third party applications – like LogDNA – the ability to parse logs efficiently and help you gain increased insights on your data.

JSON logging is one of the most efficient ways to format your logs into a database that can be easily searched through and analyzed. This gives you power on your own and helps your other tools get the job done even faster. JSON’s logging format allows for a simple standard for reading and coding, without sacrificing the ability to comfortably store swathes of data your app may be producing.

It’s best to begin JSON logging from the get-go when developing a new application. But with enough elbow-grease you can reasonably go back and change an app for future JSON support.

JSON logging standards should be widespread and mandatory for the rest of your project team. This way everyone is able to comprehend the same data and avoid any communication mishaps. A majority of libraries will assist in creating metadata tags for your logs and these can then be simply configured into a JSON log. Here is a brief pseudo example of a default logging level output and it’s JSON input.

Logger Output

logger.info (“User clicked buy”, {“userId”: “xyz”, “transactionId”: “buy”, “pageId”: “check-out” }};

JSON Input

{
“alert”: “User clicked buy”,
“userId”: “xyz”,
“transactionId”: “buy”,
“pageId”: “check-out”
}

A standard like this is important namely for two major reasons. It facilitates a shared understanding that can be read between different departments, including: devops, project-leads and business oriented team members, which creates a central interchangeable environment to utilize shared data from one another. This might include varied business functions across a company, from marketing initiatives on the business front to a developer streamlining a new (UI) on the checkout cart function.

Secondly, when the log output is in this format it allows for machines to read the data thoroughly and with greater ease. What could take hours or even days while manually searching has been reduced to a few seconds with the machine’s all-seeing eye.  

 

Making Sense of Levels

The identity of your data is the next step to take once you’ve constructed it in an adequate manner. It helps you monitor how users are experiencing your interface; it can look out for debilitating performance issues, potential security problems, as well as give you the tools to use user trends to create a better experience.

Some log levels will not be on a production-level app; an example of this would be debug. Others will be streaming in constantly. Our previous example showed the information level, typically used for simple inputs of information from the user. There is a breadth of great information here – you can count on that.

Furthermore, levels such as warning, error, and worst of all fatal or critical, are logs that need immediate attention before besmirching your good name. This leads us to another important practice.  

 

Structuring Data

These levels directly translate into a few important JSON fields. The fields of either json.level and json.severity help determine where warnings and errors are coming from. You can catch some of these earlier warnings before they snowball into a major fatal problem. Valuable fields of data like this help catch major events in the process. Some other important fields to look out for include json.appId, json.platform and json.microservices. If you’re running your logs through Kubernetes then you know you’ll be running on a wide variety of different platforms. The json.platform comes in handy for this.

In summary of these few listed fields: you can filter logs pertaining to only events that cause errors or warnings, isolate messages from designated add-on apps and microservices, or self select parsed logs from multiple staging platforms.

There are several data types that JSON supports and these include numbers, strings and nested objects. The trick is in knowing how to not get these mixed up. Quotes around a number can be misread as a string. It’s important to properly label events and fields so that these mix ups don’t happen.  

Many developers are apprehensive about saving and storing all of this data, feeling as if they’ll never be able to look through any of it anyhow. There’s a part of that which is true. But the best thing about proper logging practices is that they don’t have to anyways. That’s the machine’s job. More importantly, it’s the machine-based solution that LogDNA has got down to a science. You never know which part of the data you’ll need down the line.   

 

Always Collect Data for Further Use

Logs are the nutrient rich pieces of data that can be stored and collected for future use. They don’t take up much space and can be looked over instantly. Log monitoring is the best way to get a clear picture of your overall system. Logging is a fundamental way to understand your environment and understand the myriad of events going on. Many future trends can be predicted from past data as well.

Proper logging monitoring helps detect any potential problems and you can rid them from your app before they become a major issue.

Collect as much data as you can.

You may think that not all data is created equally and that is sometimes true. But you’d be surprised what certain innocuous metrics could point towards in the future. With that being said, you should know your system better than anyone else. This extends to what type of data you think is most crucial in logging. As a general rule of thumb the following are two important pieces of information that should always be logged.

Log Poor Performance

If a certain part of your application should be performing at a general high speed, i.e. a database ping – then you should log this event if it is not performing adequately. If an event went above a certain ms, then you’d be rightfully worried about the overall performance. This leads you to look into the problem at hand.

Log Errors

You’re bound to come across a few errors after adding any type of new features to a system. Logging these errors can be a useful piece of information for debugging and future analytics. If this error somehow gets into production, you now have the tools to rid your app of the problem and ensure a happy user experience.

 

Watch for Performance Deviations & Look at Larger Trends

The macro picture of your logging data will be the deciding factor in telling you how your system is functioning and performing on an overall basis. This is more important than looking at a single data point entry. Graphing this type of data in a logging management system will give a visual representation to a process that used to be nearly inaccessible.

 

Keep the Bottom-Line in Mind

At the end of the day what matters most is that your development processes are always evolving, growing and keeping the end-user in mind. Helping your internal teams take advantage of their own data is liberating.

We help find the interwoven connections between data and apply new and innovative features to contend with a rapidly evolving logging sphere. Follow these parameters for the best logging practices through proper log message construction, collaborative standards, structuring and storing data and you will succeed.

Log management is a real time understanding of your metrics, events and more that can mean a lot of different things to different parts of your team. What was once forgotten data sitting in an old dusty cavern of your server is now a leading development tool.

Product Updates

April Product Update

Get ready for a LogDNA product update – power user edition! This month we’ve added a number of powerful features to fulfill many of the awesome use cases we’ve heard from the LogDNA community. But first, we have an announcement!

Microsoft’s Founders @ Build event in San Francisco

LogDNA is speaking at the Founders @ Build event hosted by Microsoft. The purpose of the event is to bring cool companies together like Joy, Life360, GlamSt, GitLab, and LogDNA to share their experiences and perspectives. They’ve even invited Jason Calacanis (Launch) and Sam Altman (Y Combinator President) to weigh in. We’re excited to be part of these conversations informing the next generation of startups, and want to invite you all as well. You can check out the full agenda and register here.

And now for our regularly broadcasted product update.

Export Lines

By popular request, you can now export lines from a search query! This includes not only the lines that you see in our log viewer, but thousands of lines beyond that as well. This is particularly useful for those of you who want to postprocess your data for other purposes, such as aggregation. You can access this feature from within the View menu to the left of the All Sources filter.

We also have an export lines API, just make sure you generate a service key to read the log data. More information about the API is available on our docs page.

Web Client Logging

In addition to our usual server-side logging, we now offer client-side logging! This enables some new use cases, like tracking client-side crashes and other strange behavior. Be sure to whitelist your domains on the Account Profile page to prevent CORS issues.

Line Customization

When standard log lines just don’t cut it, this feature allows you to customize the exact format of how your log lines appear in the LogDNA web app. This includes standard log line information, such as timestamp and level, as well as parsed indexed fields. You can check out this feature by heading over to the User Preferences Settings pane.

Heroku Trials

We’ve officially added a trial period for our Heroku add-on users. For new accounts that sign up to our Quaco plan, we now offer a 14-day trial, so Heroku users can get the full LogDNA experience before they sign up for a paid plan.

Other Improvements

  • logfmt parsing – Fields in logfmt lines are now officially parsed.
  • Exists operator – Search for lines with the existence of a parsed field with fieldname:*
  • Disable line wrap – Under the tools menu in the bottom right.

And that’s a wrap, folks! If you have any questions or feedback, let us know. May the logs be with you!

Product Updates

March Product Update

It’s time for another product update! We’ve been working furiously for the past month to crank out new features.

Terminology change

First and foremost, we’ve made a major terminology change. The All Hosts filter has been renamed to All Sources. This allows us to account for a wide variety log sources, and as we transition, other references to ‘host’ will be changed to ‘source’. But don’t worry, the Hosts section of the All Sources filter will remain intact.

Filter Menu Overhaul

Internally dubbed “Mega Menu”, this is a feature we’re very excited to have released. The All Sources and All Apps menus now feature a unified search bar that will display results categorized by type. No more hunting for the correct search bar category within each filter menu.

Dashboard

By popular request, we’ve added a dashboard that shows your daily log data usage in a pretty graph as well as a breakdown of your top log sources and apps. You can access the dashboard at the top of the Views section.

Ingestion controls

If you find a source that is sending way too many logs, you can nip it in the bud by using the new Manage Ingestion page. Create exclusion rules to prevent errant logs from being stored. We also provide the option to preserve lines for live tail and alerting even though those lines are not stored.

Comparison operators

For you power users out there, we’ve added field search comparison operators for indexed numerical values. This means you search for a range of values for your indexed fields. For example:

response:>=400

This will return all log lines with the indexed field ‘response’ that have values greater than or equal to 400. More information on comparison operators is available in our search guide.

Integrations

We’ve added PagerDuty and OpsGenie integrations for alert channels, so you can now receive LogDNA alert notifications on these platforms.

On the ingestion side, we’ve added an integration for Flynn. You can send us your logs from your Flynn applications by following these instructions.

Archiving

To open up archiving to more of our users, we’ve added Azure Blob storage and OpenStack Swift archiving options. You can access your archiving settings here.

Other improvements

  • Share this line – Use the context menu to share a private link to the set of lines you’re looking at or share a single line via a secret gist.
  • Search shortcut – Access the search box faster by hitting the ‘/’ key.
  • Switch Account – Open multiple tabs logged into different organizations using the Open in New Viewer option under Switch accounts.

That sums up our product update. If you have any questions or feedback, let us know. Keep calm and log on!

Technical

Logging with Kubernetes should not be this painful

Before reading this article, we recommend having a basic working knowledge of Kubernetes. Check out our previous article, Kubernetes in a nutshell, for a brief introduction.

Logging Kubernetes with Elasticsearch stack is free, right?

If the goal of using Kubernetes is to automate management of your production infrastructure, then centralized logging is almost certainly a part of that goal. Because containers managed by Kubernetes are constantly created and destroyed, and each container is an isolated environment in itself, setting up centralized logging on your own can be a challenge. Fortunately, Kubernetes offers an integration script for the free, open-source standard for modern centralized logging: the Elasticsearch stack (ELK). But beware, Elasticsearch and Kibana may be free, but running ELK on Kubernetes is far from cheap.

Easy installation

As outlined by this Kubernetes docs article on Elasticsearch logging, the initial setup required for an Elasticsearch Kubernetes integration is actually fairly trivial. You set an environment variable, run a script, and boom you’re up and running.

However, this is where the fun stops.

The way the Elasticsearch Kubernetes integration works is by running per-node Fluentd collection pods that send log data to Elasticsearch pods, which can then be viewed by accessing the Kibana pods. In theory, this works just fine, however, in practice, Elasticsearch requires significant effort to scale and maintain.

JVM woes

Since Elasticsearch is written in Java, it runs inside of a Java Virtual Machine (JVM), which has notoriously high resource overhead, even with properly configured garbage collection (GC). Not everyone is a JVM tuning expert. Instead of being several services distributed across multiple pods, Elasticsearch is one giant service inside of a single pod. Scaling individual containers with large resource requirements seems to defeat much of the purpose of using Kubernetes, since it is likely that Elasticsearch pods may eat up all of a given node’s resources.

Elasticsearch cluster overhead

Elasticsearch’s architecture requires multiple Elasticsearch masters and multiple Elasticsearch data nodes, just to be able to start accepting logs at any scale beyond deployment testing. Each of these masters and nodes all run inside a JVM and consume significant resources as a whole. If you are logging at a reasonably high volume, this overhead is inefficient inside a containerized environment and logging at high volume, in general, introduces a whole other set of issues. Ask any of our customers who’ve switched to us from running their own ELK.

slack-imgs

Free ain’t cheap

While we ourselves are often tantalized by the possibility of using a fire-and-forget open-source solution to solve a specific problem, properly maintaining an Elasticsearch cluster is no easy feat. Even so, we encourage you to learn more about Elasticsearch and what it has to offer, since it is, without a doubt, a very useful piece of software. However, like all software it has its nuances and pitfalls, and is therefore important to understand how they may affect your use case.

Depending on your logging volume, you may want to configure your Elasticsearch cluster differently to optimize for a particular use case. Too many indices, shards, or documents all can result in different crippling and costly performance issues. On top of this, you’ll need to constantly monitor Elasticsearch resource usage within your Kubernetes cluster so your other production pods don’t die because Elasticsearch decides to hog all available memory and disk resources.

At some point, you have to ask yourself, is all this effort worthwhile?

LogDNA cloud logging for Kubernetes

logo_k8

As big believers in Kubernetes, we spent a good amount of time researching and optimizing our integration. In the end, we were able to get it down to a copy-pastable set of two kubectl commands:

kubectl create secret generic logdna-agent-key --from-literal=logdna-agent-key=<YOUR-API-KEY-HERE>
kubectl create -f https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml

This is all it takes to send your Kubernetes logs to LogDNA. No manual setting of environment variables, no editing of configuration files, no maintaining servers or fiddling with Elasticsearch knobs, just copy and paste. Once executed, you will be able to view your logs inside the LogDNA web app. And we extract all pertinent Kubernetes metadata such as pod name, container name, namespace, container id, etc. No Fluentd parsing pods required (or any other dependencies, for that matter).

Easy installation is not unique to our Kubernetes integration. In fact, we strive to provide concise and convenient instructions for all of our integrations. But don’t just take our word for it, you can check out all of our integrationsWe also support a multitude of useful features, including alerts, JSON field search, archiving, line-by-line contextual information.

All for $1.25/GB per month. We’re huge believers in pay for what you use. In many cases, we’re cheaper than running your own Elasticsearch cluster on Kubernetes.

For those of you not yet using LogDNA, we hope our value proposition is convincing enough to at least give us a try.

If you don’t have a LogDNA account, you can create one on https://logdna.com or if you’re on macOS w/homebrew installed:

brew cask install logdna-cli
logdna register 
# now paste the api key into the kubectl commands above

Thanks for reading and we look forward to hearing your feedback.

Technical

Kubernetes in a Nutshell

kuberneteslogo

In addition to being the greek word for helmsman, Kubernetes is a container orchestration tool that enables container management at scale. Before we explain more about Kubernetes, it is important to understand the larger context of containers.

Broadly, containers isolate individual apps within a consistent environment and can be run on a wide variety of hosting providers and platforms. This enables developers to test their app in a development environment identical to their production environment, without worrying about dependency conflicts or hosting provider idiosyncrasies. When new code is deployed, the currently running containers are systematically destroyed and then replaced by new containers running the new code, effectively providing quick, stateless provisioning. This is especially appealing to those using a microservices architecture, with many moving parts and dependencies.

While containers in themselves ensure a consistent, isolated environment, managing a multitude of individual containers is cumbersome without a purpose-built management tool, like Kubernetes. Instead of manually configuring networking and DNS, Kubernetes not only lets you deploy and manage thousands of containers, but also automates nearly all aspects of your development infrastructure. In addition to networking and DNS, Kubernetes also optimizes sharing machine resources across containers, so that any given machine, or node, is properly utilized, reducing costly inefficiencies. This is particularly powerful at scale since you effectively eliminate a significant chunk of DevOps overhead.

To learn more about Kubernetes, check out this handy getting started guide. If you’re feeling bullish on Kubernetes and want to learn more about your logging options, read Logging with Kubernetes should not be this painful.

Reflections

The future of logging

I skate to where the puck is going to be, not where it has been.

– Wayne Gretzky

While computer logs have been around for decades, their implementation has changed over time. From a dedicated server with a single log file, to microservices distributed across multiple virtualized machines with dozens of log files, the way we generate and consume logs changes as we adopt new infrastructure and programming paradigms. Even our market space of cloud-based logging was not heavily adopted until recently, yet the way we use logs will continue to evolve. The million dollar question is, how?

The alluring cloud

Cloud-based logging is relatively new and is a result of the general industry trend of moving away from on-premise servers to cloud-based services. In fact, Gartner estimates that between 2016 and 2020, IT will spend more than $1 trillion responding to the shift to the cloud. This means that more and more businesses are moving their operations to the cloud, including their logs. This has some interesting implications.

Part of the beauty of moving to the cloud is the ability to easily deploy and scale your infrastructure without having to undertake a large internal infrastructure project. This is a significant reduction in both time and cost, and is central to the value proposition of moving to the cloud. Taking this reasoning further, since there are only a relatively small number of large-scale hosting providers, businesses can be built entirely on making cloud infrastructure management simpler, easier, and more flexible. Enter platform as a service (PaaS).

cloud-migration
Credit: biblipole.com

In addition to hosting providers like Amazon Web Services, DigitalOcean, and Microsoft Azure, all sorts of PaaS businesses have popped up, such as Heroku, Elastic Beanstalk, Docker Cloud, Flynn, Cloud Foundry and a whole host of others. These PaaS offerings have become more and more ubiquitous, and are becoming increasingly lucrative. In 2010, Heroku was bought by Salesforce for $212 million, and last year Microsoft attempted to buy Docker for a rumored $4 billion. This demonstrates a significant shift from raw hosting providers, to simplified, managed services that automate the manual grunt work of directly managing cloud infrastructure. All for similar reasons to migrating to the cloud in the first place.

So what does this have to do with logging? It means that providing ingestion integrations with PaaS as well as hosting providers becomes increasingly important. Do you integrate with Heroku? Can I send you my Docker logs? I use Flynn, how do I get my logs to you? If you’re a cloud logging provider, the answer to all of these questions should be yes. And don’t forget to create integrations as new PaaS offerings appear.

The rise of containers

With the adoption of cloud-based infrastructure, things like distributed microservices architecture have become more popular. One of the primary benefits of using a microservices architecture is its highly modular nature. This means parts can be swapped out quickly and efficiently, without as much risk of disrupting your customers. However, this also means higher risk of development environment fragmentation. This is what containers were built to solve.

Containers are essentially wrappers that isolate individual apps within a consistent environment. With containers, it shouldn’t matter what hosting provider or development infrastructure you use, your software applications should run exactly the same as they do on your development machines. Matching development and production environments means more reliable testing, and less time spent chasing down environmental issues. It also has the added benefit of being able to reliably run multiple apps inside containers on the same machine, as well as to quickly respond to fluctuating load. According to Gartner, containers are even considered more secure than running apps on a bare OS.

kubernetes-is-coming
Credit: memegenerator.net

While containers in themselves solve the problem of development environment fragmentation, managing lots of individual containers can be a pain. Hence the rise of container orchestration tools, namely Kubernetes. With Kubernetes, you can deploy and manage hundreds of containers, and easily automate nearly everything, including networking and DNS. This is particularly appealing, since managing these at scale using a traditional hosting provider takes significant effort.

Between creating a consistent environment and automating networking, adoption of Kubernetes has been steadily increasing, and is poised to become a popular tool of choice. This also means that acquiring and organizing log data from Kubernetes is paramount to understanding the state of your infrastructure. LogDNA provides integrations for this and we strongly believe that containers and container orchestration will become highly prevalent within the next few years.

Machine learning and big data

So far we’ve primarily focused on infrastructure evolution, but it is also important to consider the trajectory and impact of software trends on logging. We’ve all heard “big data” and “machine learning” toted as popular buzzword answers to solving difficult software problems, but what do they actually mean?

Before we dive deeper into logging applications (no pun intended), let’s consider the general benefits of machine learning and big data. For example, Netflix uses machine learning to provide movie recommendations based on the context-dependent aggregate preferences of its users. Google Now uses machine learning to provide you with on-demand pertinent information based on multiple contexts, such as location, time of day, and your browsing habits. In both cases, these services look for patterns in large datasets to predict what information will be useful to you.

big-data
Credit: imgflip.com

Predicting useful patterns is the key value proposition of big data and machine learning, so how does this apply to logs? Since logs enable quicker debugging of infrastructure and code issues, machine learning could be used to notify us of useful patterns that we otherwise may have missed. For example, if I have a webserver log and the number of requests spikes or there is an increase in the number of 400 errors, machine learning could be used to notify you of these events before they have a serious impact on your infrastructure.

Taken further, machine learning could be used to find relationships between logs, such as tracing a request through multiple servers without explicitly knowing the path of the request beforehand. Given additional contextual inputs, like git commits, deployments, continuous integration, and bug tracking, machine learning could even be used to find relationships between code changes and log statements. Imagine being automatically notified that a particular set of commits is likely responsible for the errors you’re seeing in production. Pretty mind blowing.

So why hasn’t this been done already? Even though finding general relationships between entities isn’t hugely difficult, discerning useful and meaningful insights from unstructured data is actually pretty challenging. How do you find useful relationships in large unstructured datasets consisting primarily of background noise? The answer, however unsexy, is classification and training.

As evidenced by both the Netflix and Google Now examples, human recommendation is key to making machine learning insights actually worthwhile, hence the ‘learning’ part of the name. While it may seem like this initial recommendation effort detracts from the promised convenience of machine learning, it is actually what makes machine learning work so well. Instead of hunting down the data and finding patterns yourself, you are prompted to verify that the insights generated are helpful and correct. As more and more choices are made by us humans, the more useful these machine learning insights become, and the fewer verification prompts we receive. This is the point at which machine learning has fulfilled its highest potential.

Moving forward

From PaaS and containers, to machine learning and big data, we keep all these things in mind as we improve LogDNA. Like machine learning, we also rely on recommendation and feedback from our customers. Without it, our product would not be where it is today.

What do you think is the future of logging? Let us know at feedback@logdna.com!