The Impact of Containerization on DevOps

Ever since its formal introduction in 2008, DevOps has helped organizations shorten the distance from development to production. Software teams are delivering changes faster without sacrificing quality, stability, or availability. To do this, developers and IT operations staff have started working closely together to create a faster, more efficient pipeline to production.

Around the same time, containers started transforming the way we develop and deploy applications. 20% of global organizations today run containerized applications in production, and more than 50% will do so by 2020. But how has the growth of containers impacted DevOps, and where will that lead us in the future?

Thinking Differently About Applications

Not too long ago, applications were deployed as large, bulky, monolithic packages. A single build contained everything needed to run the entire application, which meant that changing even a single function required a completely new build. This had a huge impact on operations, since upgrading an application meant having to stop the application, replace it with the new version, and start it up again. Virtual machines and load balancing services removed some of the pain, but the inefficiency of managing multiple virtual machines and dedicated servers made it difficult to push new features quickly.

Containers allowed us to split up these monolithic applications into discrete and independent units. Like VMs, containers provide a complete environment for software to run independently of the host environment. But unlike VMs, containers are short-lived and immutable. Instead of thinking of containers as lightweight VMs, we need to think of them more as heavy processes. For developers, this means breaking down monolithic applications into modular units. Functions in the original monolith are now treated as services to be called upon by other services. Instead of a single massive codebase, you now have multiple smaller codebases specific to each service. This allows for much greater flexibility in building, testing, and deploying code changes.

This idea of breaking up applications into discrete services has its own name: microservices. In a microservice architecture, multiple specialized services work together to perform a bigger service. The kicker is that any one service can be stopped, swapped out, restarted, and upgraded without affecting any of the other services. One developer can fix a bug in one service while another developer adds a feature to another service, and both can occur simultaneously with no perceivable downtime. Containers are the perfect vessel for microservices, as they provide the framework to deploy, manage, and network these individual services.

Giving Power to Developers

Containers also empower developers to choose their own tools. In the old days, decisions about an application’s language, dependencies, and libraries had far-reaching effects on both development and operations. Test and production environments would need to have the required software installed, configured, and fully tested before the application could be deployed. With containers, the required software travels with the application itself, giving developers more power to choose how they run their applications.

Of course, this doesn’t mean developers can just pick any language or platform for their containers. They still need to consider what the container is being used for, what libraries it requires, and how long it will take to onboard other developers. But with the flexibility and ease of replacing containers, the impact of this decision is much less significant than it would be for a full-scale application.

Streamlining Deployments

Containers don’t just make developers’ lives easier. With containers providing the software needed to run applications, operators can focus on providing a platform for the containers to run on. Orchestration tools like Docker Swarm and Kubernetes have made this process even easier by helping operators manage their container infrastructure as code. Operators can simply declare what the final deployment should look like, and the orchestrator automatically handles deploying, networking, scaling, and mirroring the containers.

Orchestration tools have also become a significant part of continuous integration and continuous deployment (CI/CD). Once a new base image has been built, a CI/CD tool like Jenkins can easily call an orchestration tool to deploy the container or replace existing containers with new versions generated from the image. The lightweight and reproducible nature of containers makes them much faster to build, test and deploy in an automated way than even virtual machines.

When combined, CI/CD and container orchestration means fast, automated deployments to distributed environments with almost no need for manual input. Developers check in code, CI/CD tools compile the code into a new image and perform automated tests, and orchestration tools seamlessly deploy containers based on the new image. Except for environment-specific details such as database connection strings and service credentials, the code running in production is identical to the code running on the developer’s machine. This is how companies like Etsy manage to deploy to production up to 50 times per day.

Security From the Start

Despite the recent publicity, volume, and scale of data breaches in IT systems, security is often treated as an afterthought. Only 18% of IT professionals treat security as a top priority for development, and 36% of IT organizations don’t budget enough for security. Surprisingly though, most feel security would be beneficial development: 43% believe fixing flaws during development is easier than patching in changes later, and only 17% believe that adding security to the development process will slow down DevOps.

Combining security and DevOps is what led to DevSecOps. The goal of DevSecOps is to shift security left in the software development lifecycle from the deployment phase to the development phase, making it a priority from the start. Of course, this means getting developers on-board with developing for security in addition to functionality and speed. In 2017, only 10 to 15% of organizations using DevOps managed to accomplish this.

“Security is seen as the traditional firewall to innovation and often has a negative connotation. With shifting security left, it’s about helping build stuff that’s innovative and also secure.”
– Daniel Cuthbert, Global Head of Cyber Security Research at Banco Santander (Source)

As DevSecOps becomes more ingrained in development culture, developers won’t have a choice but to embrace security. The immutable nature of containers make them impractical to patch after deployment, and while operations should continue monitoring for vulnerabilities, the responsibility for fixing for these vulnerabilities will still fall to developers. The good news is that with containers, vulnerabilities can be fixed, built, tested, and re-deployed up to 50% faster than with traditional application development methods.

Moving Forward

Although DevOps and containerization are still fairly new concepts, they’ve already sparked a revolution in software development. As the tools and technologies continue to mature, we’ll start to see more companies using containers to build, change, test, and push new software faster.


The Impact Of Kubernetes On The CI/CD Ecosystem

Continuous integration (CI) and continuous delivery (CD) are two sides of the same coin we call DevOps. Continuous deployment is the last step of the CD phase where the application is deployed into production. Emerging in the early part of this decade, CI and CD are not new terms to anyone who’s been managing application delivery for a while. Tools like Jenkins have done much to define what a CI/CD pipeline should look like. The key word when talking about CI/CD is ‘automation’. By automating various steps in the development cycle, we start to see improvements in speed, precision, repeatability, and quality. This is done by build automation, test automation, and release automation. Much can be said about each of these phases, as they involve new approaches that are contrary to the old waterfall approach to software delivery. To make this automation possible, it takes multiple tools working together in a deeply integrated manner.

Container ShipSource: Wikimedia.org

How Kubernetes Affects CI/CD Pipelines

While CI/CD is not new, the advent of Docker containers has left no stone unturned in the world of software. More recently, the rise of Kubernetes within the container ecosystem has impacted the CI/CD process. DevOps requires a shift from the traditional waterfall model of development to a more modern and agile development methodology.

Rather than moving code between various VMs in different environments, the same code is now moved across containers, or container clusters as is the case with Kubernetes. Unlike static VMs that are suitable to a more monolithic style of application architecture, containers require a distributed microservices model. This brings new opportunities in terms of elasticity, high availability, and resource utilization. However, rather than relying on old approaches and tools to achieve these advantages, they call for change.  

Jenkins – Build & Test Automation

Mention continuous integration, and Jenkins is the first tool that comes to mind. In recent years, Jenkins has focused on going beyond CI and handling the end-to-end development pipeline including the CD phases. Kubernetes has been the solution to this effort. With Kubernetes’ mature handling of resources in production, it’s just the partner Jenkins needs to extend its reach beyond CI.

Running Jenkins on Kubernetes brings many benefits. To start, Jenkins can take advantage of the scalability and high availability of Kubernetes. With the numerous worker nodes in Jenkins, handling infrastructure to run Jenkins can become a nightmare. Kubernetes makes this easier with its automatic pod management features.

Further, Kubernetes enables zero-downtime updates with Jenkins. This is made possible by the rolling updates feature of Kubernetes where it gradually phases out pods with an older version of the application and replaces them with new ones. It does this by keeping a watch on the number of ‘maxUnavailable’ pods and ensuring they’re enough to run the application at all times during the update. In this way, Kubernetes brings the ability to do canary releases and blue-green deployments to Jenkins.

Apart from Jenkins, there are also many new CI tools that are built from a container-first standpoint. These include CircleCI, Travis, CodeFresh, Drone, and Wercker. Many of these tools provide a simpler user experience than Jenkins and are fully managed SaaS solutions. Almost all of them encourage a ‘pipeline’ model to deploying software, and in doing so bring greater control and flexibility in how you manage the CI process. They also feature integrations with all major cloud providers, making them a great alternative to the industry-leading Jenkins.  

Spinnaker – Multi-Cloud Deployment

While Jenkins is perfect for the build stages of the pipeline, perhaps an even more complex problem to solve is deployments, especially when it involves multiple cloud platforms and mature deployment practices. Kubernetes has a deployment API which has support for rollout, rollback, and other core deployment functionality. However, another open source tool, Spinnaker, create by Netflix, has been in the spotlight for its advanced deployment controls.

Spinnaker focuses on the last mile of the delivery pipeline – deployment in the cloud. It automates deployment processes and cloud resources and acts as a bridge between the source code on say, Github, and the deployment target like a cloud platform. The best part is that Spinnaker supports multiple cloud platforms, and enables a multi-cloud approach to infrastructure. This is one of the original promises of Kubernetes, and is being made accessible to all by Spinnaker.

Spinnaker uses pipelines to allow users to control a deployment. These pipelines are deeply customizable. It automatically replaces unhealthy VMs in a cloud platform so you can focus more on defining the required resources for your applications than on maintaining those resources.

Spinnaker isn’t a one-stop-solution. In fact, it leverages Jenkins behind the scenes to manage builds, and is built on top of the Kubernetes deployment API adding advanced functionality of its own at every stage. For example, while rollbacks are possible with the Kubernetes API, they’re much faster and easier to execute with Spinnaker. Considering its focus is deployment, it’s no surprise that Spinnaker has first class support for operations like canary releases and blue-green deployments.

While Jenkins excels at build automation and initiating automated tests, Spinnaker complements it well by enabling complex deployment strategies. Which tool you choose will depend on your circumstances. Teams that are deeply invested in Jenkins may find it easier to simply better manage Jenkins using Kubernetes. Teams that are looking for a better and easier way to handle deployments than Jenkins would want to give Spinnaker a spin. Either way, Kubernetes will play a role in ensuring that CI/CD pipelines function seamlessly.

A CI/CD pipeline management tool is essential as it acts as the control pane for your operations. However, it’s not the only tool you’ll use.

Helm – Package Management

Helm is a package manager for Kubernetes that makes it easy to install applications in Kubernetes. With automation being key to successful CI/CD pipelines, it’s essential to be able to quickly package, share and install application code and its dependencies. Helm has a collection of ‘charts’ with each chart being a package that you can install in Kubernetes. Helm places an agent called Tiller within the Kubernetes cluster which interacts with the Kubernetes API and handles installing and managing of packages.

The biggest advantage of Helm is that it brings predictability and repeatability to the CI/CD pipeline. It lets you define and add extensive configurations and metadata for every deployment. Further, it gives you complete control over rollbacks and brings deep visibility into every stage of a deployment.

Trends in CI/CD

Kubernetes is changing CI/CD for the better. By enabling and transforming tools like Jenkins, Spinnaker, and Helm, Kubernetes is ushering in a new way to deploy applications. While the ideas for doing canary releases and blue-green deployments have been around for over a decade, they’re made truly possible with the advances that Kubernetes brings. Here are some of the trends that are emerging because of the influence of Kubernetes.


All CI/CD tools today look at the software delivery cycle as a pipeline with linear steps from start to end. However, pipelines aren’t straightforward and allow for complex changes at every step. The biggest benefit is the ability to abstract the entire process and make it easier to manage. The pipeline model allows a view into how each component depends on others, and view every step in context of the other steps. Previously, each step was disconnected from the other and silos were the norm. Today, with CI/CD tools and Kubernetes, pipelines aren’t just on paper; rather, they are how software delivery happens in practice.

Configuration As Code

Infrastructure previously was controlled by its parent platform. VMware dictated how you interact with VMs and every change has to be made manually, separate from other changes. Today, with tools like Spinnaker and Helm, infrastructure is configured and managed via YAML files. This doesn’t just ease creation of resources but enables better troubleshooting.

Visibility & Control

Previously, version control was restricted to certain parts of the pipeline, but with Kubernetes, version control is built-in as every change is recorded and versioned and can be retrieved or rolled back to if needed.

Speaking about visibility, monitoring becomes more comprehensive with capable tools that are built to easily handle the scale and nuances of a Kubernetes-driven process. Tools like Prometheus and Heapster are great at delivering a stream of real-time metrics. Additionally, logging tools like LogDNA help to capture the minute details about every deployment. This includes logging exceptions, errors, states, and events.

Multi-Cloud Support

CI/CD tools today need to support multiple cloud platforms. Not that the same app would be run on multiple cloud platforms, but in a large organization different teams and different apps would use various platforms to meet specific needs. A modern CI/CD tool needs to cater to the needs of diverse teams and applications and this means supporting all major cloud platforms and private data centers as well.


Kubernetes has changed how software is built and shipped. What began with the cloud computing movement and CI tools like Jenkins about a decade ago is now coming of age with Kubernetes. What’s amazing is that these technologies are not being adopted by startups or fringe organizations, but rather by mainstream large enterprises who are looking for a way to modernize their application stack and infrastructure stack. They’re looking for solutions to real-world problems they face. If these CI/CD solutions tell us anything, it’s that Kubernetes is delivering where it really matters, and this is making CI/CD become a reality in many organizations. It’s about time, after all.

ProTip, Technical

Log Alerts & Monitoring – Best Practices

Application logs are more than just tools for root cause analysis. They’re also a way to gain insight into critical events such as a loss in sales, server warnings, HTTP errors, performance, and numerous other activities that impact business productivity. Logs are just thousands of lines of raw data, but they can be parsed and leveraged to provide a better understanding of what goes on under the hood of your application. It’s common for developers to set up logging for code exceptions within the application, but logging provides far more benefits than just bug tracking and should be used to alert administrators to issues that need their attention.

Application and Server Performance

Any administrator who has dealt with performance issues will tell you that it’s one of the most difficult problems to analyze and pinpoint a root cause for repair. Performance degradation can occur at certain times of the day, during an active DDoS attack, or what seems like no reason at all. QA can do performance testing on your application, but these tests rarely represent a production environment that supports thousands of users concurrently. For most organizations, performance issues occur during business growth and can harm its potential expansion. Performance is also problematic because it’s unforeseen and rarely a quick fix for developers and administrators.

Using log alerts, you can assess what’s happening when an application’s performance is diminished. It could be from a specific event such as a poorly optimized database query or when CPU usage spikes. Log these types of events, and you can not only identify when application performance will wane but also when server resources could be exhausted. Instead of suffering from a server crash, logging these events will give you insights for when it could be time to upgrade server hardware. They also help you pinpoint components of your application that could be causing performance degradation.

Your developers would need to set a baseline, but — for instance — you could set an alert for any event that takes longer than 100 milliseconds to process. When you see a pattern, you can then have developers research more into these application components for better optimization.

When CPU usage spikes to over 80%, set an alert to inform administrators. It could be something as simple as upgrading RAM or even your server’s CPU, but having these log alerts will give you the ability to analyze the time of day and any patterns surrounding application procedures.

Failed Sales Events

Most applications log exceptions so that developers can look back into errors and provide updates. But not every exception is created equally. The most critical events are those that impact business revenue. These events are the ones you should monitor closely and send alerts to the right response team.

It shouldn’t be your customers calling to tell you that there are bugs in your application. You want to find them before they do, and many customers will just bounce to another vendor if you have too many bugs. Many of your customers won’t report application issues at all, so you could be losing sales every hour and never know it.

When you build a shopping cart, you should have a point where you ask a visitor for a way to contact them. Usually, email is a common input field during account creation. With an alert on failed shopping cart activity, you have a way to contact a customer should they receive an error and bail on the shopping experience. Alerts are a great tool to salvage a lost customer due to application bugs.

But you also need alerts to tell you when components of your application are creating obstacles for customer shopping experiences. It could be an alert for performance (similar to the previous example), or your customers could be dropping at a specific point in the sales funnel. Alerts give you insights into the efficacy of your marketing and user experience. Use them generously to identify issues with your sales pages and find solutions for a better user experience.

Security and Suspicious Activity

Developers are good at logging exceptions, but they don’t usually account for security events. Security events can be any suspicious behaviors such as automated logins from account takeover tools (ATOs) using customer data, repeated patterns of failed admin login attempts, ACL changes or new accounts created.

Usually, some security events are triggered from the database, but this limits your logging to database-specific activity. With logging and alerts, you should use them to make the right people aware of suspicious activity that happens on any one of your servers outside of database activity.

With ATOs, an attacker will use automated software to log into a customer’s account and use purchased credit card data to buy product to test if the card is viable. Logs should be used to detect this type of activity and alert administrators to suspicious events.

Any modifications to security permissions or authorization should also be logged and alerts sent. This could be elevation of permissions for any specific user, new routing rules configured on your infrastructure, or user access to critical files. Security events are a primary method organizations use to identify attacks before they become catastrophic data breaches.

How Do You Set Up Alerts?

You need the right data before you can set up alerts. The data that you log is up to you, but some standard data points are needed to create efficient logs that scale across your infrastructure. Some common data points include:

  • Date and time
  • Host name
  • Application name
  • Customer or account that experienced the error
  • IP address or other geographic markers for the visitor
  • Raw exception information
  • Line number where it occurred (if applicable)
  • Type of error (fatal, warning, etc)

You could write a logging application, or you could save time and configuration hassles by using LogDNA. With built-in features that provide logging data and sending alerts, you can save months of development and testing for your own solution.

Instead of only using logs for basic events, an organization’s best practices should include activity that gives administrators insight into patching issues before they become catastrophic instead of just using them to retroactively find solutions. LogDNA can provide you with the right tools and alerts that organizations can leverage to avoid revenue-impacting bugs and server errors.

Product Updates

How to Determine Log Management ROI

Adopting a new toolset in a tech company can be the added catalyst your team needs to increase productivity. While you know that bringing a new tool (like cloud logging) will help your organization, you also need to convince either your superiors or financial managers by answering some questions. How will log management benefit us? How do we determine the return on investment? How do we make a business case out of it?

We’re going to look at the specific case of log analysis management services — how to understand your return on investment (ROI), how to determine cost analysis and some general tips and tricks for additional qualitative returns.   

If you have any kind of operations in place and in production, you’ll definitely be generating a lot of logs. Then of course you’ll need to monitor and analyze logs in one way or another. After ingesting them comes the next step of analysis. The question is how do you determine ROI for yourself, your superiors, and your organization?

Breaking Down ROI

The basics of ROI have their fundamentals set in the business and financial world. It is a performance measure that is used to evaluate how efficient an investment is compared to a number of different investments. In our case, the investment is a cloud log management tool. In a nutshell, it is the return on an investment relative to your initial upfront costs.  

The return on investment formula can be seen in the following equation:
ROI = (Gain from Investment – Cost of Investment) / Cost of Investment

The idea is pretty simple. If you’re going to be investing in a service, will it bring more value than what you paid? And by how much? This handy formula can be utilized as a ratio or a percentage. If the formula gives you a positive value, then you received positive value, and it was a good decision. If not, you need to reevaluate your spend.

So, for cloud logging tools, the question is — will your organization realize enough value (cost savings, time savings, additional revenue, etc.) to justify the cost?

First, let’s employ some quick math as an example before we head into the costs of a logging management product. Finance and team leaders want to see some objective hard data on why they should employ a new system.

Here’s a simple example.  Let’s say you’ll be spending $100 /mo on a SaaS product that saves one engineer one hour per week.  If that engineer’s fully loaded pay (salary, bonus, payroll tax, benefits, etc.) is $75 per hour, it saves the company $300 per month (assuming four weeks per month).  

Going back to our ROI formula:
ROI = (Gain from Investment – Cost of Investment) / Cost of Investment
ROI = ($300 – $100) / $100
ROI = 2 (often expressed as 2x, or 200%)

These sorts of arguments form the basis of your business case and subsequent ROI calculations.   

Cost of Log Analysis

In order to make a business case and set a projection of ROI, you’ll need two important pieces of data. The cost (upfront or ongoing investment you’re looking to get a return on) and the potential revenue or savings benefit. Here’s a look at logging management costs to start.

In our modern business era, the majority of SaaS and technology companies offer their services usually in a recurring monthly subscription. LogDNA does this in its “pay as you grow” format. Once starting out your business case, you’ll need to take this into account, as your ROI can be determined on a fluctuating monthly basis — but with an estimate of your monthly logging volume and required retention, you’ll have a good estimate of your monthly cost.

There may also be other costs lurking about that you may need to address. Depending on your business or potential enterprise needs – you may need to shift resources within your IT department to full or part time log management (this is especially the case when utilizing the ELK stack). We at LogDNA ensure to minimize the initial setup costs — with installation through a ~5-minute installation process; and with onboarding through our natural language search. As seen with one of our customers at Open Listing:

“We push hundreds of Gigs of data per month into LogDNA and their “pay as you grow” pricing structure works perfectly for us. It acts as a major cost savings vs. storing all that on our own infrastructure (on-premise). Additionally, there is less to manage, and we don’t have to have someone managing all the data that is moving around. We are thrilled to have LogDNA take that data and make it useable for us quickly, on the cloud. Regarding integrations, all of the data we push goes into LogDNA using AWS cloud, which we are already on, so integration was a snap.”

Once you establish your costs, you can start to look into the benefits and further ways to maximize your returns.

Proactive Logging for Better Returns

There are two main questions you need to ask yourself about your log management to understand some pricing predictions.

  1. How much of your data will be logged?

If you are already producing logs and/or utilizing a logging SaaS provider, you should already have an answer to this.  If not, you’ll need to make an estimate (and don’t worry, you can always change this later).

Your log files provide a play by play history of what your software is doing while in production. In both regulated and non-regulated companies alike, making a decision on what you need to log is an important step in determining costs. Meaningful events like regular maintenance settings, sales data and important alerts should all be factored in.

  1. What will your retention period be?

The retention period is how long the log data will be held on the provider’s servers before it is deleted.  Keep in mind any regulations that your business operates under, and what you need to remain compliant. An example is any health administration or business that has to uphold HIPAA.

It’s grown increasingly more important for healthcare professionals and business partners alike to maintain HIPAA compliance indefinitely. Log files (where healthcare data exists) must be collected, protected, stored and ready to be audited at all times. A data breach can end up costing a company millions of dollars.

Qualitative Return on Investment

A great log management toolset offers numerous benefits. One is that you’ll be able to search through logs quickly and pinpoint production issues faster, which saves the engineering team time.  Another is that the data can be put up in a visual dataset for others in the team to look at and collaborate on. Another way to save time.

This is another great aspect of building up a business case for log management ROI. How often do you waste time sludging through logs looking for what went wrong? What if you could significantly reduce time spent by letting the technology do the work for you? Not to mention the benefits it gives your many users and customers.  

You’re definitely increasing your savings by reducing time spent fixing issues, but what about preemptively stopping problems altogether? With a sophisticated cloud logging system in place, you can find problems and reduce problems (such as downtime). Are you experiencing random traffic spikes? Are there a number of similar bug alerts at once? You can look for trends in your log files and determine the best course of action before something becomes a major problem.  In the case of reducing downtime, you’ll avoid lost traffic, lost sales and decreased reputation.

One overlooked aspect of the right cloud logging system is that you can also use your logs to make your company more money. There are troves of data created by your app that you can use for valuable insights into your customer behavior. Opportunities are everywhere that help you take certain metrics and apply those to new business initiatives.

Maybe you see that signups on your app are more prevalent during a certain time of the week. You can research the market conditions around this and try to learn how to replicate this during a slow period. The possibilities are endless.   

While this is difficult to quantify at first, you can apply the formula and sort out some trial ROI runs in the first months of using a logging platform.

Bringing it all Together

By now you can tell that estimating log management costs and ROI are dependent on many diverse factors and will ultimately come down to you determining and setting up your own metrics. It’s not going to be as simple as calculating your site’s uptime. But that doesn’t mean you shouldn’t be able to calculate it.

Just going through the act of creating a business case and determining the effect a technical tool like a cloud logger will have on your bottom line is worth the time spent.  The great thing is that you can get started for free and begin to understand your ROI without any worry of wasted spend. And after that, you can factor in the monthly cost. Even just a few hours per month saved will be a tremendous boost of productivity to your DevOps team and overall organization. So go ahead and get your ROI hats on, start deliberating and get the tools you need to keep you productive.

Product Updates

Achieving Cost Advantages With a Cloud Logging Management Solution

In the past few years, cloud based services have seen tremendous growth. This fast adoption can be attributed to their low cost of acquisition, ease of implementation and economies of scale. Everyday businesses and enterprises of all sizes are making the switch from on-site infrastructures to both hybrid and completely cloud-based log management solutions.

Determining cloud logging cost advantages can be tricky at first. There is no one uniform way to determine how much you’re saving. An IT infrastructure is a complex beast with many different areas contributing to your total cost of ownership (TCO).

But we do know that switching to a cloud based log management system will save you money in the long run. There’s a few different ways to determine your cost advantages. First, you need to evaluate what type of solution will best fit your current needs and also be flexible enough to keep up with your future growth. You’ll also need to know how to determine your unique TCO and how that affects your DevOps and engineering teams.   

General Cost Advantages of Cloud Logging

For the most part, your monthly expenses from cloud logging is based off of log data volume and your unique retention rate. This holds true whether it’s LogDNA or another logging vendor. But there’s a few other things that can be overlooked that cloud logging also helps cut costs from.

An integrated system gets rid of the need to have separate instances for security and access controls and will also seamlessly interact with popular DevOps tools like Pagerduty or JIRA. These are just a few examples, but even those can add up. Cloud logging systems can also scale to handle spikes in seasonal log variations. It creates a redundant system that always makes your logs available, even if your infrastructure is down – the time when you’ll need your logs the most!

Additionally, cloud logging alleviates the need to rely on open source solutions that require you to hire engineering support to set up, manage and deploy systems. A cloud logging solution just requires a simple command line install. Index management, configuration and access control are just a click away in this case. This ease of use frees up your man hours for your DevOps team. With a log management system in place, your team can instead focus on core business operations.

Cost advantages are difficult to quantify in this case, as they will vary from business to business and require an internal accounting system that takes into account inventory costs, engineer support salaries, and opportunity costs saved.  

But we still can look to the general cloud market to see how companies are utilizing cloud-based systems and cloud-logging platforms to cut costs.

Look to the Greater Cloud Market For Answers

Arriving at a clear TCO is a difficult number to compute. SaaS and on premise IT costs are not exactly the easiest things to compare. Internal operational costs with employees will always vary and fluctuate. Just because you’re applying some cloud services into the mix, doesn’t mean you’re going to be cutting off the entire on-premise IT staff. Their roles are changing and evolving by the minute – but they’re also not going anywhere anytime soon.

We look to a compelling study conducted by Hurwitz & Associates comparing cloud based business applications to on-premise solutions. This white paper found that the overall cost advantages for cloud based solutions was significantly greater than on-site IT infrastructure.

Here’s a look at some of the areas of cost that are averted when working with a cloud logging management system.   

Cost Advantages in Focus

  • Setting up IT infrastructure – which includes hardware, software and general ongoing maintenance accounts for around 10% of the total cost of setting up an on-premise solution.
  • Subscription type fees, in which the majority of cloud logging providers offer is the main area of cost. Under these costs is the fact that you don’t have to create an underlying IT infrastructure – for example if you wanted run an ELK stack. That also means you cut down on personnel costs too.
  • A pre-integrated system for both the front and back end functionality of your business reduces disparate integration complexity and lowers implementation costs.

These three examples are just a few of the reasons more businesses are shifting their focus to gain additional cost advantages.

Global Spending Shifted Toward the Cloud

According to the IDC, spending on cloud services is expected to hit $160 billion in 2018 with a 23% increase from 2017. Software as a service (SaaS) is the largest category, accounting for over two thirds of spending for the year. Following that is infrastructure as a service (IaaS). Resource management and cloud logging make up the greatest amount of spending of the SaaS spend this year.  

The United States accounts for the largest market share of cloud services – totalling over $97 billion, followed by the United Kingdom and Germany. Japan and China are the largest in Asia with roughly $10 billion combined. There is a wide range of industries that benefit from cloud logging, from professional services to banking to general applications. Many of these businesses would be better off streamlining and integrating a pre-existing cloud logging solution rather than creating their own or wasting precious resources hiring and maintaining an on-premise IT staff.   

What Cloud Logging Helps Eliminate or Streamline

The first area to go is the operational costs in hiring additional engineers. Let’s use running an ELK stack as our prime example moving forward. Cloud logging platforms have cost advantages in three main areas: parsing, infrastructure and storage.

First, it’s one thing to be able to grab logs and get them churning through the stack – it’s a different ballgame entirely to actually make meaning out of them. While trying to understand and analyze your data, you need to be able to structure it so that it can be read and make sense. Parsing and putting it into a visual medium allows you make actionable decisions on this ever-changing and flowing data.  

The ability to use Logstash to filter your logs in a coherent way unique to your business needs is no easy feat. It can be incredibly time consuming and require a lot of specialized billable hours. A quick Google search will show you the mass amounts of queries into creating just a Logstash timestamp – something that’s already ingrained and part of a cloud logging platform. Logs are also very dynamic. Which means that over time you’re going to be dealing with different formats and you’ll need to initiate periodic configuration adjustments. All of this means more time and money spent just getting your logs functional. You shouldn’t have to reinvent the wheel just to be able to read your logs.

Next is just plain infrastructure. As your business grows — which is what any viable business is hoping and striving for – more logs are going to be ingested into your stack. This means more hardware, more servers, network usage and of course storage. The overall amount of resources you need to employ to process this traffic will be continually increasing. An in-house log management solution consumes a lot of network bandwidth, storage and disc space. Not to mention, it most likely won’t be able handle large bursts of data when you have spikes in logs.

When an error occurs in production is when you’ll need your precious logs parsed, ingested and ready for action in a moments notice. If your infrastructure isn’t up to snuff and falters – not only will you not be able to investigate your logs, you’ll also spend money fixing your failed underlying systems. Building out and maintaining this infrastructure can cost tens of thousands of dollars on an annual basis.  

Finally, all of your data has to go somewhere. You need to know where it goes and what to do with it. Indices can pile up and if they’re not taken care of, there’s a possibility that this will cause your ELK stack to crash and you’ll lose that precious data. A few things you’ll need to also learn how to do is remove old indices and have logs archived in their original format. All of this can be done with Amazon A3, but costs more time and money.

Flexible Storage & Pricing

In terms of storage, cloud logging ensures that you can store and have flexible data retention at a fraction of the cost it’d cost you to host it locally. Pricing is flexible and most important scalable. These two characteristics make cloud logging cost effective for any kind of business.   

LogDNA’s pay-per-GB pricing (similar that of AWS) is a good example of scalability. When you have an in-house solution, you need to increase your hardware every time your data increases. And being in the business of growth, predicting scalability is tough.  A pay-as-you-grow pricing model allows you to bypass wasted cloud spend and only pay for what you need. Finding the perfect balance is more difficult the other way around.

Overall, these many benefits and an overarching trend of companies shifting towards cloud logging solutions shows that there are multiple cost advantages with these solutions. Determining just how much you’ll save from a TCO standpoint depends on your unique situation and configuration — just be sure to think through hiring, maintenance and hardware.

Security, Technical

Building Secure Applications with Machine Learning & Log Data

In recent years, machine learning has swept across the world of software delivery and is changing the way applications are built, shipped, monitored, and secured.  And log monitoring is one of the industries that keeps evolving with new capabilities afforded with machine learning.

What Is Machine Learning?

Machine learning is the process of using algorithms and computer intelligence to analyze and make sense of large quantities of complex data that would otherwise be difficult or impossible to do by a human security analyst. There are many forms of machine learning from algorithms that can be trained to replicate human decision making at scale to algorithms that take an open-ended approach to find interesting pieces of data with little input or guidance. Differences aside, machine learning looks to get more value out of data in a way that humans can’t do manually. This is critical for security, which deals with large quantities of data, and often misses the important data, or catches it too late.

Machine learning is made possible by the power of cloud computing and how it makes crunching big data cheaper and more powerful. Analyzing large quantities of complex data takes a lot of computing power, readily available memory, and fast networking that’s optimized for scale. Cloud vendors today provide GPU (graphical processing unit) instances that are particularly well suited for machine learning. The alternate method is to use numerous cheap servers and a distributed approach to analyze the data at scale. This is possible with the advances in distributed computing over the past few years.

Additionally, cloud storage with fast I/O speeds is necessary for complex queries to be executed within a short period of time.  AWS itself offers multiple storage solutions like Amazon EFS, EBS, and S3. Each of these serve different purposes and are ideal for different types of data workloads. The cloud has stepped up in terms of compute, memory, storage and overall tooling available to support machine learning.

Security Threats Are Becoming Increasingly Complex

The primary use case for machine learning in log analysis has to do with security. There are many forms of security attacks today, which range in complexity. Email phishing attacks, promo code abuse, credit card theft, account takeover, and data breaches are some of the security risks that log analysis can help protect against. According to the Nilson Report, these types of security attacks cost organizations a whopping $21.8 billion each year. And that doesn’t even include the intangible costs associated with losses in trust, customer relationships, brand value, and more that stem from a security attack. Security threats are real, and log analysis can and should be used to counter them.

Attackers are becoming more sophisticated as they look to new technology and tools to carry out their attacks. Indeed, criminals themselves are early adopters of big data, automation tools and more. They use masking software to hide their tracks, bots to conduct attacks at scale, and in some cases have armies of humans assisting in the attack. With such a coordinated effort, they can easily break through the weak defenses of most organizations. That’s why you see some of the biggest web companies, the most highly secured government institutions, and the fastest growing startups all fall victim to these attackers, who can breach their defense easily.

SecOps Is Predominantly Manual

Since much of security is conducted by humans, or at most by tools that use rules to authorize or restrict access, attackers eventually understand the rules and find ways to breach them. They may find a limit to the number of requests a gatekeeper tool can handle per second, and then bombard the tool with more requests than the limit. Or they may find unsecured IoT devices on the edge of the network that can be easily compromised and taken control of. This was the case with the famous Dyn DDoS attack which leveraged unsecured DVRs and then bombarded the Dyn network, taking down with it a large percentage of the internet’s top websites that relied on Dyn for DNS services. The point is that manual security reviews, and even rule-based security, doesn’t scale and is not enough to secure systems against the most sophisticated attackers. What’s needed is a machine learning approach to security, and one that leverages logs to detect and stop attacks before they escalate.


Many security risks occur at the periphery of the system, so it’s essential to keep a close watch on all possible entry points. End users access the system from outside, and can sometimes knowingly or unknowingly compromise the system. You should be able to spot a malicious user from the smallest of triggers; for example, an IP address or a geo location that is known to be suspicious should be investigated. Login attempts are another giveaway that your system may be under attack. Frequent unsuccessful login attempts are a bad sign and need to be further investigated. Access logs need to be watched closely to spot these triggers, and it is best done by a machine learning algorithm.

While manual and rule-based review can work to a certain point, increasingly sophisticated attacks are best thwarted by using machine learning. You may need to crawl external third-party data to identify fraudulent activity, and it helps to look not just inside, but outside of your organization for data that can shed light on suspicious activity. But with growing data sets to analyze, you need more than basic analytics — you need the scale and power of machine learning to spot patterns, and find the needle in the haystack. Correlating your internal log data with external data sets is a challenge, but it can be done with machine learning algorithms that look for patterns in large quantities of unstructured data.

Machine Learning For Security

Machine learning can go further in spotting suspicious patterns from multiple pieces of data. It can look at two different pieces of data, sometimes not obviously associated with each other, and highlight a meaningful pattern. For example, if a new user accesses parts of the system that are sensitive, or tries to gain access to confidential data, a machine learning algorithm can spot this from looking at their browsing patterns or the requests they make. It can decipher that this user is likely looking to breach the system and may be dangerous. Highlighting this behavior an hour or two in advance can potentially prevent the breach from occurring. To do this, machine learning needs to look at the logs showing how the user accesses and moves through the application. The devil is in the details, and logs contain the details. But often, the details are so hidden that human eyes can’t spot them; this is where machine learning can step in and augment what’s missing in a human review.

In today’s cloud-native environment, applications are deeply integrated with each other — no application is an island on its own. This being the case, many attacks occur from neighboring apps which may have escalated their privileges. It’s easy to read the news about data breaches and find cases where organizations blame their partner organizations or an integrated third-party app for a security disaster. Monitoring your own system is hard enough, and it takes much more effort and sophistication to monitor outside applications that interact with yours.

Whereas humans may overlook the details when monitoring a large number of integrated applications and APIs, a machine learning algorithm can monitor every API call log, every network request that’s logged, and every kind of resource accessed by third-party applications. It can identify normal patterns as well as suspicious ones. For example, if an application utilizes a large percentage of available memory and compute for a long period of time, it is a clear trigger. A human may notice this after a few minutes or hours of it occurring, but a machine learning algorithm can spot the anomaly in the first few seconds, and bring it to your attention. Similarly, it can highlight a spike in requests from any single application quickly, and highlight that this may be a threat.


Machine learning algorithms are especially good at analyzing unstructured or semi-structured data like text documents or lines of text. Logs are full of text data that need to be analyzed, and traditional analytics tools like SQL databases are not ideally suited for log analysis. This is why newer tools like Elasticsearch have sprung up to help make sense of log data at scale. Machine learning algorithms work along with these full-text search engines to spot patterns that are suspicious or concerning. It can derive this insight from the large quantities of log data being generated by applications. In today’s containerized world, the amount of log data to be analyzed is on the rise, and only with the power of machine learning can you get the most insight in the quickest time.


As you look to get more out of your log data, you need an intelligent logging solution like LogDNA that leverages machine learning to give you insight in a proactive manner. Algorithms are more efficient and faster than humans at reading data, and they should be used to preempt attacks by identifying triggers in log data. As you assess a logging solution, do look at its machine learning features. Similarly, as you plan your logging strategy, ensure machine learning is a key component of your plans, and that you rely not just on traditional manual human review, but leverage the power of machine learning algorithms.

Security, Technical

The Role of Log Analysis in Application Security

Security is perhaps the most complex, time-sensitive, and critical aspect of IT Ops. It’s similar to the ICU (Intensive care unit) room in a hospital where the most serious of cases go, and any mistake can have far-reaching consequences. Just as doctors need to monitor various health metrics like heart rate, blood pressure, and more, Security Analysts need to monitor the health of their systems at all time – both when the system is functioning as expected, and especially when things break. To gain visibility into system health at any time, the go-to resource for any Security Analyst is their system logs. In this post, we cover all that can go wrong in the lifetime of an application, and how Security Analysts can respond to these emergencies.

The Role of Log Analysis in Application Security
Log Analysis is an integral component to application security

Common security incidents

An application can be attacked in any number of ways, but below are some of the most common type of security issues that SecOps teams deal with day in and day out.

  • Viruses & malware: This is when a malicious script or software is installed on your system and it tries to disrupt, or manipulate the functioning of your system.
  • Data breaches: The data contained in your systems is an extremely valuable asset. A data breach is when this valuable data is accessed, or stolen by an attacker.
  • Phishing attacks: This tactic is used on end users where an attacker poses as a genuine person or organization, and lures the user to click on a link after which their credentials like logins, and financial information are stolen.
  • Account takeovers: This is when a cyber criminal gains access to a user’s login credentials and misuses it to their own advantage.
  • DDoS attacks: Distributed denial of service (DDoS) attacks occur when multiple infected computers target a single server and overload it with requests. The server is overloaded with requests and denies service to other genuine users.
  • SQL injection: This happens when an attacker runs malicious SQL queries against a database with the aim of taking the database down.
  • Cross-site scripting: This type of attack involves the hacker using browser scripts to execute harmful actions on the client-side.

Initial response

When an incident occurs, the first thing you need to do is estimate the impact of the attack.

Estimate the impact of the attack. Which parts of the system have failed, which parts are experiencing latency, how many users and accounts are affected, which databases have been compromised, and more. Understanding this will help you decide what needs to be protected right away, and what your first actions should be. You can reduce the impact of an attack by acting fast.

Incident management using log data

Once you’ve taken the necessary ‘first aid’ measures, you’ll need to do deeper troubleshooting to find the origin of the attack, and fully plug all vulnerabilities. When dealing with these security risks, log data is indispensable. Let’s look at the various ways you can use log data to troubleshoot, take action, and protect yourself from these types of attacks.

Application level

When an incident occurs, you probably receive an alert from one of your monitoring systems about the issue. To find the origin of the incident, where you look first will depend on the type of issue.

At the application level you want to look at how end users have used your app. You’d look at events like user logins, and password changes. If user credentials have been compromised by a phishing attack, it’s hard to spot the attacker initially, but looking for any unusual behavior patterns can help.

Too many transactions in a short period of time by a single user, or transactions of very high value can be suspicious. Changes in shipping addresses is also a clue that an attacker may be at work.

At the application layer, you can view who has had access to your codebase. For this you could look at requests for files and data. You could filter your log data by application version, and notice if there are any older versions still in use. Older versions could be a concern as they may not include any critical security patches for known vulnerabilities. If your app is integrated with third-party applications via API, you’ll want to review how it’s been accessed.

When looking at log data it always helps to correlate session IDs with timestamps, and even compare timestamps across different pieces of log data. This gives you the full picture of what happened, how it progressed, and what the situation is now. Especially look for instances of users changing admin settings or opting for their behavior to not be tracked. These are suspicious and are clear signs of an attack. But even if you find initial signs at the application layer, you’ll need to dig deeper to the networking layer to know more about the attack.

Network level

At the network level there are many logs to view. To start with, the IP addresses and locations from where your applications were accessed from is important. Also, notice the device types like USBs, IoT devices, or embedded systems. If you notice suspicious patterns like completely new IP addresses and locations, or malfunctioning devices, you can disable them or restrict their access.

Particularly, notice if there are any breaches to the firewall, or cases of modifying or deletion of cookies. In legacy architecture a breach in a firewall can leave the entire system vulnerable to attackers, but with modern containerized setups, you can implement granular firewalls using tools like Project Calico.

Look for cases of downtimes or performance issues like timeouts across the network, these can help you assess the impact of the attack, and maybe even find the source. The networking layer is where all traffic in your system flows, and in the event of an attack, malicious events are passing through your network. Looking at these various logs will give you visibility into what is transpiring across your network.

Going a level deeper, you can look at the health of your databases for more information.

Database level

Hackers are after the data in your system. This could include financial data like credit card information, or even personally identifiable information like a social security number or home address. They could look for loosely stored passwords and usernames associated with them. All this data is stored in databases in cloud or physical disks in your system, and they need to be checked for breaches, and secured immediately if not compromised already.

To spot these breaches take a look at the log trail for data that’s been encrypted or decrypted. Especially notice how users have accessed personally identifiable information or payment related information. Changes to file names and file paths, an unusually high number of events from a single user are all signs of data breaches.

Your databases are also used to store all your log data, and you’d want to check if disk space was insufficient to store all logs, or if any logs were left out recently. Notice if the data stored on these disks were altered or tampered with.

OS level

The first thing to check in the operating system (OS) layer should be the access controls for audit logs. This is because savvy attackers will first try to delete or alter audit logs to cover their tracks. Audit logs record all user activity in the system, and if an attack is in progress, the faster you secure your audit logs the better.

Another thing to check is if timestamps are consistent across all your system logs. Often, attackers would change the timestamps for certain log files making it difficult for you to correlate one piece of log data with another. This makes investigation difficult and gives the attacker more time to carry out their agenda.

Once you’re confident that your audit logs and timestamps haven’t been tampered with, you can investigate other OS level logs to trace user activity. How users logged in and out of your system, system changes especially related to access controls, usage of admin privileges, changes in network ports, files and software that was added or removed, and commands executed recently, especially those from privileged accounts – all this information can be analyzed using your system logs.

The operating system is like the control center for your entire applications stack, and commands originating from here need to be investigated during an attack. The next place to look for vulnerabilities is the hardware devices in your system.

Hardware or infrastructure level

Some of the recent DDoS attacks, like the one that affected Dyn last year was a result of hackers gaining access to hardware devices that end users owned. They cracked the weak passwords of these devices and overloaded the server with requests. If end user devices are a major part of your system, they should be watched carefully as they can be easy targets for hackers. Looking at log data for how edge devices behave on the network can help spot these attacks. With the growth of the internet of things (IoT) and smart devices across all sectors, these types of attacks are becoming more common.

Additionally, if your servers or other hardware are also accessed by third-party vendors, or partner organizations, you need to keep an eye on how these organizations use your hardware and the data on it.

After the incident

Hopefully, all these sources of log data can help you as you troubleshoot and resolve attacks. After the incident, you’ll need to write a post-mortem. Even in this step, log data is critical to telling the entire story of the attack – its origin, progression, impact radius, resolution, and finally restoring the system back to normalcy.

As you can tell, log data is present at every stage of incident management and triaging. If you want to secure your system, pay attention to the logs you’re collecting. If you want to be equipped to deal with today’s large scale attacks, you’ll need to look to your log data. It’s no doubt logs are essential to SecOps, and understanding how to use them during an incident is an important skill to learn.