kubernetes, Technical

Kubernetes Logging 101

Containerization brings predictability and consistency across the development pipeline. A developer can package code in a container and ship the same container into production knowing it will work the same. However, for this consistent experience to happen, there are many cogs and levers working in the underlying layers of the container stack. Containers abstract away the complex internals of infrastructure and deliver a simple, consistent user experience. The part of the Docker stack that's especially important in this respect is the orchestration layer.

An orchestration tool like Kubernetes takes care of the complexity of managing numerous containers by providing many smart defaults. It handles changes and configuration for groups of containers called pods, which run across groups of machines called clusters. In doing so, it lets you focus on what matters most to you – the code and data that's housed in your Kubernetes cluster. Because of these advantages, Kubernetes has become the leading container orchestration tool today.

Kubernetes makes it easy to manage containers at scale, but it comes with a steep learning curve. This is the reason for the numerous startups offering managed Kubernetes services – Platform9, Kismatic, OpenShift, and CoreOS Tectonic to name a few. However, learning the ins-and-outs of Kubernetes is well worth the effort because of the power and control it gives you.

No matter which route you take to managing your Kubernetes cluster, one fundamental requirement to running a successful system is log analysis. Traditional app infrastructure required log data to troubleshoot performance issues, system failures, bugs, and attacks. With the emergence of modern infrastructure tools like Docker and Kubernetes, the importance of logs has only increased.

The Importance of Log Data in Kubernetes

Log data is essential to Kubernetes management. Kubernetes is a very dynamic platform with tons of changes happening all the time. As containers are started and stopped, and IP addresses and loads change, Kubernetes makes many minute changes to ensure services are available and performance is not impacted. But there is still the odd time when things break, or performance slows down. At those times, you need the detail that only log data can provide. Beyond performance and security, you need log data to ensure proper compliance with laws like HIPAA and PCI DSS. Or, if there's a data breach, you'll want to go back in time to identify the origin of the attack and its progression across your system. For all these use cases, log data is indispensable.

There are many ways you can access and analyze Kubernetes log data ranging from simple to advanced. Let’s start with the simplest option and move up the chain.

Monitoring A Pod

Pod-level monitoring is the most rudimentary form of viewing Kubernetes logs. You use kubectl commands to fetch log data for each pod individually. These logs are stored in the pod, and when the pod dies, the logs die with it. They are useful when you're just starting out and have just a few pods. You can instantly check the health of pods without needing a robust logging setup for a big cluster.
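
For instance, assuming a pod named my-pod (the pod and container names here are placeholders), fetching its logs with kubectl looks like this:

kubectl logs my-pod
kubectl logs my-pod -c my-container     # a specific container in a multi-container pod
kubectl logs my-pod --previous          # logs from the previous instance after a restart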

Monitoring A Node

Logs collected for each node are stored in a JSON file. This file can get really large, and in this case, you can use the logrotate tool to split the log data into multiple files once a day, or when the data reaches a particular size like 10MB. Node-level logs are more persistent than pod-level ones. Even if a container is restarted, its previous logs are retained on the node. But if a pod is evicted from a node, its log data is deleted.
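
As a rough sketch, a logrotate rule along these lines would rotate container log files daily or once they reach 10MB (the log path and retention count are assumptions that depend on your container runtime and distribution):

/var/lib/docker/containers/*/*.log {
    daily
    maxsize 10M
    rotate 5
    missingok
    copytruncate
}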

While pod-level and node-level logging are important concepts in Kubernetes, they aren’t meant to be real logging solutions. Rather, they act as a building block for the real solution – cluster-level logging.

Monitoring the Cluster

Kubernetes doesn't provide a default logging mechanism for the entire cluster, but leaves this up to the user and third-party tools to figure out. One approach is to build on node-level logging: run a logging agent on every node and combine their output.

On Google Cloud deployments, the default option is Stackdriver, which uses a Fluentd agent and writes log output to a local file. However, you can also set it to send the same data to Google Cloud. From there, you can use Google Cloud's CLI to query the log data. This, however, is not the most powerful way to analyze your log data.
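
For example, once logs are flowing into Stackdriver, a query like the following pulls recent container logs from the command line (the cluster name is a placeholder, and the exact resource type can vary with your cluster version):

gcloud logging read 'resource.type="container" AND resource.labels.cluster_name="my-cluster"' --limit 10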

The ELK Stack

The most common way to implement cluster-level logging is to use a Fluentd agent to collect logs from the nodes, and pass them onto an external Elasticsearch cluster. The log data is stored and processed using Elasticsearch, and can be visualized using a tool like Kibana. The ELK stack (Elasticsearch, Logstash, Kibana) is the most popular open source solution for logging today, and its components often form the base for many other modern logging solutions, including LogDNA (but that’s a topic for a whole other post). The ELK stack offers more powerful logging, and more extensibility than the Stackdriver / Google Cloud option.
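
A minimal Fluentd configuration for this pattern looks roughly like the following; the Elasticsearch host shown is an assumption, and a production setup would add Kubernetes metadata enrichment and buffering:

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  format json
</source>

<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
  logstash_format true
</match>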

One example of an organization that uses this setup for centralized logging for their Kubernetes cluster is Samsung. They use the Fluentd / ELK stack combination, but add Kafka for an added step of buffering and monitoring. Samsung has even open sourced this configuration of tools as K2 Charts.

Sidecar Containers

You can stream logs of different formats together, but this would be harder to analyze and could get messy, considering the scale of Kubernetes. Instead, the preferred way is to attach a sidecar container for each type of log data. A sidecar container is dedicated to collecting logs, and is very lightweight. Every sidecar container contains a Fluentd agent for collecting and transporting logs to a destination.
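
A pod spec for this pattern looks roughly like the following; the image names and log path are placeholders, and the application is assumed to write its logs to a file on a shared volume that the sidecar reads from:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging-sidecar
spec:
  containers:
  - name: app
    image: my-app:latest
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-collector
    image: fluent/fluentd:latest
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: app-logs
    emptyDir: {}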

Archived Log Storage

Storing logs is critical, especially for security. For example, you may find out about a breach in your system that started two years ago, and want to trace its development. In this case, you need archived log data to go back to that point in time, and see the origin of the breach, and to what extent it has impacted your system.

Kubernetes offers basic local storage for logs, but this is not what you'd want to use for a production cluster. You can either use object storage like AWS S3 or Azure Blob Storage, or you can ask your log analysis vendor to give you extended storage on their platform. For archived data, it's better to use cloud storage than on-premises servers, as it's more cost efficient and can be easily accessed when needed.
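
For example, if your archives land on local disk first, shipping a day's archive to S3 is a one-liner with the AWS CLI (the bucket and file names here are placeholders):

aws s3 cp /var/log/archive/2018-01-15.json.gz s3://my-log-archive/2018/01/ --storage-class STANDARD_IA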

Dedicated Log Analysis Platforms

The ELK stack is a common way to access and manage Kubernetes logs, but it can be quite complex with the number of tools to set up and manage. Ideally, you want your logging tool to get out of the way and let you focus on your log data and your Kubernetes cluster. In this case, it pays to go with a dedicated log analysis platform like LogDNA, which comes with advanced log management features, and is fully managed so you don't have to worry about availability and scaling your log infrastructure.

You can start collecting Kubernetes logs in LogDNA using just 2 simple kubectl commands:

kubectl create secret generic logdna-agent-key --from-literal=logdna-agent-key=YOUR-INGESTION-KEY-HERE

kubectl create -f https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml

Deeply customized for Kubernetes, LogDNA automatically recognizes all metadata for your Kubernetes cluster including pods, nodes, containers, and namespaces. It lets you analyze your Kubernetes cluster in real-time, and provides powerful natural language search, filters, parsing, shortcuts, and alerts.

LogDNA even mines your data using machine learning algorithms and attempts to predict issues before they happen. This is the holy grail of log analysis, and it wasn't possible previously. Thanks to advances in machine learning and the cloud enabling computing at this scale, it's now a reality.

To summarize, Kubernetes is the leading container orchestration platform available today. Yet, running a production cluster of Kubernetes takes a lot of familiarity with the system and robust tooling. When it comes to log analysis, Kubernetes offers basic log collection for pods, nodes, and clusters, but for a production cluster you want unified logging at the cluster level. The ELK stack comes closest to what a logging solution for Kubernetes should look like. However, it’s a pain to maintain and runs into issues once you hit the limits of your underlying infrastructure.

For unified log analysis for Kubernetes, you need a dedicated log analysis platform like LogDNA. It comes with advanced features like powerful search, filtering, and machine learning to help you get the most out of your log data. Being a fully managed service, you can focus on your Kubernetes cluster and leave the drudge of maintaining log infrastructure to LogDNA. As you run a production Kubernetes cluster, you need a powerful log analysis tool like LogDNA to truly enjoy the promise of Kubernetes – running containers at massive scale.

Learn more about LogDNA for Kubernetes here.

Technical

Choosing The Right Ingestion Client

LogDNA has a range of options by which you can supply your account with log data. If you’re not entirely familiar, these are via the Collector Agent, syslog, code libraries, PaaS integration, and REST API.

Given all these options, which one is right for you? Unfortunately, like many things in tech (as in life), the question is difficult to answer definitively. I don’t mean this in any non-committal way. I only mean that it’s hard to say which option is the right one without some form of well thought through qualifications.

Given that, let’s work through the five options which LogDNA provides, and consider when they would, and when they wouldn’t, be the right option to choose. Let’s start off with, arguably, the simplest of the five, the logging agent.

The Collector Agent

If you want the easiest option, one supported across all of the major operating systems (Linux, macOS, and Windows), then the Collector Agent is the choice to make. Written in Node.js, it logs information at the OS level. To quote the official documentation, it:

Reads log files from the computer it is installed on and uploads the log data to your LogDNA account…It opens a secure web socket to LogDNA’s ingestion servers and ‘tails’ for new log files added to your specific logging directories. When those files change, the changes are then sent to LogDNA, via the secure web socket.

In addition to reading log files on your server(s), the Collector Agent can also read from the Windows Application Log.

As you may expect from a system daemon or service, after installation, all that’s required is to provide a configuration. On Linux/UNIX systems, this is stored in /etc/logdna.conf and on Windows, it is stored in C:\ProgramData\logdna\logdna.conf.

As an example, let’s assume that my servers are running Ubuntu 16.04 and that I have installed the agent. With that done, the minimum configuration that I’d need to add to /etc/logdna.conf, so that it started sending my log data is:

logdir = /var/log/nginx,/var/log/auth
key = <YOUR LOGDNA INGESTION KEY>

I’ve only needed to provide the directories to read the log files from that are of interest to me, as well as my LogDNA ingestion key. In the example provided, it would send log information from my NGINX server, as well as authentication requests to my LogDNA account.

If there are files that shouldn’t be included, then I can use the exclude option to have the agent skip over them. And there’s another handy option: tags. It allows you to group hosts automatically into dynamic host groups without having to explicitly assign a host to a group within the LogDNA web app.
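
Putting those options together, a slightly fuller configuration might look like this (the excluded file and the tag names are just examples):

logdir = /var/log/nginx,/var/log/auth
key = <YOUR LOGDNA INGESTION KEY>
exclude = /var/log/nginx/access.log
tags = production,web-tier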

The agent can be installed using package managers, such as Homebrew, APT, and Yum, as well as from source. It also integrates with Kubernetes, if you're using Docker containers.

Given this level of simplicity, if you’re just getting started with LogDNA, want an option that requires a minimum investment, both initially and over time, and want something that’s available on all the major platforms, then the Collector Agent is the right one to choose.

However, it is considered a bit of a shotgun approach. That is, while it can send your log data straight to LogDNA with little effort, you have only negligible control over how and when it does so. Specifically, you can’t choose a level of log granularity, such as debug, info, warning, and critical.

syslog

The next option is syslog. It is the veteran logging daemon, available on all Linux/UNIX installations, since its development by Eric Allman back in the 1980s. If you’re not familiar, syslog (also available as syslog-ng and rsyslog) is a logging system which allows for logging on a server locally, as well as to remote syslog servers.

It is an excellent choice for logging information across a wide range of devices, such as printers and routers, and a wide range of system-services, such as authentication, servers such as NGINX, CUPS, and Apache, system-level events, and kernel-level events.

Sending log data to LogDNA using syslog requires a little more effort than the Collector Agent. Thankfully, the steps are well documented under Settings -> "View your logs" -> "Via syslog".

Setting up syslog requires adding an entry to /etc/syslog.conf, which can be generated in the “Log Source” modal dialog. For example: *.* @syslog-a.logdna.com:16312. syslog-ng and rsyslog require a little more configuration effort. But like syslog, the instructions are well documented.

syslog is a particularly appealing option if you’re looking for simplicity and minimal cost. One of the three daemons is likely universally available on any Linux/UNIX system, as they’ve been around for so long. What’s more, it’s a very well understood application, with a wealth of documentation available on how to configure it across a wide variety of use cases.

This gives it an advantage over the Collector Agent. Where the agent passes log data as-is, syslog allows you to choose the format in which the log messages are written. This may not be necessary. But it’s handy nonetheless, as you can write log data that better suits the needs of your applications. Doing so, you’re better able to parse the log data when the time comes, as it will make more sense.
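
With rsyslog, for example, that formatting is done with a template. A sketch like the one below (using rsyslog's legacy directive syntax, with the host and port taken from the "Log Source" dialog as in the earlier example) defines a custom layout and applies it to everything forwarded to LogDNA:

$template LogDNAFormat,"<%pri%>%timestamp% %HOSTNAME% %syslogtag%%msg%\n"
*.* @syslog-a.logdna.com:16312;LogDNAFormat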

Given this advantage, the fact that syslog is free, and that LogDNA provides detailed configuration instructions and supports host tags/dynamic groups, it’s well worth considering.

You won't have to look far to find support for it, and you won't have to invest much time in maintaining it. Another handy aspect of sending log data using syslog is that you don't need to explicitly integrate against LogDNA, such as when developing applications, to send the log data from your applications. By logging to the system logs, that information is automatically and transparently sent by syslog.

However, one thing that works against using syslog is a lack of native Windows support. Having said that, there are syslog daemons for Windows, such as Kiwi Syslog Server. And you'll find others if you do further research. But they're not natively available like syslog is for Linux/UNIX systems.

PaaS Integration

Let’s say that neither the agent nor a syslog daemon suits your purposes, as you’re using a PaaS (Platform as a Service), such as Heroku. In that case, you still need to send your log data to LogDNA, but you may not have as much control as you otherwise might.

In this case, while the options aren’t extensive, you’re not without options. LogDNA supports Heroku, Kubernetes, Docker (including Docker Cloud, Convox, and Rancher), Fluentd, Flynn, CloudWatch, Elastic Beanstalk, and Cloud Foundry.

Assuming that you’re using one in the above list, let’s first consider what you get, which depends on the platform. Some provide more configuration and flexibility; some provide less. In many cases, there isn’t much choice in how logs can be forwarded.

Then there’s the setup complexity and time. While I won’t say that any of the setups are easy, some will require more time and effort than others. However, regardless of what you’re using, LogDNA provides thorough documentation for each one.

Code Libraries

Now to my favorite option, code libraries. As a software developer, you'd naturally expect that it would be. LogDNA has libraries officially developed and supported for four languages: Node.js, Python, Rails, and Ruby. There are also libraries for Go, iOS, Java, and PHP.

Given that these are some of the most popular languages and that they have runtimes available on the major operating systems, it’s likely that at least one of them will suit your needs.

Now let’s consider the positives of using code libraries. Unlike the previous two options, code libraries afford you complete control over both how and when your log data is sent. You can integrate logging support into your existing application(s).

You can log as much or as little as your application or regulatory environment demands. You can add as much optional extra information as you want. And you can specify, exactly, the information that is sent, such as custom metadata.

Here’s a minimalist example using PHP, which is often my language of choice.

<?php

use Monolog\Logger;
use Zwijn\Monolog\Handler\LogdnaHandler;

require_once ('vendor/autoload.php');

$logger = new Logger('general');
$logdnaHandler = new LogdnaHandler(
    'YOUR LOGDNA INGESTION KEY',
    'myappname',
    Logger::DEBUG
);
$logger->pushHandler($logdnaHandler);

# Sends debug level message "mylog" with some related meta-data
$logger->debug(
    "mylog", [
        'logdna-meta-data-field1' => [
            'value1' => 'value',
            'value2' => 5
            ],
        'logdna-meta-data-field2' => [
            'value1' => 'value'
            ]
    ]
);

This example initializes a logger object, which will send log messages, at or above the level of DEBUG to LogDNA. It will also send two sets of custom metadata — metadata which could be anything that makes sense to your application.

To be fair, this is rather a “Hello World” example. And integrating logging into your code base would take more effort than it took to create this short example — especially if it’s an older and quite complex legacy codebase.

That said, I hope you'll appreciate the level of power and customizability that using a code library provides. On the flip side, code libraries also incur a higher investment, both initially and in their maintenance over the longer term.

They take special planning, design, testing, and debugging. They also need to be developed using security best practices, both in the code itself, as well as in the deployment process. Given that, while you have much more power, the investment will be higher. Something to keep in mind.

The REST API

And now to the last option. If none of the previous four options suit — or your particular setup is entirely custom — there’s one last option that just may work, the LogDNA REST API.

Similar to using a code library, the REST API allows you to send log data, along with custom metadata, to LogDNA.

curl "https://logs.logdna.com/logs/ingest?hostname=EXAMPLE_HOST&mac=C0:FF:EE:C0:FF:EE&ip=10.0.1.101&now=$(date +%s)" \
-u INSERT_INGESTION_KEY: \
-H "Content-Type: application/json; charset=UTF-8" \
-d \
'{
   "lines": [
     {
       "line":"This is an awesome log statement",
       "app":"myapp",
       "level": "INFO",
       "env": "production",
       "meta": {
         "customfield": {
           "nestedfield": "nestedvalue"
         }
       }
     }
   ]
}'

In the example log request, from the docs, above, you can see that requests to the API are POST requests, and require that a hostname and LogDNA ingestion key are provided. A MAC address, IP address, and a timestamp can be included as well.

The request body is in JSON format and can contain one or more message lines. JSON is a lightweight and almost universally accepted format, one that can be created by almost every popular software language, along with a variety of tools, today.

As with the code libraries, and as you can see in the example above, the message lines contain the message body, application name, and log level, as well as optional metadata.

Given that the request is being made using curl, it could be wrapped in a shell script on Linux/UNIX or a batch script on Windows. Then, if applicable, the script could be tied into either cron or the Windows Task Scheduler. If written well, it could also take input from any number of services, or be called from an existing process via a system call.
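
A minimal sketch of such a wrapper, assuming a hypothetical script name and app label, might look like this (note that the message passed in should not contain unescaped double quotes):

#!/bin/sh
# send-to-logdna.sh - a hypothetical wrapper around the LogDNA ingestion API
# Usage: send-to-logdna.sh "message to log"
INGESTION_KEY="YOUR-INGESTION-KEY-HERE"

curl -s "https://logs.logdna.com/logs/ingest?hostname=$(hostname)&now=$(date +%s)" \
  -u "${INGESTION_KEY}:" \
  -H "Content-Type: application/json; charset=UTF-8" \
  -d "{\"lines\":[{\"line\":\"$1\",\"app\":\"cron-script\",\"level\":\"INFO\"}]}"

A cron entry such as 0 * * * * /usr/local/bin/send-to-logdna.sh "hourly heartbeat" would then send a log line every hour.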

However, you don't need to use curl. You could use any software language that can compose and send a POST request, whether natively or via a third-party package, such as GuzzleHTTP for PHP. Similar to the code libraries, you could integrate logging directly into an existing application, even if it didn't have dedicated support via one of the eight code libraries.

The REST API offers quite a significant level of flexibility; a flexibility that the Collector Agent, syslog, and PaaS options simply don’t — or can’t — provide.

Unfortunately though, as with the code library option, there won’t be a clear set of steps required to integrate it. Every application is different and has different needs and requirements. What’s more, it will require a higher investment, both at the start and over time. However, if it’s the one that best suits your needs, then it’s worth considering.

In Conclusion

These are the five ways in which you can send log data from your servers and applications to LogDNA. They all have pros and cons, they’re not all equal, and they can’t be compared in the proverbial “apples to apples” way.

The needs of your particular setup, whether server or application, will supply most of the criteria; as will your budget and the experience of your technical staff. But regardless of the choice that you make, you can get started quite quickly, likely using the Collector Agent or syslog, and then progressing to the more involved options as time goes by, according to your particular use case, and as your budget allows.

HIPAA

The Role of AWS in HIPAA Compliance

If you’re considering storing your HIPAA log archives in AWS, it’s important you know the details about how Amazon treats HIPAA compliant data.

Healthcare companies are used to having control over physical storage systems, but many are now struggling when it comes to utilizing a cloud environment. There are many misconceptions about ownership, compliance and how the entire log-to-storage process intersects and works.

HIPAA is a set of federal regulations, meaning there is no explicit certification for remaining compliant. Rather, there are guidelines and laws that need to be followed. Tools like LogDNA and AWS help ensure that compliance is maintained.

A Primer for AWS Customers

All healthcare users of AWS retain ownership over their data and maintain control with regard to what they can do with it. You can move your own data on and off AWS storage anytime you'd like without restriction. End users are in control of how third-party applications (like LogDNA) can access AWS data. This access is controlled through AWS Identity and Access Management.

The most popular services for creating backups are Amazon S3 and Glacier. AWS is responsible for managing the integrity and security of the cloud, while customers are responsible for managing security in the cloud. It's a subtle difference, but an important one. This leads us to the core question many healthcare providers ask about AWS.

Is AWS HIPAA compliant?  

There is no way to answer this with a simple yes or no. The question also leads down a faulty path about understanding how these cloud services work. The question should be reframed as:

How does using AWS lead to HIPAA compliance?

The United States' Health Insurance Portability and Accountability Act (HIPAA) does not issue certifications. A company and its business associates are instead audited by the Department of Health and Human Services. What AWS does is set companies on the path to compliance. Like LogDNA, Amazon signs a Business Associate Agreement (BAA) with the health company. Amazon ensures that it will be responsible for maintaining secure hardware servers and providing its secure data services in the cloud.

How does Amazon do this?

While there may not be a HIPAA certification per se, there are a few certifications and audit systems Amazon holds that establish its credibility and trust.

ISO 27001

The International Organization for Standardization specifies best practices for implementing comprehensive security controls. In other words, it has developed a meticulous and rigorous security program for Information Security Management Systems (ISMS). In summary, ISO 27001 certification requires AWS to:

  • Systematically evaluate its information security risks, taking into account the impact of company threats and vulnerabilities.
  • Design and implement a comprehensive suite of information security controls and other forms of risk management to address company and architecture security risks.
  • Adopt an overarching management process to ensure that the information security controls meet its information security needs on an ongoing basis.

Amazon's ISO 27001 certification displays the company's commitment to security and its willingness to comply with an internationally renowned standard. Third-party audits continually validate AWS and assure customers that they're a compliant business partner.

AICPA SOC

Amazon's Service Organization Control (SOC) audits are carried out by third-party examiners and determine how well AWS demonstrates key compliance controls. The entire audit process is prepared under Attestation Standard Section 801 (AT 801) and completed by Amazon's independent auditors, Ernst & Young LLP.

The report reviews how AWS controls internal financial reporting. AT 801 is issued by the American Institute of Certified Public Accountants (AICPA).

Secured ePHI Logging Storage

Healthcare companies that use any AWS service and have a BAA will be given a designated HIPAA account. The following is a comprehensive list sourced from Amazon cataloging HIPAA eligible services. This list was last updated on July 31, 2017. These services cannot be used for ePHI purposes until a formal AWS business associate agreement has been signed.

Amazon API Gateway excluding the use of Amazon API Gateway caching
Amazon Aurora [MySQL-compatible edition only]
Amazon CloudFront [excluding Lambda@Edge]
Amazon Cognito
AWS Database Migration Service
AWS Direct Connect
AWS Directory Services excluding Simple AD and AD Connector
Amazon DynamoDB
Amazon EC2 Container Service (ECS)
Amazon EC2 Systems Manager
Amazon Elastic Block Store (Amazon EBS)
Amazon Elastic Compute Cloud (Amazon EC2)
Elastic Load Balancing
Amazon Elastic MapReduce (Amazon EMR)
Amazon Glacier
Amazon Inspector
Amazon Redshift
Amazon Relational Database Service (Amazon RDS) [MySQL, Oracle, and PostgreSQL engines only]
AWS Shield [Standard and Advanced]
Amazon Simple Notification Service (SNS)
Amazon Simple Queue Service (SQS)
Amazon Simple Storage Service (Amazon S3) [including S3 Transfer Acceleration]
AWS Snowball
Amazon Virtual Private Cloud (VPC)
AWS Web Application Firewall (WAF)
Amazon WorkDocs
Amazon WorkSpaces

Amazon ECS & Gateway in Focus

Amazon EC2 Container Service (ECS) is a major container management service, which supports Docker container logs and can be used to run apps on a managed cluster of EC2 instances. ECS provides simple API calls that you can use to easily deploy and stop Docker-enabled apps.

ECS workloads required to process ePHI do not require any additional configurations. ECS data flow is consistent with HIPAA regulations. All ePHI is encrypted while at rest and in transit when being accessed and moved by containers through ECS.

The process of complete encryption is upheld when logging through CloudTrail or logging container instance logs through CloudWatch into LogDNA.  

Users can also use Amazon API Gateway to transmit and store ePHI. Gateway will automatically use HTTPS encryption endpoints, but as an extra fail-safe, it's always a good idea to encrypt client-side as well. AWS users are able to integrate additional services into API Gateway that maintain ePHI compliance and are consistent with Amazon's BAA. LogDNA helps ensure that any PHI sent through Gateway only passes through HIPAA-eligible services.

Compliance Resources – A Continued Approach  

Amazon is serious about staying compliant in a number of industries. They’re constantly innovating and are continually creating new security services. LogDNA shares this same tenacity for security and continued innovation.

Additional Resources:
CloudWatch Logging: https://docs.logdna.com/v1.0/docs/cloudwatch
Legal: https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html
AWS Hub: https://aws.amazon.com/compliance/
Technical DevOps Guide: https://aws.amazon.com/blogs/security/how-to-automate-hipaa-compliance-part-1-use-the-cloud-to-protect-the-cloud/

 

HIPAA

Firewall Logging: Importance for the Healthcare Industry

A large number of healthcare companies are at a loss when it comes to understanding their internal security environment. While the HIPAA Security Rule provides a comprehensive legal framework for ensuring secure technical safeguards, it doesn’t give many specifics on which tools to use.

We've already established what proper logging brings to a healthcare environment, as well as its importance. But what about the contents of those logs? Security events are among the most crucial logs a system can capture. The majority of these logs and alerts come from your firewall, and a firewall is the number one security measure a healthcare company needs to have.

Section 164.312(c)(1) states that the integrity of ePHI must be upheld through proper technical procedures and policies to stop this information from being altered or destroyed. This is where Firewall Logging comes in.   

Firewall HIPAA Logs – The Wall of Compliant Protection

Patient data may seem mundane to the multitude of healthcare workers keying in records day after day. But it's important to realize that this data is coveted by unscrupulous characters lurking around the web. Stolen information can cause irreparable damage to the patients and the establishments responsible for safeguarding that data.

Firewalls are just one component there to stop online intruders. Imagine a towering brick wall denying entrance to attackers in the night. In our case, this metaphoric wall is part of a computer system that denies unauthorized access from the outside and limits outward communication deemed unsafe, i.e. the ability for office computers to access unprotected websites. This system is reactive – what we also need is something proactive.

Firewall logs are the sentries posted up on this proverbial wall – the loggers on the wall. They can respond to real-time alerts and backtrack to see what happened. HIPAA compliance requires healthcare companies to have configured log monitoring. Our firewall logs – or rather, firewall sentries – serve an important function for maintaining the integrity of ePHI. They do this by:

  • Helping to determine if an attack has taken place
  • Alerting system administrators if an attack is currently happening
  • And logging security data for required audits

Firewall logs watch for intrusions and will relay what action the firewall took to block network attacks on either an individual computer, or an entire in-house data system. A firewall log will relay a few pieces of crucial information: incoming network traffic, a description of suspicious network activity, and the location of activity logged.

Our logging platform gives these logs a foundation so that they can be used, stored and monitored to ensure ePHI safety and HIPAA compliance. We give form to the shapeless firewall data that’s usually left floating around and left inaccessible.

There are a few different types of firewalls. All of them will produce logs, but it’s important to understand the distinction between them in order to build a proper foundation.

Different Bulwarks of Safety

For our purposes here, we've divided network firewalls into three different types: software, web application, and hardware firewalls; all are crucial in maintaining HIPAA safety compliance. Remember that the goal of our firewall system is to stop harmful unauthorized traffic and limit dangerous exterior communication. The goal of our firewall logging is to take actionable steps to stay alert, maintain the integrity of the system, and thwart any attacks.

Simply having a firewall won’t cut it. Possessing an interconnected system with multiple protected funnels and monitoring means is more effective.

Software Firewall Safeguard

This is a type of firewall that is often overlooked because it’s usually pre-installed on a number of computers. A healthcare entity needs a firewall between the systems responsible for housing ePHI and all other connected systems. This also includes internal systems.

Software firewalls protect lone computers from a few different types of threats – namely mobile devices that can be compromised. Take for example, a remote employee accessing data from home or on the go. If they’re caught in an unlucky phishing debacle, their firewall will act to protect their personal computer or device and save the integrity of any connected medical data in the process.

Software firewalls are easy to maintain and allow for the remote work to take place. While they might not protect an entire system, they patch up an area liable to attack.

Web Applications Firewall Safeguard

Commonly known as WAFs, these should be placed at the frontlines of any application that needs to access the internet, which at this point is the vast majority of them. WAFs help detect, monitor and stop attacks online. A bevy of firewall logs will be sourced from here. Note that a WAF is not an all-purpose firewall; its main function is to block suspicious web traffic.

Many databases require access to the internet. Cyber security reports can be generated through logging platforms and then acted upon. The WAF logging combination is akin to the heart rate monitor, but for online security health. If everything is going well, there won’t be any dramatic spikes. But if danger strikes, the necessary alerts and response team will be on it.

Special care is needed when setting up a WAF, since critical functions could be hampered if it's not set up properly. But nothing beats this firewall when it comes to protecting third-party modules and quick, logged response times.

Hardware Firewall Safeguard

Hardware firewalls are installed company wide throughout the entire organization’s network. Internal systems are protected from the outside internet. They’re also used to create network segments inside the company that divide access to those with ePHI access from those without it.

Other networks inside the company system may need fewer firewall restrictions placed on them. For example, maybe a medical device designer needs to collaborate with an outside agency of some kind. This particular job function doesn’t require ePHI access; their segmented network shouldn’t be affected, nor should they be on the same network with employees handling ePHI.

A secure network will employ these different types of firewalls together ensuring a protected and HIPAA compliant healthcare company.

kubernetes, Technical

Logging In The Era Of Containers

Log analysis will always be fundamental to any monitoring strategy. It is the closest you can get to the source of what's happening with your app or infrastructure. As application development has undergone drastic change over the past few years with the rise of DevOps, containerization, and microservices architecture, logs have not become less important; rather, they are now at the forefront of running stable, highly available, and performant applications. However, the practice of logging has changed. From being simple it is now complex; from just a few hundred lines of logs we now often see millions; from all being in one place we are now dealing with distributed log data. Yet as logging has become more challenging, a new breed of tools has arrived on the scene to manage and make sense of all this logging activity. In this post, we'll look at the sea change that logging has undergone, and how innovative solutions have sprung up to address these challenges.

Complexity of the stack

Traditional client-server applications were simple to build, understand, and manage. The frontend was required to run on a few browsers or operating systems. The backend consisted of a single consolidated database, or at most a couple of databases on a single server. When something went wrong, you could jump into your system logs at /var/log and easily identify the source of the failure and how to fix it.

With today’s cloud-native apps, the application stack has become tremendously complex. Your apps need to run on numerous combinations of mobile devices, browsers, operating systems, edge devices, and enterprise platforms. Cloud computing has made it possible to deliver apps consistently across the world using the internet, but it comes with its own challenges of management. VMs (virtual machines) brought more flexibility and cost efficiencies over hardware servers, but organizations soon outgrew them and needed a faster way to deliver apps. Enter Docker.

Containers bring consistency to the development pipeline by breaking down complex tasks and code into small manageable chunks. This fragmentation lets organizations ship software faster, but it requires you to manage a completely new set of components. Container registries, the container runtime, an orchestration tool or CaaS service – all make a container stack more complex than VMs.

Volume of data has spiked

Each component generates its own set of logs. Monolithic apps are decomposed into microservices with each service being powered by numerous containers. These containers have short life spans of a few hours compared to VMs which typically run for months or even years. Every request is routed across the service mesh and touches many services before being returned as a response to the end user. As a result, the total volume of logs has multiplied. Correlating the logs in one part of the system with those of another part is difficult, and insights are hard won. Having more log data is an opportunity for better monitoring, but only if you’re able to glean insights out of the data efficiently.

Many logging mechanisms

Each layer has its own logging mechanism. For example, Docker has drivers for many log aggregators. Kubernetes, on the other hand, doesn't support drivers. Instead it uses a Fluentd agent running on every node. More on Fluentd later in this post. Kubernetes doesn't have native log storage, so you need to configure log storage externally. If you use a CaaS platform like ECS, it has its own set of log data; ECS has its own log collector. With log collection so fragmented, it can be dizzying to jump from one tool to another to make sense of errors when troubleshooting. Containers require you to unify logging from all the various components for the logs to be useful.
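
On the Docker side, for instance, pointing the daemon at a Fluentd collector takes only a couple of lines in /etc/docker/daemon.json; the address below assumes a Fluentd agent listening locally on its default port:

{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "localhost:24224",
    "tag": "docker.{{.Name}}"
  }
}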

The rise of open source tools

As log data has become more complex the solutions for logging have matured as well. Today, there are many open source tools available. The most popular open source logging tool is the ELK stack. It’s actually a collection of three different open source tools – Elasticsearch, Logstash, and Kibana. Elasticsearch is a distributed full-text search database, Logstash is a log aggregation tool, and Kibana is a visualization tool for time-series data. It’s easy to get started with the ELK stack when you’re dipping your toes into container logging, and it packs a lot of powerful features like high availability, near-real-time analysis, and great visualizations. However, once your logs reach the limits of your physical nodes that power the ELK stack, it becomes challenging to maintain operations smoothly. Performance lags and resource consumption become an issue. Despite this, the ELK stack has sparked many other container logging solutions like LogDNA. These solutions have found innovative ways to deal with the problems that weigh down the ELK stack.

Fluentd is another tool commonly used along with the ELK stack. It is a log collection tool that manages the flow of log data from the source app to any log analysis platform. Its strength is that it has a wide range of plugins and can integrate with a wide variety of sources. However, in a Kubernetes setup, to send logs to Elasticsearch, Fluentd places an agent in every node, and so becomes a drain on system resources.

Machine learning is the future

While open source tools have led the way in making logging solutions available, they require a lot of maintenance overhead when monitoring real-world applications. Considering the complexity of the stack, volume of data, and various logging mechanisms, what's needed is a modern log analysis platform that can intelligently analyze log data and derive insights. Analyzing log data by manual methods is a thing of the past. Instead, machine learning is opening up possibilities to let algorithms do the heavy lifting of crunching log data and extracting meaningful outcomes. Because algorithms can spot minute anomalies that would be invisible to humans, they can identify threats much sooner than a human would, and in doing so can help prevent outages before they happen. LogDNA is one of the pioneers in this attempt to use machine learning to analyze log data.

In conclusion, it is an exciting time to build and use log analysis solutions. The challenge is great, and the options are plenty. As you choose a logging solution for your organization, remember the differences between legacy applications and modern cloud-native ones, and choose a tool that supports the latter most comprehensively. And as you think about the future of log management, remember that the key words are ‘machine learning’.

HIPAA

Best Security Practices for HIPAA Logging

Despite advanced security measures and increased due diligence from healthcare professionals, system attacks are still a constant threat for a majority of medical organizations. Overlooked security weaknesses, outdated systems, or an inadequate IT infrastructure can be just the catalyst an attacker needs to exploit your protected health information (PHI).

Remaining HIPAA compliant and safeguarding your PHI can be accomplished by following a few basic security practices. Professionals need to implement a company-wide security control which establishes how your PHI data should be created and stored. You'll also want to create a compliance plan, or for the more theatrically minded – a contingency plan, in the event of a security breach. Most importantly, a proactive logging strategy has to be integrated each step of the way.

These practices serve as a baseline for security. It’s recommended you build off of this foundation and adjust security measures as needed.

PHI Entry – A Foundation For Security

There are a unique set of risks you will contend with daily. Attackers on the outside are always looking for a way in. In 2016 alone, the Identity Theft Resource Center (ITRC) found that over thirty percent of healthcare and medical organizations reported data breaches. Outside threats are always a concern, but take into account the additional threat of inept data handling from employees and improper (or even nonexistent) logging practices and you’re asking for trouble.  

The following steps outline basic security measures, establish a PHI entry guideline, and show what should be done before the data even enters your system or logging platform.

  1. Develop or implement a company standard for new patient data entry.
  2. Identify where the PHI is being created and who is creating it.
  3. Establish the number of different devices used to enter data from.
  4. Electronic Health Records (EHR) – record how many staff members are entering data and where they are doing it from.
  5. (re)Configure your database and note what records are stored there.
  6. Create communication standards with your business associates – signees of a mutual Business Associate Agreement (BAA).

A detailed PHI flowchart can be made from the preceding information. This allows for a detailed analysis that can show whose hands your information passed through and what systems and technologies were used. A diagram can track data points of entry, revealing weak spots during the data exchange.

For example, a patient's sensitive information might languish in a filing cabinet or float through an unprotected third-party portal online. Your diagram of the PHI flow can account for these types of discrepancies in security. A PHI flowchart is best used in tandem with a logging compliance report.

Compliance Reports & Safeguard Plans

One of the major failsafes of HIPAA – added through the HITECH Act – is the requirement to maintain an audit trail and submit routine reports if a data breach is suspected. The ability to generate and distribute these reports is important for maintaining and proving compliance.

A proper log management system will be able to create automated reports that demonstrate compliance. LogDNA has the ability to generate automated audit reports from event logs within your system. Conversely, if an unexpected audit request occurs, you’ll be able to quickly query the necessary results to respond to the auditor and create a report for them manually as well.

Additionally, plans should be made that take into account other areas of the HIPAA Security Rule. This means issuing policies around device access, workstation data safety, employee authentications, mobile use restrictions and encryption.  

Think about utilizing an Incident Response Plan (IRP) – or creating one if not already in place – while making sure to keep it amended and useful. An IRP is best used to designate a planned response if a security incident arises. HIPAA logging solutions can and should be integrated into this plan.

This will provide concrete guidelines in the event of a PHI data breach. It will also make the team more efficient in the aftermath and allow them to give the proper compliant information to government agencies and individuals affected.

Take Advantage of Your Logging Environment

Logging takes the guesswork out of detecting threats – both internal and external. You’ll be able to commence a quick response and enact the correct procedures to patch any data leaks. It’s crucial to detect an attack before it happens. Sensitive data cannot afford to be lost. HIPAA logging gives the end user the ability to identify events across the whole system (file changes, account access and health data inquiries) while they occur.

These security strategies will help you get the most out of your HIPAA logging platform:

  • Determine what type of logs will be generated and stored (while keeping compliance in mind).
  • Ensure a secured storage place for logs that can be kept for at least six years. This can be accomplished through storage in an encrypted archive using AWS, Azure, or another certified and protected service.
  • Designate an employee who will check logs on a daily basis.
  • Create a plan for reviewing suspect alerts.
  • Enact fail safes so that stored logs cannot be tampered with internally.
  • Adjust log collection accordingly.

Event logs are bits of information coming from a myriad of sources. Firewalls, printers, EHR systems and more all contribute to the data that the logging platform will receive. A majority of organizations have a mixed IT environment; it's essential to have the ability to collect and support a wide range of user activity and log file types.

Log analysis not only ensures you comply with HIPAA, but also gives you the tools you need to defend against attacks and faulty data practices.

Think of LogDNA as the sentry lookout that warns you of incoming danger.

We’re using our digital eyes to spot all incoming risks and provide the raw data to create audit records and maintain HIPAA compliance.

While it’s important to focus on security indicators, logging can also monitor a number of other events inside the system. Event logs can point towards malfunctioning applications, outdated hardware or faulty software. All events are monitored and can be traced back to where they originated from.  

An internal structure that places an importance on HIPAA security will be able to utilize logging to stay compliant and keep crucial healthcare information safe.

Have questions?  Reach out to us at sales@logdna.com.

HIPAA

What is HIPAA Compliant Log Management?

The medical establishment stretches far and wide; it is a behemoth creator of data. Data that must be protected and secured at all times away from prying eyes. Hospitals, medical networks, pharmaceutical establishments, electronic billing systems, medical records – all of these medical industries and more run on communally shared data. Due to the critical nature of this data and its need to be accessed by a multitude of professionals, certain laws have been put into place so that this information can be exchanged freely and securely.   

The Health Insurance Portability and Accountability Act of 1996 Title II (HIPAA) is the most important law of the land that addresses these concerns. Regulations have been created to protect electronic health information and patient information. Log management and auditing requirements are covered extensively by HIPAA as well.

Records of all kinds are produced and logged daily. To secure this protected information, it’s important to know who has access to your internal systems and data. Syslog files are the most commonly logged files across your network of servers, devices and workstations. Some of this information includes: patient records, employee data, billing, and private account data – information that can’t afford to be lost or stolen.   

It’s grown increasingly more important for healthcare professionals and business partners alike to maintain HIPAA compliance indefinitely. Log files (where healthcare data exists) must be collected, protected, stored and ready to be audited at all times. A data breach can end up costing a company millions of dollars.

Not complying with HIPAA regulations can be costly.

Understanding HIPAA and the HITECH Act: Log Compliance

Before we look into how log management and HIPAA compliance interact, an overview of the laws is needed. This will provide you with the knowledge to understand relevant compliance regulations and how they might affect your logging strategy.

HIPAA

This act has created a national standard in upholding privacy laws inherent to all protected health information. These standards have been put in place to improve the efficiency of electronic data exchange across the United States' health care system.

Organizations that handle protected information must have a dedicated IT infrastructure and strategies to ensure data privacy to stay HIPAA compliant. This is where a log management system comes in handy. Compliant organizations must be prepared to deal with a number of different circumstances. These include:

  • Investigation of a Suspected Security Breach
  • Maintaining an Audit Trail
  • Tracking A Breach (What Caused it & When Did it Occur)

A HIPAA audit needs archived log data, specific reports and routine check-ups completed regularly. HIPAA requires a compliant log management system that retains at least six years of log data; this is the minimum amount of time that records need to be held. LogDNA complies with HIPAA by giving users the option to store and control their own data. We allow users the ability to configure nightly archiving of their LogDNA logging data and send it to an external source, such as an S3 bucket, Azure Blob Storage, OpenStack Swift, or another storage method. Users can then store this data for a minimum of six years.

Compliant log management allows for all of these regulations to be met. LogDNA augments an IT infrastructure, ensures data privacy and can comply with regular automated audit requests.

HITECH Act

This act amended HIPAA in 2009 and required that an additional audit trail be created for each medical transaction (logged health information).

The audit regulations highlighted above reflect the need to keep an around-the-clock logging solution that protects the integrity of all medical health records. These stipulations in HIPAA point towards a levied importance on maintaining compliant log records.

Specific HIPAA Logging Regulations: Cybersecurity Safeguards

The following HIPAA sections were created to set a standard for logging and auditing. If a logging system doesn’t meet these requirements, they are noncompliant.

The following stipulations aren't all that complicated – though they may appear so. We'll use LogDNA as a relational example. Essentially, each section below shows how LogDNA's built-in features meet compliance according to each individual law. (The bullet points correspond to the listed sections.)

Beware, legalities ahead.

Logging

Section 164.308(a)(5)(ii)(C): Log-in monitoring (Addressable) – “Procedures necessary for monitoring log-in attempts and reporting discrepancies.”

  • LogDNA’s basic functionality logs “login attempts” and reports discrepancies

Section 164.308(b)(1): Business Associate Contracts And Other Arrangements – “A covered entity, in accordance with § 164.306 [the Security Standards: General Rules], may permit a business associate to create, receive, maintain, or transmit electronic protected health information on the covered entity’s behalf only if the covered entity obtains satisfactory assurances, in accordance with § 164.314(a) [the Organizational Requirements] that the business associate will appropriately safeguard the information (Emphasis added).”

  • LogDNA will happily sign a Business Associate Agreement (BAA) ✔

Section 164.312(a)(1): Access Control – "Implement technical policies and procedures for electronic information systems that maintain electronic protected health information to allow access only to those persons or software programs that have been granted access rights as specified in § 164.308(a)(4) [Information Access Management]."

  • LogDNA has a secure system that will only allow select users access to protected data

Auditing

Section 164.312(b): Audit Controls – “Implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information.”

  • LogDNA records activity from all information systems within a protected environment

Section 164.312(c)(1): Integrity – "Implement policies and procedures to protect electronic protected health information from improper alteration or destruction."

  • LogDNA gives the user the opportunity to archive their own data outside of our system, which is then under their own control and management. ✔

LogDNA – A Commitment to Compliance

LogDNA’s platform helps healthcare companies meet their own HIPAA compliance requirements in a number of ways. We’re audited for HIPAA and HITECH compliance ourselves on an annual basis by a qualified security assessor.

Here are just a few of the events we can log.

  • Protected information being changed/exchanged
  • Who accessed what information when
  • Employee logins
  • Software and security updates
  • User and system activity
  • Irregular Usage patterns

Logs are best used when they're being reviewed regularly. A system that monitors your log data can see if a specific user has been looking at a patient's file too much, or if someone has logged into the system at a strange hour. Oftentimes, a breach can be spotted by looking over the data. For example, a hacker may be trying thousands of different password combinations to break in.

This will show up in the log and can then be dealt with.

Tracked and managed logs are able to comply with audit requests and help your health organization get a better grasp of the data streaming in and protect it.  It’s never too late to have an intelligent logging solution. You’ll be able to have a better grasp over your system, protect your crucial information and always stay compliant.

To ensure you’re HIPAA compliant, either:

  1. Visit the LogDNA HIPAA page to sign up for an account, or
  2. Get your specific HIPAA questions answered at sales@logdna.com