ProTip, Technical

Best Practices for Log Alerts

Application logs are more than just tools for root cause analysis. They’re also a way to gain insight into critical events such as a loss in sales, server warnings, HTTP errors, performance, and numerous other activities that impact business productivity. Logs are just thousands of lines of raw data, but they can be parsed and leveraged to provide a better understanding of what goes on under the hood of your application. It’s common for developers to set up logging for code exceptions within the application, but logging provides far more benefits than just bug tracking and should be used to alert administrators to issues that need their attention.

Application and Server Performance

Any administrator who has dealt with performance issues will tell you that it’s one of the most difficult problems to analyze and pinpoint a root cause for repair. Performance degradation can occur at certain times of the day, during an active DDoS attack, or what seems like no reason at all. QA can do performance testing on your application, but these tests rarely represent a production environment that supports thousands of users concurrently. For most organizations, performance issues occur during business growth and can harm its potential expansion. Performance is also problematic because it’s unforeseen and rarely a quick fix for developers and administrators.

Using log alerts, you can assess what’s happening when an application’s performance is diminished. It could be from a specific event such as a poorly optimized database query or when CPU usage spikes. Log these types of events, and you can not only identify when application performance will wane but also when server resources could be exhausted. Instead of suffering from a server crash, logging these events will give you insights for when it could be time to upgrade server hardware. They also help you pinpoint components of your application that could be causing performance degradation.

Your developers would need to set a baseline, but — for instance — you could set an alert for any event that takes longer than 100 milliseconds to process. When you see a pattern, you can then have developers research more into these application components for better optimization.

When CPU usage spikes to over 80%, set an alert to inform administrators. It could be something as simple as upgrading RAM or even your server’s CPU, but having alerts on these logs will give you the ability to analyze the time of day and any patterns surrounding application procedures.

Failed Sales Events

Most applications log exceptions so that developers can look back into errors and provide updates. But not every exception is created equally. The most critical events are those that impact business revenue. These events are the ones you should monitor closely and send alerts to the right response team.

It shouldn’t be your customers calling to tell you that there are bugs in your application. You want to find them before they do, and many customers will just bounce to another vendor if you have too many bugs. Many of your customers won’t report application issues at all, so you could be losing sales every hour and never know it.

When you build a shopping cart, you should have a point where you ask a visitor for a way to contact them. Usually, email is a common input field during account creation. With an alert on failed shopping cart activity, you have a way to contact a customer should they receive an error and bail on the shopping experience. Alerts are a great tool to salvage a lost customer due to application bugs.

But you also need alerts to tell you when components of your application are creating obstacles for customer shopping experiences. It could be an alert for performance (similar to the previous example), or your customers could be dropping at a specific point in the sales funnel. Alerts give you insights into the efficacy of your marketing and user experience. Use them generously to identify issues with your sales pages and find solutions for a better user experience.

Security and Suspicious Activity

Developers are good at logging exceptions, but they don’t usually account for security events. Security events can be any suspicious behaviors such as automated logins from account takeover tools (ATOs) using customer data, repeated patterns of failed admin login attempts, ACL changes or new accounts created.

Usually, some security events are triggered from the database, but this limits your logging to database-specific activity. With logging and alerts, you should use them to make the right people aware of suspicious activity that happens on any one of your servers outside of database activity.

With ATOs, an attacker will use automated software to log into a customer’s account and use purchased credit card data to buy product to test if the card is viable. Logs should be used to detect this type of activity and alert administrators to suspicious events.

Any modifications to security permissions or authorization should also be logged and alerts sent. This could be elevation of permissions for any specific user, new routing rules configured on your infrastructure, or user access to critical files. Security events are a primary method organizations use to identify attacks before they become catastrophic data breaches.

How Do You Set Up Alerts?

You need the right data before you can set up alerts. The data that you log is up to you, but some standard data points are needed to create efficient logs that scale across your infrastructure. Some common data points include:

  • Date and time
  • Host name
  • Application name
  • Customer or account that experienced the error
  • IP address or other geographic markers for the visitor
  • Raw exception information
  • Line number where it occurred (if applicable)
  • Type of error (fatal, warning, etc)

You could write a logging application, or you could save time and configuration hassles by using LogDNA. With built-in features that provide logging data and sending alerts, you can save months of development and testing for your own solution.

Instead of only using logs for basic events, an organization’s best practices should include activity that gives administrators insight into patching issues before they become catastrophic instead of just using them to retroactively find solutions. LogDNA can provide you with the right tools and alerts that organizations can leverage to avoid revenue-impacting bugs and server errors.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s