Adding automated anomaly detection to Grafana

published on 14 June 2024

Anomaly detection is the process of identifying unusual patterns or outliers in data. Grafana offers automated anomaly detection that uses machine learning algorithms and statistical models to monitor data sources such as application metrics, network traffic, and system logs. This helps you identify and respond to potential issues quickly, reducing downtime and the risk of data breaches.

Key Steps:

  1. Enable Machine Learning in your Grafana Cloud instance by going to Alerts & IRM > Machine Learning and clicking Initialize.

  2. Install the OpenSearch Anomaly Detection Plugin from the Plugins section.

  3. Configure the Plugin to work with your data sources by selecting OpenSearch as the anomaly detection method and adjusting settings like sensitivity.

  4. Set Up Anomaly Detectors by creating a new detector with required details like name and model features.

  5. Configure Alerts by creating a new alert rule based on the anomaly detector and defining conditions and notification channels.

  6. Visualize Anomalies by adding an anomaly detection panel to your dashboard and customizing visuals like display settings and sensitivity.

  7. Choose an Algorithm: Grafana offers DBSCAN (for closely moving data series) and MAD (for stable data series).

| Algorithm | Strengths | Weaknesses | Use Cases |
| --- | --- | --- | --- |
| DBSCAN | Good for closely moving data series | May struggle with noisy data | Kubernetes pods, similar entities |
| MAD | Effective for stable data series | May miss closely packed outliers | System KPIs, web applications |

Common issues and troubleshooting steps:

| Issue | Steps |
| --- | --- |
| Plugin Installation | 1. Check version compatibility; 2. Follow installation instructions; 3. Check server logs; 4. Reinstall or seek community help |
| Configuration Errors | 1. Review settings against algorithm requirements; 2. Test configuration in smaller parts; 3. Consult documentation or seek community help |

For advanced setups, consider creating advanced alert rules based on multiple detectors or aggregated outliers, and integrating with other tools like Prometheus, Alertmanager, and PagerDuty.

Getting Ready

Software Needed

To get started with automated anomaly detection in Grafana, you'll need:

| Software | Version/Details |
| --- | --- |
| Grafana | Version 7.0 or later |
| Grafana Cloud | Optional, but recommended for machine learning features |
| OpenSearch plugin | Required for anomaly detection |
| Machine learning components | Such as LoudML |

User Permissions

Before enabling machine learning features, make sure you have the right user permissions. As an admin, you can assign specific permissions based on which APIs users need access to. The OpenSearch Security plugin has two built-in roles that cover most anomaly detection use cases:

  • anomaly_full_access
  • anomaly_read_access
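
As a rough sketch of how those two roles might be assigned, the snippet below builds (but does not send) requests to the OpenSearch Security plugin's role-mapping REST endpoint. The host URL and the user names `sre_admin` and `dashboard_viewer` are placeholders assumed for illustration; authentication and TLS settings are omitted and would depend on your cluster.

```python
import json
import urllib.request

# Assumed placeholder; replace with your own OpenSearch endpoint.
OPENSEARCH_URL = "https://localhost:9200"


def build_role_mapping_request(role: str, users: list[str]) -> urllib.request.Request:
    """Build a PUT request that maps `users` to a Security plugin role."""
    body = json.dumps({"users": users}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPENSEARCH_URL}/_plugins/_security/api/rolesmapping/{role}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical users: one who manages detectors, one who only views results.
requests_to_send = [
    build_role_mapping_request("anomaly_full_access", ["sre_admin"]),
    build_role_mapping_request("anomaly_read_access", ["dashboard_viewer"]),
]

# Print the requests instead of sending them, so the sketch has no side effects.
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Note that a PUT to this endpoint creates or replaces the whole mapping for the role, so in practice you would include every user who should keep the role, not just the new one.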

Data Sources

Grafana supports various data sources for anomaly detection. These data sources provide the data that machine learning algorithms use to identify patterns and anomalies. Make sure the data sources you need are configured and connected to your Grafana instance.

Turning On Machine Learning

To use advanced features like anomaly detection, forecasting, and predictive analytics in Grafana, you'll need to enable machine learning. Here's how to turn on machine learning in your Grafana Cloud instance:

Finding the Machine Learning Section

  1. In the left menu of your Grafana Cloud stack, click Alerts & IRM.
  2. Select Machine Learning to open the machine learning dashboard.

Activating Machine Learning

  1. Click the Initialize button to enable machine learning for your Grafana Cloud instance.

Once initialized, machine learning will be accessible to users in your organization.

Note: You must have administrative privileges to enable and configure machine learning in your Grafana Cloud instance.

Adding Anomaly Detection Plugins

To enhance Grafana's capabilities and gain deeper insights into your data, you can integrate anomaly detection plugins into your dashboard. This section covers how to install and configure the OpenSearch anomaly detection plugin.

Installing the OpenSearch Plugin

Follow these steps to install the OpenSearch plugin:

  1. In your Grafana instance, go to the Plugins section.
  2. Click Browse, and search for the "OpenSearch anomaly detection" plugin.
  3. Open the plugin's details page and click Install.
  4. Once installed, click Enable to activate the plugin.

Configuring the Plugin

After installing and enabling the plugin, you'll need to configure it to work with your data sources and Grafana setup:

  1. Go to the Data Sources section in Grafana.
  2. Select the data source you want to use with the OpenSearch plugin.
  3. In the data source settings, choose OpenSearch as the anomaly detection method.
  4. Adjust the plugin settings as needed, such as the sensitivity slider to control the normality band thickness.

With the OpenSearch anomaly detection plugin installed and configured, you can now detect anomalies in your data and visualize them on your Grafana dashboard.

Setting Up Anomaly Detectors

Anomaly detectors in Grafana help identify unusual patterns in your data, allowing you to take proactive steps to address potential issues. Here's how to set them up:

Creating a Detector

1. Open the dashboard where you want to add the detector.

2. Click "Add panel" and choose the visualization type for anomaly detection.

3. In the visualization panel, click the ellipsis icon and select "Anomaly Detection" > "Add anomaly detector".

4. Choose "Create new detector" and enter the required details, such as detector name and model features.

5. Preview the visualization by toggling the "Show visualization" button.

6. Click "Create detector" to add it to your visualization.
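
For readers who prefer to see what "detector name and model features" amount to, here is a minimal sketch of a detector definition in the shape the OpenSearch Anomaly Detection API accepts at `POST _plugins/_anomaly_detection/detectors`. The index pattern, field names, and interval values are hypothetical and chosen only for illustration; the helper `build_detector_config` is not part of any library.

```python
import json


def build_detector_config(name, index_pattern, time_field, feature_field,
                          interval_minutes=10):
    """Return a detector definition with a single average-based model feature."""
    feature_name = f"avg_{feature_field}"
    return {
        "name": name,
        "description": f"Detects anomalies in {feature_field}",
        "time_field": time_field,
        "indices": [index_pattern],
        # Each model feature is an aggregation over one field.
        "feature_attributes": [
            {
                "feature_name": feature_name,
                "feature_enabled": True,
                "aggregation_query": {
                    feature_name: {"avg": {"field": feature_field}}
                },
            }
        ],
        # How often the detector runs, and how long to wait for late data.
        "detection_interval": {
            "period": {"interval": interval_minutes, "unit": "Minutes"}
        },
        "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
    }


# All values below are assumed examples, not taken from a real deployment.
detector = build_detector_config(
    name="latency-detector",
    index_pattern="app-metrics-*",
    time_field="@timestamp",
    feature_field="response_time_ms",
)
print(json.dumps(detector, indent=2))
```

Keeping the definition in code like this can make it easier to review detector settings or recreate them in another environment.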

Configuring Alerts

To receive notifications when anomalies are detected:

1. Go to the "Alerting" section in Grafana.

2. Create a new alert rule and select the anomaly detector as the data source.

3. Define the alert conditions, such as the anomaly detection threshold and notification channels.

4. Save the alert rule to start receiving alerts for detected anomalies.
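
The alert condition in step 3 can be pictured as a simple check over recent detector results. The sketch below assumes each result carries an `anomaly_grade` and a `confidence` value between 0 and 1, as OpenSearch anomaly results do; the threshold values and the "two consecutive results" rule are assumptions made for illustration, not defaults of any product.

```python
from dataclasses import dataclass


@dataclass
class AnomalyResult:
    timestamp: str
    anomaly_grade: float  # 0 means normal; values closer to 1 are more severe
    confidence: float     # how confident the model is in this result


def should_alert(results, grade_threshold=0.7, min_confidence=0.8,
                 min_consecutive=2):
    """Return True when enough consecutive results exceed both thresholds."""
    streak = 0
    for result in results:
        if (result.anomaly_grade >= grade_threshold
                and result.confidence >= min_confidence):
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0  # a normal result resets the run
    return False


# Hypothetical results: a sustained anomaly versus a single isolated spike.
sustained = [
    AnomalyResult("2024-06-14T10:00Z", 0.10, 0.95),
    AnomalyResult("2024-06-14T10:10Z", 0.80, 0.90),
    AnomalyResult("2024-06-14T10:20Z", 0.90, 0.92),
]
single_spike = [
    AnomalyResult("2024-06-14T10:00Z", 0.90, 0.90),
    AnomalyResult("2024-06-14T10:10Z", 0.00, 0.90),
    AnomalyResult("2024-06-14T10:20Z", 0.85, 0.90),
]

print(should_alert(sustained))     # True
print(should_alert(single_spike))  # False
```

Requiring consecutive hits is one way to reduce noisy notifications; it plays a similar role to a pending period on an alert rule.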

| Step | Description |
| --- | --- |
| 1 | Open the dashboard |
| 2 | Add a visualization panel |
| 3 | Access the anomaly detection options |
| 4 | Create a new detector with required details |
| 5 | Preview the visualization |
| 6 | Add the detector to the visualization |
| 7 | Go to the Alerting section |
| 8 | Create a new alert rule |
| 9 | Set the alert conditions |
| 10 | Save the alert rule |

Visualizing Anomalies

Visualizing anomalies is crucial for monitoring and understanding anomaly detection results in Grafana dashboards.

Adding Anomaly Panels

To add an anomaly detection panel to a Grafana dashboard:

  1. Open the dashboard.
  2. Click "Add panel" and choose the visualization type for anomaly detection.
  3. In the panel, click the ellipsis icon and select "Anomaly Detection" > "Add anomaly detector".
  4. Choose "Create new detector" and enter the details, such as detector name and model features.
  5. Preview the visualization by toggling the "Show visualization" button.
  6. Click "Create detector" to add it to your visualization.

Customizing Visuals

Once you've added the anomaly detection panel, you can customize the visuals:

| Option | Description |
| --- | --- |
| Display Settings | Filter out series without anomalies. |
| Historical Sample Size | Define the history window for chart forecasting. |
| Anomaly Type | Filter anomalies higher, lower, or both. |
| Sensitivity | Adjust the anomaly detection sensitivity to high, medium, or low. |

Anomaly Detection Algorithms

Grafana offers two algorithms to identify unusual patterns or outliers in time series data: DBSCAN and MAD.

DBSCAN Algorithm

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clusters data points based on their density and distances. It works well for data series that move closely together over time or have strong trends. DBSCAN flags data points outside the largest cluster as anomalies.
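
To make the "outside the largest cluster" idea concrete, here is a simplified sketch that runs a one-dimensional DBSCAN over the values of several series at a single timestamp. It is not Grafana's implementation; the parameters `eps` and `min_samples`, the pod names, and the decision to flag nothing when no cluster forms are all assumptions made for illustration.

```python
from collections import Counter


def dbscan_1d(values, eps, min_samples):
    """Label each value with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual DBSCAN definition.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise near a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


def outliers_outside_largest_cluster(series_values, eps, min_samples=3):
    """Return names of series that fall outside the largest cluster."""
    names = list(series_values)
    labels = dbscan_1d([series_values[name] for name in names], eps, min_samples)
    clustered = [label for label in labels if label != -1]
    if not clustered:
        return []  # assumed choice: no dense group, so nothing to compare against
    largest = Counter(clustered).most_common(1)[0][0]
    return [name for name, label in zip(names, labels) if label != largest]


# Hypothetical CPU usage per pod at one point in time.
cpu = {"pod-a": 0.41, "pod-b": 0.43, "pod-c": 0.40, "pod-d": 0.44, "pod-e": 0.92}
outliers = outliers_outside_largest_cluster(cpu, eps=0.05)
print(outliers)  # ['pod-e']
```

Because the comparison is between series at the same moment, a trend shared by all pods does not trigger anything; only the pod that drifts away from the group is flagged.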

MAD Algorithm

The Median Absolute Deviation (MAD) algorithm compares each data point's distance to a rolling 24-hour median. It flags data points outside the chosen sensitivity threshold as anomalies. MAD is less affected by out-of-sync events, like instances restarting at different times.
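
The idea can be sketched in a few lines of Python. The example below scores each point in a single window by its distance from the window's median, scaled by the median absolute deviation. It is an illustrative sketch rather than Grafana's code: the window here is just a short list standing in for the rolling window, and the threshold of 3.0 is an assumed stand-in for the sensitivity setting.

```python
from statistics import median


def mad_outliers(window, threshold=3.0):
    """Return indices of points whose robust z-score exceeds `threshold`."""
    center = median(window)
    mad = median(abs(x - center) for x in window)
    if mad == 0:
        # More than half the points equal the median; flag anything that differs.
        return [i for i, x in enumerate(window) if x != center]
    # 1.4826 makes MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [i for i, x in enumerate(window) if abs(x - center) / scale > threshold]


# Hypothetical latency samples in milliseconds; one value is clearly off.
latencies = [120, 118, 121, 119, 122, 120, 310, 121]
print(mad_outliers(latencies))  # [6]
```

Because the median and MAD are barely moved by a single extreme value, the spike does not widen the band that it is being compared against, which is what makes this approach suitable for series that normally stay in a stable range.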

Choosing an Algorithm

| Algorithm | Strengths | Weaknesses | Use Cases |
| --- | --- | --- | --- |
| DBSCAN | Suitable for closely moving data series | May struggle with noisy data | Kubernetes pods, similar entities |
| MAD | Effective for stable data series | May miss closely packed outliers | System KPIs, web applications |

When selecting an anomaly detection algorithm, consider your data's characteristics and the type of anomalies you want to detect. Understanding each algorithm's strengths and weaknesses will help you choose the most suitable one for your use case.

Troubleshooting

Plugin Installation Issues

Sometimes, installing the OpenSearch plugin may not go smoothly due to compatibility or dependency issues. To fix this:

  1. Ensure you have the correct Grafana and OpenSearch versions installed. Check the plugin's documentation for specific requirements.
  2. Follow the installation instructions carefully.
  3. If you encounter errors, check the Grafana server logs for more details.
  4. You can try reinstalling the plugin or seek help from the Grafana community forums.

Configuration Errors

When setting up anomaly detectors and alerts, you may face errors related to data sources, queries, or algorithm settings. To troubleshoot:

  1. Review your configuration settings and ensure they match the anomaly detection algorithm's requirements.

For example, when using the ANOMALY_DETECTION_BAND() function with a CloudWatch data source, you must enable anomaly detection in the AWS console for that time series. Also, check that your query is correctly configured with the right metric and aggregation settings.

  2. Break down your configuration into smaller parts and test each individually to identify the error source.
  3. Refer to the plugin's documentation or seek help from the Grafana community forums if needed.

| Issue | Troubleshooting Steps |
| --- | --- |
| Plugin Installation | 1. Check version compatibility; 2. Follow installation instructions; 3. Check server logs; 4. Reinstall or seek community help |
| Configuration Errors | 1. Review settings against algorithm requirements; 2. Test configuration in smaller parts; 3. Consult documentation or seek community help |
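
For the CloudWatch case mentioned under Configuration Errors, it can help to see the metric and the band expression side by side when checking "the right metric and aggregation settings". The sketch below builds a pair of queries in the shape of CloudWatch's GetMetricData API: the raw metric, and an `ANOMALY_DETECTION_BAND()` expression that refers to it. The namespace, metric, instance ID, and band width of 2 are assumed example values, and in Grafana you would normally enter the expression in the query editor rather than build it in code.

```python
import json


def build_band_queries(namespace, metric_name, dimensions,
                       stat="Average", period=300, band_width=2):
    """Return GetMetricData-style queries: the raw metric plus its anomaly band."""
    return [
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": key, "Value": value}
                        for key, value in dimensions.items()
                    ],
                },
                "Period": period,  # seconds per data point
                "Stat": stat,      # aggregation applied within each period
            },
            "ReturnData": True,
        },
        {
            "Id": "band",
            # The expression refers to the metric query above by its Id.
            "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
            "Label": "Expected range",
            "ReturnData": True,
        },
    ]


# Hypothetical EC2 instance; the ID is a placeholder.
queries = build_band_queries(
    namespace="AWS/EC2",
    metric_name="CPUUtilization",
    dimensions={"InstanceId": "i-0123456789abcdef0"},
)
print(json.dumps(queries, indent=2))
```

If the band comes back empty, a mismatch between the metric, dimensions, or statistic used here and those on the anomaly detection model configured in AWS is a likely place to look.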

Advanced Setup

Advanced Alerting

You can set up more advanced alert rules based on anomaly detection results. This includes:

  • Combining multiple anomaly detectors: Create an alert rule that triggers when multiple detectors identify anomalies.
  • Aggregated outlier-based alerts: Set a threshold for the percentage of misbehaving entities (e.g., pods) that will trigger an alert.
  • Customized notifications: Send alerts to specific teams, individuals, or integrate with incident management tools like PagerDuty or OpsGenie.

For example, you can create an alert rule that notifies you when at least 20% of pods in a cluster are misbehaving. This uses an aggregated outlier-based alert rule, where you set a threshold for the percentage of misbehaving pods.
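
The 20% rule can be expressed as a small calculation. The sketch below assumes you already have a per-pod outlier flag from a detector; the pod names and the helper functions `outlier_fraction` and `should_fire` are hypothetical and only illustrate the arithmetic.

```python
def outlier_fraction(pod_is_outlier):
    """Return the share of pods currently flagged as outliers (0.0 to 1.0)."""
    total = len(pod_is_outlier)
    if total == 0:
        return 0.0  # no pods reporting, so nothing to alert on
    return sum(pod_is_outlier.values()) / total


def should_fire(pod_is_outlier, threshold=0.20):
    """Fire when at least `threshold` of the pods are misbehaving."""
    return outlier_fraction(pod_is_outlier) >= threshold


# Hypothetical cluster of ten pods, two of which are flagged.
pods = {f"pod-{i}": False for i in range(10)}
pods["pod-3"] = True
pods["pod-7"] = True

print(outlier_fraction(pods))  # 0.2
print(should_fire(pods))       # True
```

Alerting on the fraction rather than on each pod keeps a single noisy pod from paging anyone, while still catching problems that affect a meaningful part of the cluster.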

Integrating Other Tools

Integrating Grafana with other monitoring and incident management tools provides a more comprehensive observability solution. Here are some integration options:

| Tool | Purpose |
| --- | --- |
| Prometheus | Collect metrics from your application |
| Alertmanager | Manage and route alerts to specific teams |
| PagerDuty | Integrate with incident management for a complete solution |

For example, you can use Prometheus to collect metrics from your application, and then use Grafana to visualize and analyze those metrics. You can also use Alertmanager to manage and route alerts to specific teams or individuals.

Summary

This guide covered the steps to add automated anomaly detection to Grafana. Here's a quick recap:

Enable Machine Learning

First, you need to enable machine learning in your Grafana Cloud instance. Go to Alerts & IRM > Machine Learning and click Initialize.

Install Anomaly Detection Plugin

Next, install the OpenSearch anomaly detection plugin:

  1. Go to the Plugins section in Grafana.
  2. Search for "OpenSearch anomaly detection" and install it.
  3. Enable the plugin after installation.

Configure the Plugin

Configure the OpenSearch plugin to work with your data sources:

  1. Go to Data Sources and select your data source.
  2. Choose OpenSearch as the anomaly detection method.
  3. Adjust settings like sensitivity as needed.

Set Up Anomaly Detectors

To set up anomaly detectors:

  1. Open the dashboard and add a visualization panel.
  2. Access anomaly detection options and create a new detector.
  3. Enter details like detector name and model features.
  4. Preview and add the detector to the visualization.

Configure Alerts

To receive alerts for detected anomalies:

  1. Go to the Alerting section.
  2. Create a new alert rule using the anomaly detector.
  3. Define alert conditions and notification channels.
  4. Save the alert rule.

Visualize Anomalies

To visualize anomalies:

  1. Add an anomaly detection panel to your dashboard.
  2. Customize visuals like display settings and sensitivity.

Choose an Algorithm

Grafana offers two algorithms for anomaly detection:

| Algorithm | Strengths | Weaknesses | Use Cases |
| --- | --- | --- | --- |
| DBSCAN | Good for closely moving data series | May struggle with noisy data | Kubernetes pods, similar entities |
| MAD | Effective for stable data series | May miss closely packed outliers | System KPIs, web applications |

Choose the algorithm based on your data characteristics and anomaly types.

Troubleshoot Issues

Common issues and troubleshooting steps:

| Issue | Steps |
| --- | --- |
| Plugin Installation | 1. Check version compatibility; 2. Follow installation instructions; 3. Check server logs; 4. Reinstall or seek community help |
| Configuration Errors | 1. Review settings against algorithm requirements; 2. Test configuration in smaller parts; 3. Consult documentation or seek community help |

Advanced Setup

For more advanced setups, consider:

  • Creating advanced alert rules based on multiple detectors or aggregated outliers.
  • Integrating with other tools like Prometheus, Alertmanager, and PagerDuty.
