Detecting operational data anomalies using OpenSearch



OpenSearch offers a robust and affordable anomaly detection solution. This fully open source, extensible platform incorporates machine learning capabilities directly into the operational data pipeline.

The scale, complexity and speed of current IT infrastructure pose several challenges for traditional DevOps techniques. By incorporating AI and machine learning into operations, AI-powered DevOps (AIOps) enables proactive monitoring, intelligent alerts and autonomous remediation. An important component of this ecosystem is the OpenSearch anomaly detection tool, which provides an open source, machine learning-based method for detecting anomalies in operational data in real time.

The role of AIOps

In the digital age, IT systems generate vast amounts of data every second, ranging from events and logs to metrics and traces. Maintaining such systems manually is unsustainable and inefficient. AI for IT operations (AIOps) uses AI and machine learning to automate and improve IT operations, enabling rapid identification, diagnosis and resolution of problems.

Traditional IT monitoring tools were reactive, static, and rules-based. As organizations deploy microservices, containers, hybrid clouds and edge computing, IT environments have become complex, volatile and large. To manage these environments, AIOps provides intelligence, automation, and scalability.

OpenSearch Overview

OpenSearch is an open source platform for search, analytics and observability, originally based on Elasticsearch. Maintained by the Amazon-led OpenSearch project, it is built to handle application performance monitoring (APM), full-text search, and log analysis. Its AIOps capabilities rest on OpenSearch's machine learning (ML) plugins and observability features, particularly for log-based anomaly detection, metric analysis and alerting.

Table 1: Core AIOps features of OpenSearch

| Feature | Description | Plugin |
| --- | --- | --- |
| Monitoring | Log/metrics visualization, dashboards | OpenSearch Dashboards |
| Predictive insights | Trend detection with time series machine learning models | ML Commons |
| Anomaly detection | Real-time machine learning model for anomaly detection | ML Commons |
| Log ingestion | Log analysis from various sources | Data Prepper |
| Alerting | Provides triggers for a variety of anomalies and related events | Alerting plugin |

The main advantages of this tool are:

  • It is open source and is free to use.
  • It provides easy integration with ELK Stack (Elasticsearch, Logstash, Kibana, Beats) tools and is fully compatible with Logstash and Beats.
  • It incorporates an ML engine with the Random Cut Forest (RCF) algorithm.
  • Its enhanced multivariate anomaly detection feature automatically detects data drift and abnormal patterns.

OpenSearch for anomaly detection

OpenSearch's integrated AI-powered anomaly detection enables large-scale system monitoring to be done automatically and in real time. Figure 1 shows the core components of the tool that can be used for anomaly detection.

Figure 1: OpenSearch core components for anomaly detection

Plugins: OpenSearch's anomaly detection plugin detects anomalies in time series data using unsupervised machine learning. It uses the lightweight Random Cut Forest algorithm to identify data points that lie far from normal data points. The plugin also supports multi-entity detection, which finds anomalies separately for each host, service, and so on in real time. Detectors can be managed through REST APIs or the UI, and the dashboards visualize the detected anomalies.
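As a rough illustration of what such a detector looks like, the sketch below builds a request body of the kind the plugin's create-detector REST endpoint (`POST _plugins/_anomaly_detection/detectors`) accepts. The index pattern, field names, detector name and intervals are hypothetical placeholders, and the exact set of accepted fields may differ between OpenSearch versions, so treat this as a starting point rather than a definitive configuration.

```python
import json


def build_detector_body(index_pattern, time_field, metric_field,
                        category_field=None, interval_minutes=10):
    """Build a JSON-serializable body for creating an anomaly detector.

    All argument values are illustrative; pick ones that match your data.
    """
    feature_name = f"avg_{metric_field}"
    body = {
        "name": f"{metric_field}-detector",  # hypothetical detector name
        "description": f"Detects unusual values of {metric_field}",
        "time_field": time_field,
        "indices": [index_pattern],
        # One feature: the average of the metric in each detection interval.
        "feature_attributes": [
            {
                "feature_name": feature_name,
                "feature_enabled": True,
                "aggregation_query": {
                    feature_name: {"avg": {"field": metric_field}}
                },
            }
        ],
        "detection_interval": {
            "period": {"interval": interval_minutes, "unit": "Minutes"}
        },
        # Extra wait time so that late-arriving documents are still counted.
        "window_delay": {"period": {"interval": 1, "unit": "Minutes"}},
    }
    if category_field is not None:
        # Multi-entity detection: one model per value of this field.
        body["category_field"] = [category_field]
    return body


# Example: one model per host for a hypothetical "cpu_usage" metric.
detector = build_detector_body("metrics-*", "@timestamp", "cpu_usage",
                               category_field="host")
print(json.dumps(detector, indent=2))
```

Leaving out `category_field` yields a single-entity detector that models the whole data stream at once.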

The Index Management plugin manages the data life cycle of the indices used for anomaly detection. It defines policies such as index rollover, retention, and deletion, which prevents storage bloat caused by continuously accumulating anomaly logs.
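A life cycle policy of this kind might be sketched as follows. The body follows the general shape of an Index State Management policy (states with actions and transitions), as submitted to `PUT _plugins/_ism/policies/<policy_id>`. The state names, index pattern and age thresholds are assumptions chosen for illustration.

```python
import json


def build_ism_policy(index_pattern, rollover_age="1d", delete_after="30d"):
    """Build an ISM policy body: roll over young indices, delete old ones.

    The thresholds and state names are illustrative, not recommendations.
    """
    return {
        "policy": {
            "description": "Roll over and later delete anomaly-related indices",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    # Start a new index once the current one is old enough.
                    "actions": [{"rollover": {"min_index_age": rollover_age}}],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": delete_after},
                        }
                    ],
                },
                {
                    "name": "delete",
                    "actions": [{"delete": {}}],
                    "transitions": [],
                },
            ],
            # Attach the policy automatically to matching new indices.
            "ism_template": {"index_patterns": [index_pattern], "priority": 100},
        }
    }


policy = build_ism_policy("anomaly-logs-*")
print(json.dumps(policy, indent=2))
```

Note that the rollover action additionally expects a rollover alias to be configured on the managed index.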

The Performance Analyzer plugin is useful for root cause analysis of anomalies. It analyzes system-level performance metrics and correlates performance degradation with the detected anomalies.

Dashboard: OpenSearch Dashboards is a web-based user interface that lets you explore, view, and evaluate the data stored in your OpenSearch cluster. It is the OpenSearch project's fork of Kibana 7.10.2. Through user-friendly dashboards, charts, graphs and plugins, it contributes greatly to security, monitoring, anomaly detection and DevOps observability. The Anomaly Detection Dashboard provides a complete interface for controlling detectors and monitoring abnormal activity within the data stream (Table 2).

Table 2: Various components of the Anomaly Detection Dashboard

| Component | Description |
| --- | --- |
| Detector list | Lists the detectors along with their status and type. |
| Detector detail view | Consists of: a real-time anomaly graph; a historical anomaly heat map; feature scores; anomaly grade and confidence level; raw anomaly results data. |
| Settings panel | Covers setting the features to monitor, defining aggregations, selecting detection intervals, and setting delays and filters for multivariate detection. |
| Linked visualizations | Helps link anomaly results to an external dashboard. |
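
The anomaly grade and confidence (reliability) level shown in the detail view can also be used to filter raw results programmatically. The sketch below assumes each result record carries `anomaly_grade` and `confidence` values between 0 and 1. The sample records, the `host` key and both thresholds are hypothetical.

```python
def significant_anomalies(results, min_grade=0.5, min_confidence=0.8):
    """Keep only results that are both severe and reported with confidence.

    A grade of 0 means "not anomalous"; higher grades mean more severe.
    The default thresholds are illustrative, not recommended values.
    """
    return [
        r for r in results
        if r.get("anomaly_grade", 0.0) >= min_grade
        and r.get("confidence", 0.0) >= min_confidence
    ]


# Hypothetical raw results for three hosts.
raw_results = [
    {"host": "host-a", "anomaly_grade": 0.0, "confidence": 0.95},
    {"host": "host-b", "anomaly_grade": 0.82, "confidence": 0.97},
    {"host": "host-c", "anomaly_grade": 0.90, "confidence": 0.40},
]

print([r["host"] for r in significant_anomalies(raw_results)])  # ['host-b']
```

Here host-c is dropped despite its high grade because the model's confidence is still low, which is typical shortly after a detector starts.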

Alerting Framework: The OpenSearch Alerting plugin monitors data and triggers an action when certain conditions are met. This lets you receive real-time alerts about metrics, logs, and anomaly detection results. You can configure monitors, triggers, and actions to notify you when a threshold is exceeded or a pattern is found, and to automate corrective actions.

The core concepts of the OpenSearch Alerting framework are:

  • Monitor: Specifies which data to watch and how often. It either runs a query or watches an anomaly detector.
  • Trigger: A condition in the monitor that specifies what to look for (for example, a value exceeding a threshold).
  • Action: Defines how to respond when a trigger fires (send webhooks, Slack messages, emails, etc.).
  • Destination: A reusable notification channel.
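
To show how these four concepts fit together, the sketch below assembles a monitor definition of the kind accepted by `POST _plugins/_alerting/monitors`. It assumes anomaly results are stored in the plugin's default result indices (`.opendistro-anomaly-results*`) with `detector_id`, `anomaly_grade` and `data_end_time` fields. The IDs, names, threshold and schedule are hypothetical, and the trigger and destination formats can vary across OpenSearch versions.

```python
import json


def build_anomaly_monitor(detector_id, destination_id,
                          grade_threshold=0.7, every_minutes=5):
    """Build a query-level monitor that fires on high-grade anomalies."""
    return {
        "type": "monitor",
        "name": f"anomaly-monitor-{detector_id}",
        "monitor_type": "query_level_monitor",
        "enabled": True,
        # Monitor: what to check and how often.
        "schedule": {"period": {"interval": every_minutes, "unit": "MINUTES"}},
        "inputs": [{
            "search": {
                "indices": [".opendistro-anomaly-results*"],
                "query": {
                    "size": 1,
                    "query": {"bool": {"filter": [
                        {"term": {"detector_id": detector_id}},
                        {"range": {"anomaly_grade": {"gt": grade_threshold}}},
                        # Only look at results since the previous run.
                        {"range": {"data_end_time": {"gte": f"now-{every_minutes}m"}}},
                    ]}},
                },
            }
        }],
        # Trigger: the condition evaluated against the query results.
        "triggers": [{
            "name": "anomaly-grade-above-threshold",
            "severity": "1",
            "condition": {"script": {
                "source": "ctx.results[0].hits.total.value > 0",
                "lang": "painless",
            }},
            # Action: what to send, and to which destination.
            "actions": [{
                "name": "notify-ops",
                "destination_id": destination_id,
                "subject_template": {"source": "Anomaly detected"},
                "message_template": {
                    "source": "Monitor {{ctx.monitor.name}} fired trigger {{ctx.trigger.name}}."
                },
            }],
        }],
    }


monitor = build_anomaly_monitor("my-detector-id", "my-destination-id")
print(json.dumps(monitor, indent=2))
```

Filtering in the query and keeping the trigger condition to a simple hit count keeps the Painless script short; the threshold could equally be checked inside the trigger script.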

Table 3 compares OpenSearch with the well-known ELK Stack and Prometheus+Grafana tools.

Table 3: Comparison of OpenSearch and other popular tools

| Parameter | OpenSearch | ELK Stack | Prometheus + Grafana |
| --- | --- | --- | --- |
| Tool type | Open source | Commercial + OSS | Open source |
| AIOps focus | Medium | Medium to high | Low |
| Anomaly detection | Embedded via plugin (RCF) | Machine learning module; subscription-based | Supports only basic alerts |
| Multi-entity detection | Supported | Subscription-based | Not supported |
| Alerting | Built in, pluggable | Subscription-based Watcher-style functionality | Basic alert rules only |
| Cost | Free | Machine learning requires a subscription | Free |
| Custom ML models | Via ML Commons plugin | Subscription-based | Not supported |

Real-time anomaly detection is essential for maintaining system stability, reducing downtime and proactively solving performance issues in the rapidly changing field of AI-powered DevOps (AIOps). OpenSearch has built-in support for the Random Cut Forest (RCF) algorithm, allowing unsupervised anomaly detection on time series data without complex external setups. Whether your DevOps setup is small or large, you can benefit from its smooth integration with security plugins, visual dashboards and alerting systems. Compared to other tools such as the ELK Stack and Prometheus+Grafana, OpenSearch stands out for its openness, flexibility, built-in anomaly detection and support for multi-entity models, all of which are valuable in an AIOps strategy.




