SIBYL: A Machine Learning-Based Framework for Predicting Dynamic Workloads

This paper was presented at the ACM SIGMOD/Principles of Database Systems Conference (SIGMOD/PODS 2024), the premier forum on large-scale data management and databases.

In today's fast-changing digital environment, data analysts increasingly rely on analytics dashboards to monitor customer engagement and app performance. But as data volumes grow, these dashboards can slow down, causing delays and inefficiencies. One solution is to use software that optimizes how data is physically stored and retrieved, but this still requires anticipating the specific queries analysts will run, a task complicated by the dynamic nature of modern workloads.

In our paper “SIBYL: Predicting Time-Varying Query Workloads,” presented at SIGMOD/PODS 2024, we present a machine learning-based framework designed to accurately predict queries in dynamic environments. It lets traditional optimizers, which are typically designed for static workloads, adapt to changing workloads and maintain high performance as query demands evolve.

SIBYL Design and Features

The SIBYL framework builds on studies of real-world workloads, which show that most workloads are dynamic but follow predictable patterns. We identified the following recurring patterns in how query parameters change over time:

  • Trend: Queries whose parameter values increase, decrease, or remain stable over time.
  • Periodic: Queries that recur at regular intervals, such as hourly or daily.
  • Combination: A mix of trend and periodic patterns.
  • Random: Queries with no predictable pattern.
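To make the four categories concrete, here is a minimal Python sketch of one way to label a single parameter's time series. It is a hypothetical heuristic, not SIBYL's method. A least-squares line detects a trend when its R² reaches `r2_min`, and the autocorrelation of the detrended series at a known lag `period` detects periodicity. The function names, the thresholds, and the assumption that the period is known in advance are all illustrative.

```python
import numpy as np


def _autocorr(x: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0 or not 0 < lag < len(x):
        return 0.0
    return float(np.dot(x[:-lag], x[lag:]) / denom)


def classify_pattern(values, period: int, r2_min: float = 0.5, acf_min: float = 0.5) -> str:
    """Hypothetical heuristic: label a parameter series with one of four patterns.

    trend:       a straight line explains most of the variance (R^2 >= r2_min)
    periodic:    the detrended series correlates with itself one period later
    combination: both of the above
    random:      neither
    """
    y = np.asarray(values, dtype=float)
    var_y = float(np.var(y))
    if var_y == 0.0:
        return "trend"  # a constant parameter is a stable trend
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)  # least-squares line
    resid = y - (slope * t + intercept)     # detrended series
    var_resid = float(np.var(resid))
    has_trend = 1.0 - var_resid / var_y >= r2_min
    # Skip numerically-zero residuals: a perfect line has no periodic component.
    has_period = var_resid > 1e-9 * var_y and _autocorr(resid, period) >= acf_min
    if has_trend and has_period:
        return "combination"
    if has_trend:
        return "trend"
    if has_period:
        return "periodic"
    return "random"


t = np.arange(240)  # ten days of hourly samples
print(classify_pattern(0.5 * t, period=24))                              # trend
print(classify_pattern(np.sin(2 * np.pi * t / 24), period=24))           # periodic
print(classify_pattern(t / 24 + np.sin(2 * np.pi * t / 24), period=24))  # combination
```

R² only rewards values that change, so this heuristic puts a flat but noisy "stable" series in the random category. That is one limitation of hand-set rules like these.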

These patterns, illustrated in Figure 1, form the basis of SIBYL’s workload prediction and help keep the database efficient even as usage patterns change.

Diagram showing how parameter values change with query arrival time, identifying four common patterns. The y-axis shows the query arrival time and the x-axis shows the parameter value. Panel (a) shows trend patterns, including increasing and decreasing trends. Panel (b) shows periodic patterns, which repeat at fixed intervals such as hourly, daily, or weekly. Panel (c) combines trend and periodic patterns, and panel (d) shows a random pattern with no regular or predictable structure.
Figure 1. We investigated patterns of change and predictability in database queries by analyzing two weeks of anonymized data from Microsoft telemetry systems that guide decisions for Microsoft products and services.

SIBYL uses machine learning to analyze historical queries and their parameters, and it predicts future queries and their arrival times. The SIBYL architecture, shown in Figure 2, works in three phases:

  • Training: Builds machine learning models from historical query logs and arrival times.
  • Prediction: Uses the pre-trained model to predict future queries and their timing.
  • Incremental fine-tuning: Continuously adapts to new workload patterns through an efficient feedback loop.

The diagram shows the three phases of SIBYL. The first phase is training, where we characterize past queries and their arrival times and train an ML model from scratch. The second phase is prediction, where we continuously receive the latest queries from the workload trace and use the model pre-trained in the first phase to predict queries and their expected arrival times in the next time interval. The last phase is incremental fine-tuning, where a feedback loop monitors the model's accuracy and detects workload changes, such as new types of queries appearing in the workload. The model is then updated efficiently by fine-tuning it incrementally on the changed workload rather than retraining it from scratch.
Figure 2. SIBYL architecture overview.
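The training and prediction phases both need the query history arranged as examples that pair the recent past with the next interval. For a sequence model such as an LSTM, a sliding window is a common way to do this. The sketch assumes the history has already been reduced to one value per time step, for example a parameter value or a query count per interval. `make_windows`, `lookback`, and `horizon` are hypothetical names, and this is not necessarily how SIBYL encodes queries.

```python
import numpy as np


def make_windows(series, lookback: int, horizon: int):
    """Split a time-ordered series into (input, target) training pairs.

    Each input holds `lookback` consecutive steps; its target holds the
    `horizon` steps that immediately follow (the "next interval").
    """
    x = np.asarray(series)
    n = len(x) - lookback - horizon + 1
    if lookback < 1 or horizon < 1 or n < 1:
        raise ValueError("invalid window sizes for this series length")
    inputs = np.stack([x[i : i + lookback] for i in range(n)])
    targets = np.stack([x[i + lookback : i + lookback + horizon] for i in range(n)])
    return inputs, targets


X, Y = make_windows(np.arange(10), lookback=4, horizon=2)
print(X.shape, Y.shape)  # (5, 4) (5, 2)
print(X[0], Y[0])        # [0 1 2 3] [4 5]
```

During prediction, the most recent `lookback` steps form a single input, and the model's output is read as the forecast for the next `horizon` steps. A 2-D array of shape (steps, features) also works and yields inputs of shape (windows, lookback, features).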

Challenges and innovations in designing predictive frameworks

Designing an effective prediction framework is challenging, especially because the number of distinct queries keeps changing and building a separate model for each query type is costly. SIBYL addresses these issues by grouping high-volume queries and clustering low-volume queries, which keeps the framework scalable and efficient. As shown in Figure 3, SIBYL consistently outperforms the other prediction models and maintains its accuracy across different time intervals, demonstrating its effectiveness on dynamic workloads.
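One plausible reading of this grouping strategy is that frequent query templates get a dedicated model, while rare templates are pooled and each pool shares one model. The sketch assumes exactly that. `min_count` and `cluster_key` are hypothetical, and the key-based pooling is a simple stand-in for a real clustering algorithm. SIBYL's actual grouping and clustering may differ.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def assign_models(
    templates: Iterable[str],
    min_count: int,
    cluster_key: Callable[[str], Hashable],
) -> Dict[str, List[str]]:
    """Map model names to the query templates each model is trained on.

    Templates seen at least `min_count` times get a dedicated model; rarer
    templates are pooled by `cluster_key`, and each pool shares one model.
    """
    counts = Counter(templates)
    models: Dict[str, List[str]] = defaultdict(list)
    for template, n in counts.items():
        if n >= min_count:
            models[f"own:{template}"].append(template)
        else:
            models[f"pool:{cluster_key(template)}"].append(template)
    return dict(models)


log = ["SELECT a FROM t1"] * 5 + ["SELECT b FROM t1", "SELECT c FROM t1", "SELECT d FROM t2"]
# Hypothetical similarity key: pool rare templates by the table they read.
models = assign_models(log, min_count=3, cluster_key=lambda q: q.split()[-1])
print(len(models))  # 3
```

With `min_count=3`, the template seen five times gets its own model. The three rare templates collapse into two pooled models keyed by table, so four templates need only three models.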

This figure compares four forecasting models across four workloads: Telemetry, SCOPE, BusTracker, and Sales. The models are history-based, random forest, vanilla LSTM, and Sibyl-LSTM, evaluated on three metrics: recall, precision, and F1 score. Each metric is shown in a separate column, and the workloads are arranged in rows. Predictions are evaluated at forecast intervals of 1 hour, 6 hours, 12 hours, and 1 day. Sibyl-LSTM outperforms the other forecasting models and maintains stable accuracy across all interval settings. Vanilla LSTM and random forest perform poorly on the Sales workload, which has many outliers and unstable patterns. For the Telemetry workload, the history-based method performs well at the 12-hour interval because the workload's repeated queries keep the same parameter values within a day (between the past 12-hour window and the future 12-hour window). However, this approach is ineffective at the 1-day interval, because many query parameter values change across day boundaries. History-based approaches also give unsatisfactory results on the other three workloads. Those workloads evolve more rapidly and in more complex ways, and they contain time-related parameters that operate at finer time scales. ML-based prediction models are therefore essential for handling evolving workloads.
Figure 3. Accuracy of SIBYL-LSTM compared with other models in predicting queries for the next time interval.
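Figure 3 scores each model by comparing the queries it predicts for an interval with the queries that actually arrive. Here is a minimal sketch of the three metrics. It assumes each interval's workload is treated as a set of distinct queries; a multiset version would count duplicates instead.

```python
def set_metrics(predicted, actual):
    """Return (precision, recall, f1) of predicted queries vs. queries that arrived."""
    pred, act = set(predicted), set(actual)
    hits = len(pred & act)  # correctly predicted queries
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(act) if act else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = set_metrics(["q1", "q2", "q3", "q4"], ["q1", "q2", "q5"])
print(p, round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Suppose the model predicts q1 through q4 and q1, q2, and q5 actually arrive. Precision is 0.5 (2 of 4 predictions correct), recall is 2/3 (2 of 3 arrivals predicted), and F1 is 4/7 ≈ 0.571.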

SIBYL continuously adapts to changing workload patterns and maintains high accuracy with minimal retraining. As shown in Figure 4, fine-tuning took just 6.4 seconds and restored accuracy to 95.0%, close to the original accuracy of 95.4%.

The figure consists of two parts, (a) and (b). Part (a) shows the change in the parameter pattern of the Telemetry workload. The y-axis represents the query arrival time, and the x-axis shows the parameter value. The pattern shift starts on May 13 (highlighted in light blue), and Sibyl detects it by observing a drop in accuracy: model accuracy on the shifted pattern is 51.9%, below the threshold 𝛼 = 75%, which triggers fine-tuning. Part (b) shows that Sibyl fine-tunes Sibyl-LSTM by incrementally training it on newly observed data instead of training it from scratch. The y-axis represents recall, and the x-axis shows the number of epochs. The model converges in just 2 epochs, with an overhead of 6.4 seconds, and accuracy improves to 95.0%, close to the pre-trained accuracy of 95.4%.
Figure 4. Fine-tuning results for changing telemetry workloads.
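The detection step in part (a) of Figure 4 amounts to comparing each interval's measured accuracy with the threshold α = 75%. This sketch assumes per-interval recall has already been measured. `fine_tune` is a hypothetical callback that would run a few epochs of incremental training on the newly observed queries, starting from the current weights rather than from scratch.

```python
from typing import Callable, List, Sequence


def feedback_loop(
    interval_recalls: Sequence[float],
    fine_tune: Callable[[int], None],
    alpha: float = 0.75,
) -> List[int]:
    """Trigger fine-tuning for every interval whose recall drops below alpha.

    interval_recalls[i] is the recall measured after interval i's queries arrive.
    Returns the indices of the intervals that triggered fine-tuning.
    """
    triggered: List[int] = []
    for i, recall in enumerate(interval_recalls):
        if recall < alpha:
            # Hypothetical hook: incrementally train on the new data from the
            # current weights instead of retraining from scratch.
            fine_tune(i)
            triggered.append(i)
    return triggered


calls: List[int] = []
print(feedback_loop([0.954, 0.95, 0.519, 0.95], calls.append))  # [2]
```

With these illustrative recalls, fine-tuning fires only at interval 2, where recall falls below 0.75. This mirrors the drop to 51.9% described in the Figure 4 caption.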

To address the slow performance of our dashboards, we used SIBYL to create and test materialized views, which are precomputed query results stored so that later queries can reuse them instead of recomputing them. Based on the predicted workload, the approach identifies common computations and recommends which ones to precompute, speeding up future queries.
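The post does not describe the view-selection algorithm, so the following only illustrates why predictions matter. It uses a common greedy heuristic that picks candidate views under a storage budget, ranked by how many predicted queries each view can answer per unit of storage. All names, costs, and the coverage model are hypothetical assumptions. To compare the two strategies, you would pass historical queries instead of predicted ones; nothing else changes.

```python
from typing import Dict, List, Set, Tuple


def select_views(
    candidates: Dict[str, Tuple[float, Set[str]]],
    predicted_queries: Set[str],
    budget: float,
) -> List[str]:
    """Greedily pick views covering the most predicted queries per unit of storage.

    candidates maps a view name to (storage_cost, ids of queries it can answer);
    every storage_cost is assumed to be positive.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = budget
    pool = dict(candidates)
    while pool:
        best, best_score = None, 0.0
        for name, (cost, answers) in pool.items():
            gain = len((answers & predicted_queries) - covered)  # newly served queries
            if cost <= remaining and gain > 0 and gain / cost > best_score:
                best, best_score = name, gain / cost
        if best is None:  # nothing affordable adds coverage
            break
        cost, answers = pool.pop(best)
        chosen.append(best)
        covered |= answers & predicted_queries
        remaining -= cost
    return chosen


candidates = {
    "v_daily_sales": (2.0, {"q1", "q2", "q3"}),
    "v_region": (1.0, {"q3", "q4"}),
    "v_old": (1.0, {"q9"}),
}
print(select_views(candidates, {"q1", "q2", "q3", "q4"}, budget=3.0))  # ['v_region', 'v_daily_sales']
```

Here `v_region` is chosen first (2 predicted queries for 1 unit of storage), then `v_daily_sales`. `v_old` is skipped because none of the queries it serves are predicted to arrive.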

We trained SIBYL on 2,237 queries from 20 days of anonymized Microsoft sales data and used it to create materialized views for the next day. Views selected from historical queries improved query performance by 1.06x, while views selected from SIBYL's predictions improved it by 1.83x. This shows that predicting future workloads can significantly improve database performance.
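To read the two speedups, assume (as is conventional) that speedup is the workload's runtime without views divided by its runtime with views. Also assume that both figures share the same no-view baseline. Then:

```latex
S = \frac{T_{\text{no views}}}{T_{\text{views}}}
\quad\Longrightarrow\quad
\frac{T_{\text{views}}}{T_{\text{no views}}} = \frac{1}{S}:
\qquad \frac{1}{1.06} \approx 0.943, \qquad \frac{1}{1.83} \approx 0.546,
\qquad \frac{S_{\text{SIBYL}}}{S_{\text{history}}} = \frac{1.83}{1.06} \approx 1.73
```

So views chosen from history cut runtime by about 6%, while views chosen from SIBYL's predictions cut it by about 45%. That makes the prediction-based choice roughly 1.73x faster than the history-based one.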

Implications and Future Outlook

SIBYL's ability to predict dynamic workloads has uses beyond selecting materialized views. It can help organizations scale resources efficiently and reduce costs. It can also improve query performance by automatically organizing data so that the most frequently accessed data is always readily available. In the future, we plan to incorporate more machine learning techniques to make SIBYL more efficient, reduce the effort required to set it up, and further improve how databases handle dynamic workloads, making them faster and more reliable.

Acknowledgements

We would like to thank our co-authors on the paper, Jyoti Leeka, Alekh Jindal and Jishen Zhao, for their valuable contributions and efforts.




