Don’t Be Fooled By Data Drift « Machine Learning Times

If you search for information about ML monitoring online, you are likely to come across guides that advocate putting data drift detection at the center of your monitoring solution.

While data drift detection is certainly an important component of a healthy monitoring workflow, we’ve found that it’s not the most important component. Data drift and its siblings, target drift and prediction drift, can misrepresent the state of your ML model in production.

The purpose of this blog post is to show that not all data drift affects model performance. Drift detection methods tend to generate a large number of false alarms, which makes them difficult to rely on. To illustrate this point, we train an ML model on a real-world dataset, monitor the distributions of the model’s features in production, and report any data drift that occurs.
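To make the false-alarm problem concrete, here is a minimal sketch on synthetic data (not the dataset used in this post, and not NannyML’s algorithm). It assumes a hypothetical model that relies only on one feature, `x_signal`, while a second feature, `x_noise`, shifts in production. The feature names, the 10% label noise, and the drift threshold of 0.1 are all illustrative assumptions. A hand-written two-sample Kolmogorov-Smirnov statistic stands in for whatever univariate drift test a monitoring tool might use.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for value in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def make_data(n_rows, noise_mean, rng):
    """Build rows of (x_signal, x_noise, label); only x_signal drives the label."""
    rows = []
    for _ in range(n_rows):
        x_signal = rng.gauss(0.0, 1.0)
        x_noise = rng.gauss(noise_mean, 1.0)  # irrelevant to the label
        label = int(x_signal > 0)
        if rng.random() < 0.1:  # assumed 10% label noise
            label = 1 - label
        rows.append((x_signal, x_noise, label))
    return rows


def model(x_signal, x_noise):
    """Hypothetical stand-in for a trained model that ignores x_noise."""
    return int(x_signal > 0)


def accuracy(rows):
    hits = sum(model(x_signal, x_noise) == label for x_signal, x_noise, label in rows)
    return hits / len(rows)


rng = random.Random(0)
reference = make_data(2000, noise_mean=0.0, rng=rng)   # training-like period
production = make_data(2000, noise_mean=1.5, rng=rng)  # x_noise has shifted

DRIFT_THRESHOLD = 0.1  # illustrative cut-off, not a recommended value

ks_signal = ks_statistic([r[0] for r in reference], [r[0] for r in production])
ks_noise = ks_statistic([r[1] for r in reference], [r[1] for r in production])
acc_reference = accuracy(reference)
acc_production = accuracy(production)

print("x_signal drift alarm:", ks_signal > DRIFT_THRESHOLD)
print("x_noise drift alarm:", ks_noise > DRIFT_THRESHOLD)
print("accuracy (reference, production):", acc_reference, acc_production)
```

In this toy setup the drift check raises an alarm for `x_noise`, yet accuracy is essentially the same in both periods, because the shifted feature has no influence on the predictions. An alert based on drift alone would be a false alarm here.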

We then present a new algorithm invented by NannyML that significantly reduces these false alarms.

So, without further ado, let’s take a look at the dataset used in this post.

Power consumption dataset

We will use the Tetouan City Electricity Consumption dataset, a real, openly available dataset. The data was collected by the Supervisory Control and Data Acquisition (SCADA) system of Amendis, a public service operator responsible for the distribution of drinking water and electricity in Morocco.
