Monitor model performance in MLOps pipelines using Python




Machine learning models are only useful when deployed in production to solve business problems. However, both business problems and machine learning models are constantly evolving, so models must be maintained to keep up with business KPIs. This is where the concept of MLOps comes from.

MLOps (Machine Learning Operations) is a collection of techniques and tools for running machine learning in production, covering automation, version control, delivery, and monitoring. This article focuses on monitoring, and on how to use Python packages to set up model performance monitoring in production. Let's get into it.

Monitoring in MLOps can mean many things, since monitoring is one of the core principles of MLOps. For example:

– Monitor changes in data distribution over time

– Monitor features used in development and production

– Monitor model decay

– Monitor model performance

– Monitor system age

There are many more things to monitor with MLOps, but this article will focus on monitoring model performance. In our case, model performance refers to the model’s ability to make reliable predictions from unseen data, measured by specific metrics such as accuracy, precision, and recall.

Why should you monitor model performance? To maintain confidence that the model's predictions still solve the business problem. Before going into production, we often calculate the model's performance and its impact on KPIs. For example, you might set a baseline of 70% accuracy for the model to keep meeting your business needs; anything less than that is unacceptable. Monitoring performance ensures the model always meets the business requirements.
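As a rough sketch of this idea, a pipeline step could gate on a baseline metric. The helper name and the 70% threshold below are illustrative choices, not part of any library:

```python
# Minimal sketch of a performance gate against a business baseline.
# The function name and 0.70 threshold are hypothetical examples.

def meets_baseline(accuracy: float, baseline: float = 0.70) -> bool:
    """Return True when the model still satisfies the business requirement."""
    return accuracy >= baseline

# A pipeline could use this to decide whether to alert or retrain
print(meets_baseline(0.82))  # within the baseline
print(meets_baseline(0.65))  # below the baseline: trigger an alert or retraining
```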

Let's learn how model monitoring works using Python, starting by installing the packages. There are many choices for model monitoring, but for this example we will use an open source monitoring package called Evidently.

First, you need to install the package:
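The install command is missing from the article; assuming the package is published on PyPI under its usual name, the standard pip invocation is:

```shell
pip install evidently
```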

After installing the package, download the insurance claims sample data from Kaggle and clean it before using it further.

import pandas as pd

df = pd.read_csv("insurance_claims.csv")

# Sort the data by incident date
df = df.sort_values(by="incident_date").reset_index(drop=True)

# Variable Selection
df = df[
    [
        "incident_date",
        "months_as_customer",
        "age",
        "policy_deductable",
        "policy_annual_premium",
        "umbrella_limit",
        "insured_sex",
        "insured_relationship",
        "capital-gains",
        "capital-loss",
        "incident_type",
        "collision_type",
        "total_claim_amount",
        "injury_claim",
        "property_claim",
        "vehicle_claim",
        "incident_severity",
        "fraud_reported",
    ]
]

# Data Cleaning and One-Hot Encoding
df = pd.get_dummies(
    df,
    columns=[
        "insured_sex",
        "insured_relationship",
        "incident_type",
        "collision_type",
        "incident_severity",
    ],
    drop_first=True,
)

df["fraud_reported"] = df["fraud_reported"].apply(lambda x: 1 if x == "Y" else 0)

df = df.rename(columns={"incident_date": "timestamp", "fraud_reported": "target"})

# Cast all numeric columns to float
for i in df.select_dtypes("number").columns:
    df[i] = df[i].astype(float)

data = df[df["timestamp"] < "2015-02-20"].copy()
val = df[df["timestamp"] >= "2015-02-20"].copy()

The code above selects the columns used for model training, converts the categorical ones to a numeric representation, and splits the data into reference data (data) and current data (val).

MLOps pipelines require reference or baseline data to monitor model performance against, usually data held out from training (such as the test set). They also require current data, that is, data the model has not seen (incoming data).

Let's use Evidently to monitor the data and the model performance. Data drift affects model performance, so it is worth monitoring first.

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

data_drift_report = Report(metrics=[
    DataDriftPreset(),
])

data_drift_report.run(current_data=val, reference_data=data, column_mapping=None)
data_drift_report.show(mode="inline")


Evidently automatically displays a report about what happened in the dataset, including dataset drift and column drift. In the example above, the dataset as a whole is not drifting, but two columns are.


The report shows that drift was detected in the “property_claim” and “timestamp” columns. This information can be used in the MLOps pipeline to retrain the model or to explore the data further.

If you prefer, you can also retrieve the report above as a dictionary object for logging.

data_drift_report.as_dict()
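In an automated pipeline you would typically parse this dictionary rather than view the report. The key layout below ("metrics" → "result" → "dataset_drift") reflects Evidently's report structure but should be treated as an assumption and verified against your installed version; the mock dictionary stands in for a real as_dict() result:

```python
# Sketch: pull the dataset-level drift flag out of the report dictionary.
# The key layout ("metrics" -> "result" -> "dataset_drift") is an assumption
# based on Evidently's report structure; verify it against your version.

def dataset_drift_detected(report_dict: dict) -> bool:
    """Scan the report's metric results for a dataset-level drift flag."""
    for metric in report_dict.get("metrics", []):
        result = metric.get("result", {})
        if "dataset_drift" in result:
            return bool(result["dataset_drift"])
    return False

# Mock of what data_drift_report.as_dict() might look like
mock_report = {
    "metrics": [
        {"metric": "DatasetDriftMetric",
         "result": {"dataset_drift": False, "number_of_drifted_columns": 2}},
    ]
}

if dataset_drift_detected(mock_report):
    print("Drift detected: consider retraining")
else:
    print("No dataset-level drift")
```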

Now let's train a classifier model on our data and monitor its performance using Evidently.

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
rf.fit(data.drop(['target', 'timestamp'], axis = 1), data['target'])

Evidently needs both the target and prediction columns in the reference and current datasets, so add the model's predictions to both datasets and use them to monitor performance.

data['prediction'] = rf.predict(data.drop(['target', 'timestamp'], axis = 1))
val['prediction'] = rf.predict(val.drop(['target', 'timestamp'], axis = 1))
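With predictions in both datasets, you can also sanity-check the performance drop yourself before it reaches the report. A minimal sketch with synthetic labels; the 5-point tolerance is an arbitrary illustrative choice, and scikit-learn's accuracy_score is assumed to be available:

```python
from sklearn.metrics import accuracy_score

# Synthetic stand-ins for the real target/prediction columns
ref_target = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
ref_pred   = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]   # 9/10 correct
cur_target = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
cur_pred   = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1]   # 6/10 correct

ref_acc = accuracy_score(ref_target, ref_pred)
cur_acc = accuracy_score(cur_target, cur_pred)

# Alert when current accuracy drops more than 5 points below the reference
# (the tolerance is an illustrative choice, not a library default)
drop = ref_acc - cur_acc
needs_attention = drop > 0.05
print(f"reference={ref_acc:.2f} current={cur_acc:.2f} drop={drop:.2f}")
```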

As a caveat, it is recommended to use real held-out data as the reference rather than the training data when monitoring model performance. Let's set up performance monitoring for our model using the following code:

from evidently.metric_preset import ClassificationPreset

classification_performance_report = Report(metrics=[
    ClassificationPreset(),
])

classification_performance_report.run(reference_data=data, current_data=val)

classification_performance_report.show(mode="inline")


As a result, we can see that the current model's quality metrics are lower than the reference's (which is expected, since we are using the training data as the reference). Depending on your business requirements, the metrics above could determine your next steps. Let's take a look at what other information the report reveals.


The class representation section shows the actual class distribution in each dataset.


The confusion matrix shows how the predicted values compare to the actual values in both the reference dataset and the current dataset.


The quality metrics by class show the model's performance for each class.

As before, you can convert your classification performance reports to dictionary logs using the following code:

classification_performance_report.as_dict()

That's all for now. A model performance monitor built with Evidently can be integrated into the MLOps pipeline you're currently using and works well alongside it.

Monitoring model performance is an essential task in any MLOps pipeline, as it helps ensure the model keeps up with business requirements. A Python package called Evidently makes it easy to set up a model performance monitor that can be integrated into your existing MLOps pipeline.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media.


