A Beginner's Guide to Machine Learning Testing with DeepChecks

Image courtesy of the author | Canva

DeepChecks is a Python package that provides a variety of built-in checks to test issues such as model performance, data distribution, and data integrity.

In this tutorial, you will learn about DeepChecks and use it to validate a dataset, test a trained machine learning model, and generate a comprehensive report. You will also learn how to run individual checks instead of generating a full report.

Why do we need machine learning testing?

Machine learning testing is essential to ensure the reliability, fairness, and security of AI models. It validates model performance, detects bias, strengthens security against adversarial attacks, especially in Large Language Models (LLMs), ensures regulatory compliance, and enables continuous improvement. Tools like DeepChecks provide comprehensive testing solutions that address all aspects of AI and ML validation from research to production, making them extremely useful for developing robust and reliable AI systems.

Get started with DeepChecks

In this getting started guide, you will load a dataset and run data integrity tests, a critical step that ensures that your dataset is trustworthy and accurate, paving the way for successful model training.

  1. First, install the DeepChecks Python package using the `pip` command:
!pip install deepchecks --upgrade
  2. Import the required Python packages.
  3. Using the pandas library, load a dataset consisting of 569 samples and 30 features. The cancer classification dataset is derived from digitized images of fine needle aspirates (FNA) of breast masses, where each feature describes a characteristic of the cell nuclei present in the image. These features are used to predict whether the cancer is benign or malignant.
  4. Split the dataset into training and testing sets, stratifying on the target column 'benign_0__mal_1'.
import pandas as pd
from sklearn.model_selection import train_test_split

# Load Data
cancer_data = pd.read_csv("/kaggle/input/cancer-classification/cancer_classification.csv")
label_col = "benign_0__mal_1"
df_train, df_test = train_test_split(cancer_data, stratify=cancer_data[label_col], random_state=0)
  5. Create the DeepChecks dataset with additional metadata, passing an empty list for the `cat_features` argument since the dataset has no categorical features.
from deepchecks.tabular import Dataset

ds_train = Dataset(df_train, label=label_col, cat_features=[])
ds_test = Dataset(df_test, label=label_col, cat_features=[])
  6. Run data integrity tests on the training dataset.
from deepchecks.tabular.suites import data_integrity

integ_suite = data_integrity()
integ_suite.run(ds_train)

It will take a few seconds to generate the report.
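Beyond the rendered report, you can inspect the outcome programmatically. Here is a minimal sketch, assuming the suite result exposes the `passed()` and `get_not_passed_checks()` helpers (available in recent DeepChecks versions):

result = integ_suite.run(ds_train)

# True only if every check's conditions passed
print(result.passed())

# List the checks whose conditions did not pass
for check_result in result.get_not_passed_checks():
    print(check_result.get_header())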

The Data Integrity Report includes the following test results:

  • Correlation between features
  • Correlation between features and labels
  • Single value in a column
  • Special characters
  • Mixed nulls
  • Mixed data types
  • String mismatch
  • Data duplication
  • String length out of range
  • Conflicting labels
  • Outlier sample detection

Data validation report

Testing Machine Learning Models

Train your model and run the model evaluation suite to learn more about how your model is performing.

  1. Load the required Python packages.
  2. Build three machine learning models: logistic regression, random forest classifier, and Gaussian Naive Bayes.
  3. Ensemble them using a voting classifier.
  4. Fit the ensemble model to the training dataset.
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

# Train Model
clf1 = LogisticRegression(random_state=1, max_iter=10000)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = GaussianNB()

V_clf = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
    voting='hard')

V_clf.fit(df_train.drop(label_col, axis=1), df_train[label_col])
  5. Once the training phase is complete, run the DeepChecks model evaluation suite on the training and test datasets and the trained model.
from deepchecks.tabular.suites import model_evaluation

evaluation_suite = model_evaluation()
suite_result = evaluation_suite.run(ds_train, ds_test, V_clf)
suite_result.show()

The model evaluation report includes the following test results:

  • Unused features – training dataset
  • Unused features – test dataset
  • Train-test performance
  • Prediction drift
  • Simple model comparison
  • Model inference time – training dataset
  • Model inference time – test dataset
  • Confusion matrix report – training dataset
  • Confusion matrix report – test dataset

Some checks in the suite were skipped because of the ensemble model type, likely because a hard-voting classifier does not expose prediction probabilities; if you run a simpler model such as logistic regression, a full report would likely be produced (see the sketch below).

Model evaluation report
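For instance, a minimal sketch that reuses the `clf1` logistic regression defined above, fitting it on its own before running the same suite:

# Fit the logistic regression model by itself
clf1.fit(df_train.drop(label_col, axis=1), df_train[label_col])

# Run the same evaluation suite on the simpler model
simple_result = evaluation_suite.run(ds_train, ds_test, clf1)
simple_result.show()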

  6. If you want to use the model evaluation report in a structured format, you can always use the `.to_json()` method to convert the report to JSON.

Model evaluation report output to JSON
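A minimal sketch of that conversion; `to_json()` returns the report as a JSON string, which you can parse with the standard `json` module (the exact structure of the parsed object may vary across DeepChecks versions):

import json

# Serialize the full report to a JSON string
report_json = suite_result.to_json()

# Parse it into a Python object for programmatic access
report = json.loads(report_json)
print(type(report))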

  7. Additionally, you can save this interactive report as a web page using the `.save_as_html()` method.
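A minimal sketch (the file name here is just an example):

# Save the interactive report as a standalone web page
suite_result.save_as_html("model_evaluation_report.html")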

Running a Single Check

If you don't want to run the entire suite of model evaluation tests, you can also test your model with a single check.

For example, you can check for label drift by providing a training and testing dataset.

from deepchecks.tabular.checks import LabelDrift
check = LabelDrift()
result = check.run(ds_train, ds_test)
result

The result is a distribution plot and a drift score.

Running a single check: label drift

You can also extract the drift score value and methodology.
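A minimal sketch; DeepChecks check results expose their computed values through the `.value` attribute:

# Dictionary with the drift score and the method used to compute it
print(result.value)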

{'Drift score': 0.0, 'Method': "Cramer's V"}

Conclusion

The next step in your learning is to automate your machine learning testing process and track performance, which you can do with GitHub Actions by following the DeepChecks for CI/CD guide.

In this beginner tutorial, you learned how to use DeepChecks to generate data validation and machine learning evaluation reports. If you run into issues with the code, refer to the DeepChecks Kaggle Notebook of machine learning tests and run the code yourself.

Abid Ali Awan (@1abidaliawan) is a Certified Data Scientist professional who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on Machine Learning and Data Science techniques. Abid holds a Master's in Technology Management and a Bachelor's in Communication Engineering. His vision is to build AI products using Graph Neural Networks for students suffering from mental illness.


