How to automate machine learning workflows with just 10 lines of Python

Machine learning feels magical, until you get stuck trying to decide which model to use for your dataset. Should you use a random forest or logistic regression? What if a naive Bayes model beats both? For most of us, answering that means hours of manual testing, model building, and confusion.

But what if we could automate the entire model selection process?
In this article, you'll find a simple but powerful Python automation that selects the best machine learning model for your dataset. No deep ML knowledge or tuning skills are required. Just plug in your data and let Python do the rest.

Why automate ML model selection?

There are several reasons. Think about it:

Most datasets can be modeled in multiple ways.
It takes time to try each model manually.
Choosing the wrong model early can derail a project.

With automation, you can:

Compare dozens of models instantly.
Get performance metrics without repeating code.
Identify the top-performing algorithms based on accuracy, F1 score, or RMSE.

It's not just convenience, it's smart ML hygiene.
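
To see what is being automated, here is a minimal sketch of the manual approach using scikit-learn directly. It assumes synthetic stand-in data from make_classification and an arbitrary choice of three models. The names candidates and manual_results are hypothetical and not part of LazyPredict or PyCaret.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (an assumption for illustration only)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Hypothetical shortlist of models to compare by hand
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    "GaussianNB": GaussianNB(),
}

rows = []
for name, model in candidates.items():
    model.fit(X_tr, y_tr)            # train on the training split
    preds = model.predict(X_te)      # predict on the held-out split
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_te, preds),
        "F1 Score": f1_score(y_te, preds),
    })

# One row per model, best accuracy first
manual_results = (
    pd.DataFrame(rows).set_index("Model").sort_values("Accuracy", ascending=False)
)
print(manual_results)
```

Every extra model means another entry in the dictionary and more bookkeeping, which is the repetition that the automation libraries take off your hands.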

Libraries to use

We'll explore two underrated Python ML automation libraries: LazyPredict and PyCaret. You can install both with the following pip commands:

pip install lazypredict
pip install pycaret
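
If you want to confirm that the installation worked before going further, a small check like the one below may help. It is only a sketch using the standard library's importlib.metadata, and the installed dictionary is a hypothetical name introduced here.

```python
from importlib.metadata import PackageNotFoundError, version

# Map each package name to its installed version, or None if it is missing
installed = {}
for pkg in ("lazypredict", "pycaret"):
    try:
        installed[pkg] = version(pkg)
    except PackageNotFoundError:
        installed[pkg] = None

for pkg, ver in installed.items():
    print(pkg, ver if ver is not None else "is not installed")
```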

Import the required libraries

Now that the required libraries are installed, let's import them. We'll also import a few other libraries that help load the data and prepare it for modeling. You can import them using the code below:

import pandas as pd
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier
from pycaret.classification import setup, compare_models

Loading a dataset

We'll use a freely available diabetes dataset. The following code downloads the data, stores it in a DataFrame, and defines X (features) and y (outcome).

# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
df = pd.read_csv(url, header=None)

X = df.iloc[:, :-1]
y = df.iloc[:, -1]
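
Because the file is read with header=None, pandas labels the columns with integers, and the iloc slices pick the features and the outcome by position. The sketch below shows the same pattern on a tiny made-up CSV. The values and the toy_ names are illustrative assumptions, not rows from the real dataset.

```python
import io

import pandas as pd

# Tiny made-up CSV with no header row, standing in for the real file
toy_csv = io.StringIO("1,10,0\n2,20,1\n3,30,0\n")
toy_df = pd.read_csv(toy_csv, header=None)

toy_X = toy_df.iloc[:, :-1]  # every column except the last (features)
toy_y = toy_df.iloc[:, -1]   # only the last column (outcome)

print(toy_df.columns.tolist())   # integer labels, because header=None
print(toy_X.shape, toy_y.shape)
```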

Use LazyPredict

Now that the dataset is loaded and the libraries are imported, let's split the data into training and test sets. We then pass the splits to LazyPredict to find out which model works best for this data.

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# LazyClassifier
clf = LazyClassifier(verbose=0, ignore_warnings=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

# Top 5 models
print(models.head(5))

The output shows that LazyPredict fits the data to more than 20 ML models and reports metrics such as accuracy and ROC AUC, so you can pick the model that best suits your data. This makes the decision faster and better informed. You can also plot the accuracy of these models to make the choice more visual. The output includes the time each model took to fit as well, which you can ignore here.

import matplotlib.pyplot as plt

# Assuming `models` is the LazyPredict DataFrame
top_models = models.sort_values("Accuracy", ascending=False).head(10)

plt.figure(figsize=(10, 6))
top_models["Accuracy"].plot(kind="barh", color="skyblue")
plt.xlabel("Accuracy")
plt.title("Top 10 Models by Accuracy (LazyPredict)")
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

Use PyCaret

Now let's take a look at how PyCaret works. We'll build models on the same dataset and compare their performance. PyCaret takes the entire dataset, because it performs the train-test split itself.

The code below:

Runs over 15 models
Evaluates them with cross-validation
Returns the best one based on performance

All in two lines of code.

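# Initialize the experiment; the last column of df is the target
clf = setup(data=df, target=df.columns[-1])
# Train and cross-validate the candidate models, then return the best one
best_model = compare_models()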

As you can see, PyCaret reports more detail about each model's performance. It may take a few seconds longer than LazyPredict, but the extra information helps you make an informed decision about which model to proceed with.
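
To make the cross-validation step more concrete, here is a minimal sketch of the idea using scikit-learn alone. It is not PyCaret's implementation: the synthetic data, the two models, the 5-fold setting, and the names cv_means and best_name are all assumptions chosen for illustration, and PyCaret's own defaults may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption for illustration only)
X_cv, y_cv = make_classification(n_samples=300, n_features=8, random_state=0)

# 5 stratified folds: each fold keeps roughly the same class balance
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models_to_compare = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy across the folds for each model
cv_means = {}
for name, model in models_to_compare.items():
    scores = cross_val_score(model, X_cv, y_cv, cv=cv, scoring="accuracy")
    cv_means[name] = scores.mean()

# The "best" model is simply the one with the highest mean score
best_name = max(cv_means, key=cv_means.get)
print(cv_means)
print("Best by mean CV accuracy:", best_name)
```

Averaging over several folds gives a steadier estimate than a single train-test split, which is one reason a cross-validated comparison can be worth the extra seconds.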

Real-life use cases

Some real-life use cases where these libraries can help:

Rapid prototyping in hackathons
Internal dashboards that suggest the best model to analysts
Teaching ML without drowning in syntax
Pre-testing ideas before full-scale deployment

Conclusion

Using AutoML libraries like the ones we discussed does not mean you should skip learning the mathematics behind the models. But in a fast-paced world, they are a big productivity boost.

What I like about LazyPredict and PyCaret is the quick feedback loop they provide. This lets us focus on feature engineering, domain knowledge, and interpretation.

If you are starting a new ML project, try this workflow. Save time, make better decisions, and impress your team. Let Python do the heavy lifting while you build smarter solutions.
