
Images by the author | Canva
Building machine learning models and trying out new ones is fun, but honestly, your model is only useful to others once you make it available. To do that, it must be published via a web API so that other programs (or humans) can send data and get predictions back. That's where REST APIs come in.
In this article, you'll learn how to go from a simple machine learning model to a production-ready REST API in just under 10 minutes using FastAPI, one of the fastest and most developer-friendly web frameworks in Python. And instead of stopping at a "make it run" demo, we'll add features like:
- Validating incoming data
- Logging every request
- Running background tasks to avoid slowdowns
- Handling errors gracefully
Before moving on to the code, let's take a quick look at the project structure.
ml-api/
│
├── model/
│   ├── train_model.py     # Script to train and save the model
│   └── iris_model.pkl     # Trained model file
│
├── app/
│   ├── main.py            # FastAPI app
│   └── schema.py          # Input data schema using Pydantic
│
├── requirements.txt       # All dependencies
└── README.md              # Optional documentation
Step 1: Install what you need
This project relies on a few libraries: FastAPI for the API, uvicorn to serve it, scikit-learn for the model, joblib for saving and loading it, and Pydantic for input validation. Install them with pip:
pip install fastapi uvicorn scikit-learn joblib pydantic
And save your environment:
pip freeze > requirements.txt
Step 2: Train and save a simple model
Keep the machine learning part simple so you can focus on serving the model. We'll use the famous Iris dataset to train a random forest classifier that predicts the iris species from petal and sepal measurements.
Here is the training script. Create a file called train_model.py in the model/ directory:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib, os

# Load the data and split off a held-out test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the classifier on the training split
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Save the trained model
os.makedirs("model", exist_ok=True)
joblib.dump(clf, "model/iris_model.pkl")
print("✅ Model saved to model/iris_model.pkl")
This script loads data, splits it, trains the model, and saves it using Joblib. Run it once to generate the model file.
python model/train_model.py
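As a quick sanity check, you can verify that joblib round-trips the model correctly by dumping it to a temporary file, reloading it, and comparing predictions. This small self-contained sketch retrains a classifier rather than touching model/iris_model.pkl:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small classifier just for the round-trip check
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Dump and reload via joblib, using a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iris_model.pkl")
    joblib.dump(clf, path)
    restored = joblib.load(path)

# The reloaded model should make identical predictions
assert np.array_equal(clf.predict(X), restored.predict(X))
print("round-trip OK")
```

If this check passes, the pickle file your API will load at startup behaves exactly like the in-memory model.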
Step 3: Define the inputs the API should expect
Next, you need to define how users interact with the API: what should they send, and in what format?
FastAPI is built on Pydantic, so we use it to create a schema that describes and validates incoming data. Specifically, the schema requires four positive float values: sepal length/width and petal length/width.
Add a new file, app/schema.py:
from pydantic import BaseModel, Field

class IrisInput(BaseModel):
    sepal_length: float = Field(..., gt=0, lt=10)
    sepal_width: float = Field(..., gt=0, lt=10)
    petal_length: float = Field(..., gt=0, lt=10)
    petal_width: float = Field(..., gt=0, lt=10)
Here we added value constraints (greater than 0 and less than 10) to keep inputs clean and realistic.
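To see the validation in action, you can instantiate the schema directly; out-of-range values raise a ValidationError before your endpoint code ever runs. Here's a small sketch that redeclares IrisInput so it runs standalone:

```python
from pydantic import BaseModel, Field, ValidationError

class IrisInput(BaseModel):
    sepal_length: float = Field(..., gt=0, lt=10)
    sepal_width: float = Field(..., gt=0, lt=10)
    petal_length: float = Field(..., gt=0, lt=10)
    petal_width: float = Field(..., gt=0, lt=10)

# A valid measurement parses fine
ok = IrisInput(sepal_length=6.1, sepal_width=2.8, petal_length=4.7, petal_width=1.2)
print(ok.sepal_length)

# A negative length violates gt=0 and is rejected
try:
    IrisInput(sepal_length=-1.0, sepal_width=2.8, petal_length=4.7, petal_width=1.2)
except ValidationError as e:
    print("rejected:", e.errors()[0]["loc"])
```

When the schema is used in a FastAPI endpoint, the same failure is returned to the client automatically as a 422 response.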
Step 4: Create an API
Now it's time to build the actual API with FastAPI. It will:
- Load the model
- Accept JSON input
- Predict the class and probabilities
- Log requests in the background
- Return a clean JSON response
Write the main API code inside app/main.py:
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.responses import JSONResponse
from app.schema import IrisInput
import numpy as np, joblib, logging

# Load the model
model = joblib.load("model/iris_model.pkl")

# Set up logging
logging.basicConfig(filename="api.log", level=logging.INFO,
                    format="%(asctime)s - %(message)s")

# Create the FastAPI app
app = FastAPI()

@app.post("/predict")
def predict(input_data: IrisInput, background_tasks: BackgroundTasks):
    try:
        # Format the input as a NumPy array
        data = np.array([[input_data.sepal_length,
                          input_data.sepal_width,
                          input_data.petal_length,
                          input_data.petal_width]])

        # Run prediction
        pred = model.predict(data)[0]
        proba = model.predict_proba(data)[0]
        species = ["setosa", "versicolor", "virginica"][pred]

        # Log in the background so it doesn’t block the response
        background_tasks.add_task(log_request, input_data, species)

        # Return prediction and probabilities
        return {
            "prediction": species,
            "class_index": int(pred),
            "probabilities": {
                "setosa": float(proba[0]),
                "versicolor": float(proba[1]),
                "virginica": float(proba[2])
            }
        }
    except Exception as e:
        logging.exception("Prediction failed")
        raise HTTPException(status_code=500, detail="Internal error")

# Background logging task
def log_request(data: IrisInput, prediction: str):
    logging.info(f"Input: {data.dict()} | Prediction: {prediction}")
Let's pause and understand what's going on here.
The app loads the model once, when it starts up. When a user hits the /predict endpoint with valid JSON input, we convert it to a NumPy array, pass it to the model, and return the predicted class and probabilities. If something goes wrong, we log the failure and return a friendly error.
Pay attention to the BackgroundTasks part: it's a neat FastAPI feature that runs work after the response has been sent (such as writing logs). This keeps the API responsive and avoids delays.
Step 5: Run the API
To start the server, use uvicorn as follows:
uvicorn app.main:app --reload
Visit: http://127.0.0.1:8000/docs
You will see an interactive Swagger UI that lets you test the API.
Try this sample input:
{
  "sepal_length": 6.1,
  "sepal_width": 2.8,
  "petal_length": 4.7,
  "petal_width": 1.2
}
Alternatively, you can make the same request with curl:
curl -X POST "http://127.0.0.1:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "sepal_length": 6.1,
    "sepal_width": 2.8,
    "petal_length": 4.7,
    "petal_width": 1.2
  }'
Both produce the same response:
{
  "prediction": "versicolor",
  "class_index": 1,
  "probabilities": {
    "setosa": 0.0,
    "versicolor": 1.0,
    "virginica": 0.0
  }
}
Optional Step: Deploy the API
You can deploy this FastAPI app to platforms such as:
- Render (zero-config deployment)
- Railway (continuous-integration friendly)
- Heroku (via Docker)
You can also extend this into a production-ready service by adding authentication (such as API keys or OAuth) to protect your endpoints, monitoring requests with Prometheus and Grafana, and using Redis or Celery for a background job queue. You can also refer to my article: Step-by-Step Guide to Deploying Machine Learning Models with Docker.
Wrapping Up
That's it, and it's already better than most demos. What we built is more than a toy example. It:
- Automatically validates input data
- Returns meaningful responses with prediction confidence
- Logs every request to a file (api.log)
- Stays fast and responsive thanks to background tasks
- Handles errors gracefully
All of that in under 100 lines of code.
Kanwal Mehreen is a machine learning engineer and a technical writer with a deep passion for the intersection of data science, AI, and medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT." As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
