Machine learning in production? What does this really mean?

Whether you’re a manager, data scientist, engineer, or product owner, you’ve almost certainly been in at least one meeting where the discussion revolved around “putting a model into production.”

But seriously, what does production even mean?

As you probably know, I’m an AI engineer. I started my first data science job in 2015 at a large company in the French energy sector. At the time, we were one of the first companies to build AI applications for energy management and production (nuclear, hydropower, renewable energy). And if there is one area where the operational implementation of AI is highly regulated, it is energy, particularly nuclear power. This is closely tied to the nature of data and the fact that machine learning models cannot be easily pushed into existing environments.

Thanks to this experience, I learned early on that building models in notebooks was just the tip of the iceberg. I also started talking about production right away, without really knowing what it meant. That’s why I’d like to share the views I’ve formed over the years about bringing machine learning projects into production.


But let’s stop for a moment and consider our main question.

What does production actually mean?

What lies behind the buzzword “production” can be hard to pin down. There are countless videos and articles about it on YouTube, but very few of them are actually applicable to real projects.

In trying to answer this question, I hope our views will converge by the end of this article, even though the methods used to reach production differ in each context.


Main definition

In the context of machine learning, production means that the output of the model directly impacts the user or product.

That impact can take many forms, such as educating someone, helping them make decisions, or enabling them to do something they couldn’t do before. It can also mean adding functionality to a product, like the recommendation system of a shopping app.

If a program containing a machine learning model is used by an end user, or by another product or application, that model can be considered to be in production.

Production means not only impact, but also responsibility. What I’m saying is that if no one, and no system, exists to fix the model when it’s wrong, the model may be deployed but it never truly goes into production.

There is a common belief that 87% of ML projects never make it to production. I’m not sure that figure is strictly correct, but my interpretation is simple: many ML models never reach the stage where they actually impact users or products. And even when they do, they often lack the systems needed to ensure long-term reliability, so they are merely deployed and accessible.

So, if we agree that production means delivering ML projects with impact and responsibility, how do we get there?


The various facets of production

To answer that, we need to accept that there are many facets to production. A model is just one component within a larger ETL pipeline.

This point is important.

We often imagine models as black boxes, where data is input, the math magic happens, and predictions are made. In reality, this is a gross oversimplification. In production, models are typically part of a broader data flow and are often more like data transformations than isolated decision-making engines.

Also, not all “production” looks the same; it depends on the role the model plays in the final system.

In some cases, models support decisions: scores, recommendations, alerts, dashboards.

In other cases, they make decisions: automated actions, real-time blocking, workflow triggers.

The difference matters a great deal. When the system acts automatically, the cost of mistakes is not the same, and the engineering requirements usually rise very quickly.

From my experience, most production systems can be categorized as follows:

→ Production data storage. This means all data is stored in a securely hosted file system or database in your production environment (cloud or on-premises).

→ Production data acquisition. This means having a system or workflow that connects to your production database and retrieves the data that will be used as input for your model. These workflows may include data preparation steps.

→ Pushing the machine learning component to production. This is the part we are interested in. The model is already trained and needs a system that can run in the same environment as the other components.

These three parts clearly demonstrate that ML in production is not about the machine learning model itself, but everything around it.

However, the first two parts are often handled by different teams within the company, so let’s focus on component 3: pushing ML to production.


Breakdown of the five steps

When I have to explain to a junior data scientist how this component works, I break it down like this:

Step 1: Function

Start with a trained model. The first thing you need is a function, the code that loads the model, accepts input data, performs predictions, and returns output.

At this stage, everything runs locally. It’s exciting to see your first predictions, but don’t stop there.

One practical detail matters even at this early stage: handle not only prediction but also failure. In production, your function will eventually receive strange inputs: missing values, unexpected categories, corrupted files, or out-of-range signals. Your future self will appreciate basic validation and clear error messages.
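As a rough sketch of what that looks like in practice (the model, feature names, and output shape are hypothetical), a prediction function with basic validation might be:

```python
# Minimal sketch of a prediction function with input validation.
# The feature names and the model interface are illustrative placeholders.

EXPECTED_FEATURES = ["temperature", "load", "hour"]

def predict(model, payload: dict) -> dict:
    """Validate the input, run the model, and return a structured output."""
    missing = [f for f in EXPECTED_FEATURES if f not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    for name in EXPECTED_FEATURES:
        value = payload[name]
        if not isinstance(value, (int, float)):
            raise ValueError(
                f"Feature '{name}' must be numeric, got {type(value).__name__}"
            )
    row = [[payload[f] for f in EXPECTED_FEATURES]]
    prediction = model.predict(row)[0]
    return {"prediction": float(prediction)}
```

The explicit error messages are the point: when a strange input arrives at 3 a.m., “Missing features: ['hour']” is far more useful than a stack trace from deep inside the model.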

Step 2: Interface

To make this function available to others (without asking them to run your code), you need an interface (most likely an API).

Once deployed, this API accepts standardized requests with input data, passes them to prediction functions, and returns output. This allows other systems, applications, or users to interact with the model.

And this is the reality of production. The interface is not only technical; it is a contract. If another system expects /predict and you expose something else, friction is guaranteed. The same applies if you change the schema every two weeks. When teams say, “the model is in production,” what they often really mean is, “we created a contract that other people depend on.”
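The contract idea can be sketched without committing to any particular framework: a handler that accepts a standardized JSON request and always returns a standardized response. The request/response schema below is a hypothetical example, not a prescribed format:

```python
import json

# Sketch of a /predict contract: a fixed request/response schema that
# callers can depend on. The schema and error shape are illustrative.

def handle_predict(raw_body: str, predict_fn) -> str:
    """Parse a JSON request, call the prediction function, return JSON."""
    try:
        request = json.loads(raw_body)
        features = request["features"]  # contract: a top-level "features" object
        result = predict_fn(features)
        response = {"status": "ok", "prediction": result}
    except (KeyError, json.JSONDecodeError) as exc:
        # A clear, stable error shape is part of the contract too.
        response = {"status": "error", "detail": str(exc)}
    return json.dumps(response)
```

Note that even the failure path returns the agreed-upon shape; consumers can rely on `status` being present whether the call succeeded or not.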

Step 3: Environment

Now we need portability. This means packaging your environment, code, API, and all dependencies so that you can run it elsewhere without changing your system.

If you’ve followed the steps so far, you’ve built a model, wrapped it in a function, and exposed it through an API. But that doesn’t mean anything if everything remains locked to the local environment.

More technical requirements appear here: reproducibility, version control, and traceability. It doesn’t have to be anything fancy, just enough that if you ship v1.2 today, you can explain three months from now what changed and why.
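“Just enough” traceability can be as simple as writing a small metadata file next to each model artifact. A sketch, with entirely illustrative field names:

```python
import hashlib
import json

# Minimal model traceability sketch: record which version this is, which
# data it was trained on, and why it exists. Field names are assumptions.

def build_model_card(version: str, training_data: bytes, notes: str) -> str:
    """Return a JSON string describing one model version."""
    card = {
        "version": version,
        # A content hash lets you prove later which data produced this model.
        "training_data_sha256": hashlib.sha256(training_data).hexdigest(),
        # One human sentence: why does v1.2 exist at all?
        "notes": notes,
    }
    return json.dumps(card, indent=2)
```

Stored alongside the artifact (and committed or registered somewhere), this is often enough to answer the “what changed in v1.2 and why?” question months later.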

Step 4: Infrastructure

The final step is to host everything somewhere your users or applications can actually access it.

In practice, this often means the cloud, but it could also be servers or edge infrastructure within the enterprise. The key is that what you build needs to be accessible, stable, and usable wherever you need it.

And this is where many teams learn a hard lesson. In production, the “best model” is often not the one with the best metrics in your notebook. It’s the one that fits real-world constraints: latency, cost, security, regulation, monitoring, maintainability, and sometimes simply “can our team operationalize this?”

Step 5: Monitoring

Even with the cleanest API and the best infrastructure, a system can still fail in production if problems aren’t caught early.

An unmonitored production model is basically already broken; we just don’t know it yet.

Monitoring doesn’t have to be complicated. You should know at least the following:

  • Is the service up, and is the latency acceptable?
  • Does the input still look “normal”?
  • Are your outputs drifting?
  • Is the business impact still meaningful?

In many real-world projects, performance doesn’t degrade with a loud crash. It decays quietly.
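A drift check doesn’t have to be fancy either. A minimal sketch (the baseline statistics and the threshold are hypothetical) compares the live mean of each input feature against what the model saw during training:

```python
# Minimal drift check: flag a feature when its live mean moves more than
# `threshold` standard deviations away from the training baseline.
# Baseline stats and the threshold value are illustrative assumptions.

def drift_alerts(baseline: dict, live_values: dict, threshold: float = 3.0) -> list:
    """Return the names of features whose live mean looks abnormal."""
    alerts = []
    for name, stats in baseline.items():
        values = live_values.get(name, [])
        if not values:
            alerts.append(name)  # a feature missing entirely is also a red flag
            continue
        live_mean = sum(values) / len(values)
        z = abs(live_mean - stats["mean"]) / max(stats["std"], 1e-9)
        if z > threshold:
            alerts.append(name)
    return alerts
```

Run on a schedule against recent inputs, even something this crude catches the quiet decay, a sensor recalibration, an upstream schema change, a shifted user population, long before anyone complains.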

Having all these components in place turns your model into something useful and impactful. Based on my experience, here are some practical guidelines.

For step 1 (function), use the tools you know (scikit-learn, PyTorch, TensorFlow), but consider portability early: a format like ONNX makes moving the model between environments much easier later. If you want to develop your own packages, make sure you have the necessary software or data engineering skills, whether you’re a manager or a data scientist, because building internal libraries and using off-the-shelf tools are completely different stories.

For step 2 (interface), frameworks like FastAPI work very well, but always keep the consumer in mind. If another system expects /predict and you expose something else, friction is guaranteed. Collaborate with your stakeholders, and make every technical point about where the machine learning output goes very clear.

For step 3 (environment), this is where Docker comes into play. You don’t need to master everything right away, but you do need the basics. Think of Docker as a box that lets you run everything you build almost anywhere. If you already have strong data engineering skills, this is fine. If not, you’ll need to build them or rely on someone on your team who has them.
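As a hypothetical illustration of that “box” (the paths, versions, and uvicorn entry point are placeholders, not a prescribed layout), a minimal Dockerfile for the API might look like this:

```dockerfile
# Hypothetical packaging sketch: pin the base image and dependencies
# so the environment is reproducible anywhere it runs.
FROM python:3.11-slim

WORKDIR /app

# Pinned dependencies in requirements.txt give reproducibility.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the model artifact and the API code into the image.
COPY model/ ./model/
COPY app/ ./app/

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build it once, and the same image runs on your laptop, a colleague’s machine, or a cloud host without “works on my machine” surprises.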

For step 4 (infrastructure), constraints determine choices: Lambda, microservices, edge devices, and of course GPUs. ML workloads often require specialized infrastructure, sometimes through managed services like SageMaker.


Throughout every step, one life-saving rule: always have an easy way to roll back. Production is not just about deploying; it’s also about recovering when faced with reality.
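Rollback can be as simple as keeping every model version around and treating “production” as a movable pointer. A toy sketch (the registry design and names are illustrative, not a real library):

```python
# Toy model registry: every version is kept, and "production" is just a
# pointer that can be moved back instantly. All names are illustrative.

class ModelRegistry:
    def __init__(self):
        self._versions = {}   # version string -> model artifact
        self._history = []    # order in which versions were promoted

    def register(self, version: str, model) -> None:
        """Store an artifact; registering never touches production."""
        self._versions[version] = model

    def promote(self, version: str) -> None:
        """Make a registered version the active one."""
        if version not in self._versions:
            raise KeyError(f"Unknown version: {version}")
        self._history.append(version)

    def active(self):
        """Return the artifact currently serving production traffic."""
        return self._versions[self._history[-1]]

    def rollback(self) -> str:
        """Revert to the previously promoted version and return its name."""
        if len(self._history) < 2:
            raise RuntimeError("No earlier version to roll back to")
        self._history.pop()
        return self._history[-1]
```

The key property is that rollback doesn’t rebuild anything: the old artifact is still there, and reverting is a pointer move, not a redeployment under pressure.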

Don’t think of production as a single milestone in your data science project. It’s a series of steps and a shift in mindset. Rather than waiting to ship the most complex model, companies want you to build models that answer business questions or add expected functionality to a particular product. The model must be delivered to products and users, and monitored so that people can trust it and keep using it.

Understanding your environment is very important. The tools mentioned above may vary from team to team, but the methodology stays the same. I share them only to give you a concrete idea.

You can build a great model, but it’s useless if no one uses it.

And once people use it, it becomes real and requires ownership, oversight, constraints, and systems around it.

Don’t let your work end up in that 87%.


Note: Portions of this article were originally written in French and translated into English with the help of Gemini.

🀝 Stay Connected

If you enjoyed this article, follow me on LinkedIn for more honest insights about AI, data science, and careers.

👉 LinkedIn: Sabrine Bendimerad

πŸ‘‰ Medium: https://medium.com/@sabrine.bendimerad1

πŸ‘‰ Instagram: https://tinyurl.com/datailearn
