
In artificial intelligence projects there is a natural tension between technical rigor and business speed. Machine learning teams strive for statistically rigorous models built on pristine datasets. But what if a “good enough” model deployed quickly can deliver more business value than a perfect model that arrives too late?
This is not a theoretical question. Many organizations fall into the trap of pursuing data perfection at the expense of concrete results. The following case study from a retail demand forecasting project shows how a practical, iterative approach can outperform the traditional slow path to development.
Our team was tasked with building an inventory demand forecasting system for a retail chain with 50 stores. The company was struggling with approximately $2 million in annual costs associated with overstocking. The goal was to build a system that could accurately predict demand for 10,000 unique products, or SKUs, to help finance teams make smarter purchasing decisions.
To achieve this goal, the data science team set a precise technical target: a mean absolute percentage error (MAPE) of 5%. This metric means that, on average, the model's sales forecast is within 5% of actual sales. This seemed like a reasonable benchmark for a system meant to guide major financial commitments.
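As a rough illustration of the metric, MAPE can be computed as in the sketch below. The function name `mape`, the sample numbers, and the choice to skip periods with zero actual sales are assumptions made for this example, not details from the project.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent.

    Periods with zero actual sales are skipped, because the
    percentage error is undefined there (an assumption of this sketch).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Hypothetical weekly unit sales for a single SKU.
actual = [100, 200, 50, 80]
forecast = [110, 190, 55, 76]

# Per-week errors are 10%, 5%, 10%, and 5%, so the average is 7.5%.
weekly_mape = mape(actual, forecast)
```

Because each error is divided by actual sales, slow-moving SKUs with small sales counts can dominate the average, which is one reason a single MAPE target may not map cleanly onto business value.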
The project began with an extensive data preparation phase. The historical sales data had the issues common in real enterprise environments: missing values, inconsistent product categorization, undocumented seasonal adjustments, and inventory counts that failed to reconcile between different systems.
The team spent eight months solving these problems. They cleaned and standardized data formats and built a complex feature engineering pipeline. The process involved interviewing store managers to understand regional variations and manually reconciling inventory discrepancies. The outcome of this intensive effort was a sophisticated model that accounted for local preferences, seasonal trends, and promotions. In a controlled testing environment, it achieved a MAPE of 6%, very close to the original target.
While this comprehensive project was underway, our product team began asking a different question: what is the minimum performance required to deliver business value? We analyzed the existing manual ordering process and found it to be very unreliable, especially for the most volatile and expensive SKUs.
This analysis led us to define a “zone of indifference.” We determined that a model that could simply outperform manual guesswork on the most expensive 20% of SKUs would save the company hundreds of thousands of dollars. A model with a 25% MAPE would not be perfect, but it would be a big win. In fact, we calculated that even a model with a MAPE of 30% or 40% would likely be enough to start delivering value.
Armed with this perspective, we began a parallel effort. The data engineering team performed minimal cleaning of the data: deleting obvious outliers, filling missing values with simple averages, and standardizing basic formats. Within two weeks, we had trained a simple baseline model. Its MAPE was 22%, which is not highly accurate but was better than the existing manual process. It quickly identified clear, actionable patterns, such as consistently overstocked categories and inconsistencies in product distribution across regions.
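A minimal sketch of what such a quick pass might look like is shown below. The article does not say which outlier rule or baseline model was used, so the median-based cutoff, the `outlier_factor` parameter, the four-week moving average, and the sample data are all hypothetical choices for illustration.

```python
from statistics import mean, median


def minimal_clean(sales, outlier_factor=5.0):
    """Remove obvious outliers and fill gaps with the simple average.

    sales: weekly unit sales, with None marking a missing week.
    outlier_factor: hypothetical cutoff; any value above
        outlier_factor * median (or below zero) is treated as a data error.
    Assumes at least one week of observed sales.
    """
    observed = [x for x in sales if x is not None]
    cutoff = outlier_factor * median(observed)
    # Turn outliers into gaps so they are filled like any missing value.
    kept = [x if (x is not None and 0 <= x <= cutoff) else None for x in sales]
    fill = mean(x for x in kept if x is not None)
    return [fill if x is None else x for x in kept]


def moving_average_forecast(history, window=4):
    """Baseline forecast: the average of the most recent `window` weeks."""
    return mean(history[-window:])


# Hypothetical raw history: two missing weeks and one obvious entry error (400).
raw = [20, 22, None, 21, 400, 19, 23, None, 18]
clean = minimal_clean(raw)
next_week = moving_average_forecast(clean)
```

The point of a baseline like this is speed rather than accuracy: it gives a number to compare against the manual process long before a full pipeline exists.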
This baseline system was deployed in five pilot stores. The results were immediate and significant. In the first quarter, the pilot stores saw excess inventory drop by 25%. Extrapolated to all 50 stores, this represents savings of around $500,000 a year. This value was realized while the “perfect” model was still several months away from completion.
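The extrapolation can be reproduced with simple arithmetic. The sketch below assumes that overstock cost falls in proportion to excess inventory and that the five pilot stores are representative of the whole chain; neither assumption is stated in the source, and the variable names are illustrative.

```python
# Figures taken from the case study.
annual_overstock_cost = 2_000_000  # approximate yearly overstock cost, all 50 stores (USD)
pilot_reduction = 0.25             # drop in excess inventory seen in the pilot stores

# Assumption: cost scales linearly with excess inventory, chain-wide.
projected_annual_savings = annual_overstock_cost * pilot_reduction
```

Under those assumptions the projection comes to $500,000 a year, matching the figure quoted above.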
Over the next six months, we continuously improved the deployed model based on real-world feedback. The most significant improvement came from deeper collaboration with the finance team to understand asymmetric error costs. We learned that overstocking an item can cost the business three times as much as understocking it.
Standard metrics like MAPE treat these errors equally. We adjusted the model to reflect business reality: its loss function (the mathematical component that guides its learning) was modified to penalize the costlier error more heavily. This change, driven by business insight, had a much greater impact on reducing overstock costs than marginal improvements in forecast accuracy.
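One way to express this kind of asymmetry is a weighted absolute error, sketched below. The article does not describe the actual loss used, so this is an illustration only. It assumes the costlier error is over-forecasting (ordering too much) and uses the three-to-one ratio mentioned above; the function name and weights are hypothetical.

```python
def asymmetric_loss(actual, forecast, over_weight=3.0, under_weight=1.0):
    """Average absolute error with a heavier penalty for over-forecasting.

    over_weight: cost per unit forecast above actual demand (overstock).
    under_weight: cost per unit forecast below actual demand (stockout).
    The 3:1 default mirrors the cost ratio described in the text.
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        error = f - a
        if error > 0:
            total += over_weight * error      # ordered too much
        else:
            total += under_weight * (-error)  # ordered too little
    return total / len(actual)


actual = [100, 100]
loss_over = asymmetric_loss(actual, [110, 110])   # 10 units too high each week
loss_under = asymmetric_loss(actual, [90, 90])    # 10 units too low each week
```

Here the same 10-unit miss costs 30 when the forecast is too high and 10 when it is too low. Up to a constant scale, this is the pinball (quantile) loss at the 0.25 quantile, so a model trained on it will tend to forecast below the median rather than at it.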
This experience provides a clear framework for teams building AI products. The first principle is to define business value before pursuing technical perfection. The initial focus should be on understanding the business baseline and the minimum threshold for meaningful improvement, rather than chasing an arbitrary statistical target.
Once you know that threshold, your team can ship in order to learn. The fastest way to obtain high-quality, relevant data is to deploy a functional model into the real world. This process reveals edge cases, generates user feedback, and provides performance data that offline test sets cannot replicate.
Additionally, successful AI products align their optimization processes with business goals. They look beyond standard metrics to understand the financial consequences of different kinds of model error. By working with stakeholders to quantify these costs, teams can build this business logic directly into model training via the loss function, allowing them to optimize for what truly matters.
Ultimately, this approach treats data quality as a starting input rather than an insurmountable blocker. Instead of asking whether the data is complete, ask a more productive question: what is the simplest model you can build to create value with today's data? Answering that question is the key to unlocking business value faster.
