Data Science, Artificial Intelligence
Three years after publishing Comet for Data Science, I am revisiting the timeless challenges of machine learning, which are now amplified in the age of generative AI.
It has been three years since I published my first book, Comet for Data Science. At the time, machine learning was the hottest topic in AI. Today, it almost feels like old news. Wherever you look, people are talking about generative AI: large language models, image generators, and copilots. It's easy to think of traditional machine learning as being left behind.
But here is the truth. The challenges we faced with machine learning are still here, and they apply just as much to generative AI. Data quality, model reliability, overfitting, explainability… these are not relics of the past. They are the foundation on which the latest generative AI systems are built.
That's why I decided to revisit the challenges outlined in Chapter 8 of Comet for Data Science and reframe them through the lens of today's generative AI revolution. Because while the buzzwords change, the difficult questions remain, only bigger, messier, and more urgent.
This post walks through the timeless challenges of machine learning, from data headaches to model explainability, and shows how they map onto today's generative AI world.
If there is one truth that hasn't changed from traditional machine learning to generative AI, it's this: your model is only as good as your data. In fact, the data challenges we struggled with in ML are even more important today: insufficient quantity, poor quality, lack of representativeness, and data drift. Let's unpack each one in turn.
1. Insufficient quantity of data
In the classical ML world, small datasets often meant models that did not generalize. A regression model trained on a few hundred rows of data could not capture the complexity of real-world phenomena.
In generative AI, the problem looks different, but it hasn't disappeared. Foundation models are trained on billions of tokens, but the moment you move to a specialized domain such as medicine, law, finance, or manufacturing, you quickly run into a lack of relevant, high-quality domain data. Fine-tuning a large model on only a few thousand examples often leads to overfitting or instability. The scale has changed, but the issue of scarcity persists. The community needs to investigate this issue further.
2. Low data quality
Traditional ML practitioners fear messy data: duplicates, outliers, missing values, or inconsistent labels can ruin performance. Cleaning the data is often the most time-consuming part of a project.
In generative AI, the stakes are higher. The training datasets are so large that manual cleaning is impossible, and the web-scale corpora used to build LLMs are filled with spam, toxic language, bias, and outright misinformation. A mislabeled column in a CSV file can damage a small classifier. A flood of low-quality or harmful content in a massive dataset can distort the model's behavior in ways that are subtle, hard to detect, and very difficult to reverse once the model is trained.
The problem of poor data quality also carries over to model fine-tuning, where it can lead to hallucinations and incorrect answers to domain-specific questions.
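To make the classical side of this concrete, here is a minimal sketch of three basic checks (missing values, exact duplicates, inconsistent labels) applied to a toy fine-tuning set. The `clean_records` helper and the support-ticket examples are hypothetical and only illustrate the idea; real pipelines need far more than this.

```python
from collections import Counter

def clean_records(records):
    """Drop rows with missing fields, exact duplicates, and conflicting labels."""
    # 1. Remove rows with a missing or empty text, or a missing label.
    complete = [(t.strip(), y) for t, y in records
                if t and t.strip() and y is not None]
    # 2. Remove exact duplicates while preserving the original order.
    unique = list(dict.fromkeys(complete))
    # 3. Remove texts that still appear more than once, i.e. with different labels.
    text_count = Counter(t for t, _ in unique)
    return [(t, y) for t, y in unique if text_count[t] == 1]

# Hypothetical (text, label) pairs for a small fine-tuning set.
raw = [
    ("refund not received", "billing"),
    ("refund not received", "billing"),    # exact duplicate
    ("app crashes on login", "bug"),
    ("app crashes on login", "billing"),   # inconsistent label
    ("", "bug"),                           # missing text
    ("slow page load", None),              # missing label
    ("cannot reset password", "account"),
]

cleaned = clean_records(raw)
print(cleaned)
```

Only two of the seven rows survive. Rules like these do not scale to a web-scale corpus by hand, which is exactly why data quality is harder to control in generative AI.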
3. Non-representative data
Bias in training data has always been a challenge. Face recognition models trained primarily on light-skinned faces perform worse, and therefore unfairly, on dark-skinned individuals. In ML, this was already a matter of fairness and accuracy.
In GenAI, the problem is amplified. Language models trained disproportionately on English content struggle with underrepresented languages, dialects, and cultural contexts. Worse, these models are deployed at a large scale, so their biases are not just statistical errors. They can reinforce stereotypes, exclude communities, and shape social discourse. Representation is no longer just a technical concern. It is an ethical one.
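One simple way to surface this kind of problem in a classifier is to break a metric down by group instead of reporting a single aggregate. The sketch below assumes you have a group label for each example; the data and the group names "A" and "B" are invented for illustration.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical predictions: group A is well represented, group B is not.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)

print(f"overall accuracy: {overall:.2f}")
for g, acc in per_group.items():
    print(f"group {g}: {acc:.2f}")
```

The overall accuracy of 0.75 looks acceptable, but it hides the fact that the model is perfect on group A and right only one time in three on group B.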
4. Data drift
Data drift occurs when the world changes but the training data does not. This problem has long been an enemy of production ML systems. A credit scoring model built on data from five years ago may not reflect current economic realities.
For generative AI, drift is far more pronounced. Language, culture, facts, and knowledge evolve every day. An LLM trained last year is already outdated: it lacks current events, new scientific discoveries, and shifts in cultural norms. Users expect these systems to be up to date, but retraining or fine-tuning huge models can be extremely costly. Drift in GenAI is more than just a performance issue. It's a matter of trust.
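For tabular models such as the credit scoring example, drift in a single numeric feature can be monitored with the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The following is a minimal pure-Python sketch; the bin count, the `eps` floor for empty bins, and the synthetic "income" values are my assumptions, and the 0.25 threshold is only a commonly cited rule of thumb.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped to the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic feature values: the "production" sample is shifted upward by 30.
train_income = [i % 100 for i in range(1000)]
prod_income = [v + 30 for v in train_income]

psi_same = psi(train_income, train_income)
psi_shifted = psi(train_income, prod_income)
print(f"PSI, no drift: {psi_same:.3f}")
print(f"PSI, shifted:  {psi_shifted:.3f}")
```

A PSI of zero means the two distributions fall into the bins identically, while the shifted sample lands far above 0.25. Nothing this simple exists for the knowledge baked into an LLM, which is part of why drift is harder to manage in generative AI.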
Data is only half of the battle. When you start building your model, a new set of challenges emerges. In traditional machine learning, these often revolved around overfitting, poor performance, or computational cost. The same problems still apply in generative AI.
1. Overfitting and underfitting
In classic ML, overfitting was usually easy to spot. A model that performed brilliantly on the training set failed on new data. Underfitting, on the other hand, meant that the model was too simple to capture the patterns in the data.
Generative AI introduces a new shade of this problem. Overfitting during LLM fine-tuning leads to models that “parrot” the training data, memorizing entire chunks of text and reproducing sensitive information word for word. Underfitting, on the other hand, shows up as a model that ignores its domain-specific fine-tuning and falls back on general pre-trained behavior. The line between a model that generalizes well and one that memorizes or ignores its training data is now blurry.
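A rough first check for this kind of parroting is to measure how many word n-grams in a generated output also appear verbatim in the fine-tuning data. The sketch below is a crude heuristic under my own assumptions (whitespace tokenization, n = 5, invented example texts), not an established memorization test.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated, training_texts, n=5):
    """Fraction of the output's n-grams that appear verbatim in the training texts."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0  # output shorter than n words: nothing to compare
    seen = set()
    for doc in training_texts:
        seen |= ngrams(doc, n)
    return len(gen & seen) / len(gen)

# Hypothetical fine-tuning example and two hypothetical model outputs.
training = ["the patient john doe was admitted on march third with chest pain"]
parroted = "the patient john doe was admitted on march third"
novel = "patients presenting with chest pain should be evaluated promptly"

parroted_score = verbatim_overlap(parroted, training)
novel_score = verbatim_overlap(novel, training)
print(parroted_score, novel_score)
```

A score near 1.0 suggests the output was copied from the training data, while a score near 0.0 only tells you that no long span was copied; it says nothing about whether the model generalizes well.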
2. Low performance
In traditional ML, poor performance often meant inadequate accuracy, precision, or recall. The solution was to tune the hyperparameters, try different algorithms, or engineer better features.
With generative AI, measuring performance is much more complicated. What does “accuracy” mean when a model generates free-form text, images, or code? Evaluating creativity, relevance, or factual correctness is inherently subjective. Still, performance matters deeply. A hallucinated answer can be dangerous in medicine or law, and a generative image model that overlooks subtle details can undermine trust. The challenge is not only to improve the model but also to define metrics that actually measure success.
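For contrast, this is how little it takes to compute the classic metrics when every prediction is simply right or wrong. The labels below are invented, and `classification_metrics` is a hypothetical helper written for this post.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Every quantity here depends on an unambiguous notion of a correct answer. For a generated paragraph or image there is no such single label to compare against, so none of these formulas can be applied directly.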
3. Computational cost and efficiency
Training large models and running extensive cross-validation was expensive in ML, but it was manageable for most practitioners with the right infrastructure. Parallelization, GPUs, and cloud resources helped to keep costs down.
In GenAI, the cost is astronomical. Training a foundation model can cost millions of dollars in compute, energy, and engineering time. Even large-scale fine-tuning or running inference can overwhelm small teams. Efficiency is not just a nice-to-have. It often determines whether a system can be deployed at all. Methods such as parameter-efficient fine-tuning (LoRA, adapters) and model distillation have emerged, but challenges remain. How do you balance performance, sustainability, and accessibility?
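A bit of arithmetic shows why LoRA helps. Instead of updating a full d × k weight matrix, LoRA freezes it and trains two small matrices of shapes d × r and r × k whose product is added to the frozen weights. The dimensions below (d = k = 4096, r = 8) are illustrative values I picked, not taken from any particular model.

```python
def full_params(d, k):
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k

def lora_params(d, k, r):
    """Trainable parameters for a rank-r LoRA update: B (d x r) plus A (r x k)."""
    return r * (d + k)

# Illustrative dimensions for a single weight matrix (an assumption).
d, k, r = 4096, 4096, 8

full = full_params(d, k)
lora = lora_params(d, k, r)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={r}):      {lora:,} trainable parameters")
print(f"ratio:            {lora / full:.4%}")
```

For this single matrix, the LoRA update trains well under 1% of the parameters that full fine-tuning would. The savings apply to the trainable weights and their optimizer state; the frozen model still has to be held in memory and run at inference time.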
4. Concept drift
Concept drift occurs when the relationship between input and output changes over time. This problem has long plagued ML in production. For example, consumer behavior changes, and older models make less accurate predictions.
In generative AI, concept drift takes a new form. Language evolves, cultural references change, and the “ground truth” of facts shifts every day. A chatbot trained in 2022 may not understand the memes, slang, or news of 2024. Worse, users assume that a GenAI system “knows everything”, which makes outdated responses even more problematic. Unlike classic ML models, which can be retrained regularly, updating a large foundation model is far from trivial and often economically unfeasible.
If data challenges are mainly about what goes in, and model challenges are mainly about how the system learns, explainability is about what comes out and how we can understand it. Traditional ML pushed for interpretability, leading to tools such as SHAP and LIME that help us understand the impact of individual features. With generative AI, the problem is deeper and more urgent. How do you explain the behavior of a model that generates whole paragraphs of text, complex images, or working code?
1. The black box problem, amplified
You can visualize a decision tree. A linear regression can be read like an equation. Even neural networks, although opaque, can be probed with feature attribution. However, in LLMs and multimodal models with hundreds of millions of parameters, the internal mechanisms go beyond human understanding.
This opacity matters. When a generative model hallucinates, where does the error come from? Was it the training data, the fine-tuning set, or the decoding strategy? The scale and complexity of these systems make it almost impossible to answer with confidence.
2. Feature importance and emergent behavior
In classical ML, explainability often meant measuring how much each feature contributed to a prediction. In GenAI, “features” are not simply age, salary, or word counts. They are embeddings that span a huge parameter space. What emerges from those embeddings is not a neat mapping but behaviors such as creativity, reasoning, style, and bias.
Trying to trace these emergent properties back to specific inputs is like trying to explain a novel by analyzing the frequency of its letters. We need new forms of interpretability that go beyond feature attribution and focus on understanding patterns of behavior at scale.
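To show what feature attribution looks like in the classical setting, here is a minimal sketch of permutation importance: shuffle one feature column at a time and measure how much accuracy drops. The toy dataset and the rule-based `model` are hypothetical, and this hand-rolled version is only meant to illustrate the idea, not to replace a library implementation.

```python
import random

def accuracy(model, X, y):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the dataset with only this one column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: column 0 is income, column 1 is an irrelevant counter.
X = [[10 * (i + 1), i % 3] for i in range(20)]
y = [int(row[0] > 100) for row in X]

def model(row):
    """Toy stand-in for a trained classifier: it only looks at income."""
    return int(row[0] > 100)

importances = permutation_importance(model, X, y)
print(importances)
```

Shuffling the income column hurts accuracy, while shuffling the irrelevant column changes nothing, so its importance is exactly zero. The method works because each input is a named column with a clear meaning, which is precisely what is missing when the inputs are embeddings.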
3. Trust, accountability, regulation
Explainability has always been linked to trust. Users are more likely to accept a model's output when they understand how it was produced. Regulators in critical domains such as finance and healthcare frequently require it.
Generative AI raises the stakes. A model that produces an incorrect classification can cause inconvenience. A model that produces false medical advice, misleading legal arguments, or biased images can cause real harm. Explainability here is not optional. It is essential for safety, compliance, and social trust. However, our current tools lag far behind what the technology needs.
4. Towards a new paradigm of explainability
The community is experimenting with new approaches: probing models with synthetic tests, using smaller, interpretable models to approximate the behavior of LLMs, and designing evaluation frameworks that measure bias, toxicity, or factuality. But these are just the first steps. Explainability for generative AI may require an entirely new way of thinking about what interpretability means in a system that creates.
Three years ago, when I first wrote about machine learning, these challenges (messy data, fragile models, black-box decisions) felt like the central obstacles to building intelligent systems. Fast forward to today, and the world is buzzing about generative AI. Scale has exploded, architectures have evolved, and the possibilities seem endless. Still, the challenges remain.
Data challenges are bigger than ever: scarcity in niche domains, a sea of low-quality web text, cultural bias, and constant drift.
Model challenges have intensified: overfitting now means memorization at scale, performance is harder to measure, costs are staggering, and concept drift happens faster than ever before.
Explainability challenges have become existential: we have moved from interpreting decision trees to trying to understand the emergent behavior of 10-billion-parameter models.
Generative AI may feel like a revolution, but it is also a continuation of the same journey we started with machine learning. The core lesson has not changed: intelligent systems are only as powerful as the data they learn from, the models we build, and the trust we can place in their outputs.
Reexamining these timeless challenges through the lens of GenAI is a reminder that progress is not just about bigger models and flashy demos. It's about solving the complex, unglamorous problems that have always defined AI: making it reliable, fair, efficient, and understandable.
Ultimately, the questions are the same whether we call it ML or GenAI. Can you trust this system? Can you understand it? And can you use it responsibly?
