Three eras of data science: when to use traditional machine learning, deep learning, and LLM (illustrated in one example)

Machine Learning


Of the Universe (written by one of the most iconic singers of all time) says:

I wish I could go back
And things have changed in the last few years
i am experiencing change

Black Sabbath – Changes

This song is incredibly powerful and talks about how quickly life can change right before your eyes.

That song is about heartbreak and a love story. But it also reminds me of the many changes my work as a data scientist has undergone over the past 10 years of my career.

  • When I started studying physics, the only thing that came to mind when I heard the word “Transformers” was Optimus Prime. What machine learning means to me Linear regression, SVM, random forest etc… [2016]
  • When I completed my master’s degree in Big Data and Complex Systems Physics, I first learned thatbartThe first GPT models appeared and although no one expected them to be as powerful as they are today, they looked very interesting. [2018-2020]
  • Let’s take a look back at my life as a full-time data scientist. Today, I’ll explain what GPT is for those who don’t know what it stands for or have never read it. “All you need is attention.” Your chances of passing a data science system design interview are slim to none. [2021 – today]

People say that the tools we use to work with data and our daily lives are vastly different than they were 10 (or even 5) years ago, and I agree. what i disagree with is the idea that just because everything seems solvable today with GPT, LLM, or Agentic AI, the tools used in the past should be erased.

The purpose of this article is to consider one task: Classify likes/dislikes/neutral intentions of tweets. in particular, Traditional machine learning, deep learning, and Large scale language models.

We’ll do this hands-on using Python and explain why and when to use each approach. After reading this article, we hope you learned the following:

  1. of tool Those used in the early days should still be considered, studied, and sometimes adopted.
  2. Latency, accuracy, cost Must be evaluated when choosing the best algorithm for your use case
  3. Changes These are necessary in the data scientist world and should be embraced without fear 🙂

Let’s get started!

1. Usage example

The case we are dealing with is one that is commonly employed in data science/AI applications in practice. sentiment analysis. This means that given a text, we want to infer the “emotion” behind the author of that text. This is very useful if you want to collect feedback behind a specific review of an object, a movie, an item you’re recommending, etc.

This blog post uses a very “famous” sentiment analysis example. Classifying the sentiment behind tweets. Because we wanted more control, we did not handle organic tweets collected from the web (where the label was unknown). Instead, use content generated by . large language model Things we can control.

This technique also allows you to adjust the difficulty and variety of problems, and see how different techniques respond.

  • easy case: Love tweets sound like postcards, hate tweets are blunt, and neutral messages talk about the weather and coffee. If your model is having problems here, then something else is wrong.
  • hard case: Still love, hate, neutral, but now injected with sarcasm, mixed tones, and subtle hints that require attention to context. we also have few This is because the data is used to reduce the dataset used for training.
  • Additional hard case: To move into the five emotions of love, hate, anger, disgust, and envy, the model needs to parse richer, more hierarchical sentences. Additionally, there are 0 entries for training data. Training is not possible.

I generated the data and placed each file in a specific folder in the public GitHub folder I created for this project. [data].

Our goal is to build a smart classification system that can be understood efficiently. emotions Behind the tweet. But how do we do it? Let’s understand that.

2. System design

Here is a diagram that is always very helpful to consider:

Image created by the author

accuracy, latencyand scale Machine learning systems form triangles. Only two can be fully optimized at the same time.

You can also create very accurate models that scale very well to millions of entries, but they are not fast. Although it is possible to create a simple model that accommodates millions of entries, it is not very accurate. It allows you to create accurate and fast models, but it is not very scalable.

Although these considerations are abstracted from your specific problem, they can help guide which ML system design you build. Let’s get back to this story.

Also, the power of the model should be proportional to the size of the training set. In general, we try to avoid reducing the error in the training set at the expense of increasing the test set (the famous overfitting).

Image created by the author

We don’t want to get into the realm of underfitting or overfitting. Let me explain why.

Simply put, underfitting occurs when a model is too simple to learn the actual patterns in the data. It’s like trying to draw a straight line inside a spiral. Overfitting is the opposite. The model learns well on all the noisy training data, so it performs well on data it has already seen, but performs poorly on new data. The sweet spot is the middle ground, where the model can be understood without having to memorize the structure.

I’ll come back to this again.

3. Simple case: traditional machine learning

Let’s start with the most obvious scenario. A highly structured dataset of 1,000 generated and labeled tweets. The three classes (positive, neutral, and negative) are intentionally balanced, the language is very explicit, and every row exists within a clean CSV.

Let’s start with a simple import block of code.

Let’s see what the dataset looks like.

Image created by the author

Now, this is expected does not scale For millions of rows (because the dataset is too structured to be diversified). However, a very quick and accurate method can be constructed for this small and specialized use case. Let’s start with modeling. Three main points to consider:

  1. we are doing Training/test split Use 20% of the dataset in the test set.
  2. use. TF-IDF An approach to obtaining word embeddings. TF-IDF stands for Term Frequency-Inverse Document Frequency. This is a classic technique for converting text into numbers by giving each word a weight based on its importance within the document compared to the entire dataset.
  3. We combine this technique with two ML models. Logistic regression and support vector machinefrom scikit-learn. Logistic regression is simple, easy to interpret, and is often used as a powerful baseline for text classification. Support vector machines focus on finding optimal boundaries between classes and typically perform very well when the data is not too noisy.

And the performance is basically perfect for both models.

Image created by the author

In this very simple case with a consistent dataset of 1,000 rows, a traditional approach can get the job done. You don’t need a multi-billion parameter model like GPT.

4. Difficult case: deep learning

The second dataset is still synthetic, but it is intentionally designed to be annoying. While the labels remain love, hate, and neutral, the tweets rely on sarcasm, mixed tones, and backhanded compliments. Additionally, the training pool is smaller, but the validation slice remains large, so the model works with less evidence and more ambiguity.

Now that we understand this ambiguity, we need to take out our bigger guns. Even in cases like this, there are deep learning embedded models that maintain strong accuracy and can scale well (remember triangles and error vs. complexity plots). In particular, deep learning embedding models learn the meaning of words from their context, rather than treating them as isolated tokens.

This blog post uses one of the most popular embedding models, BERT. First let’s import some libraries.

…and some helpers.

Thanks to these functions, we can quickly evaluate embedded models and TF-IDF approaches.

Image created by the author

As you can see, the TF-IDF model performs very poorly on positive labels, but maintains high accuracy when using the embedding model (BERT).

5. Extra Hard Case: LLM Agent

Okay, now let’s do something very difficult.

  1. what we have is 100 lines.
  2. assume we don’t know the labelThat is, you cannot train machine learning models.
  3. we have Five Labels: envy, hatred, love, disgust, anger.

I can’t train anything, but I still want to perform classification, so I have to adopt a method that somehow already includes classification. large language model is the largest example of such a method.

Note that using LLM for the other two cases would be like shooting a fly with a cannon. But here, it makes perfect sense. The task is difficult, and since you can’t train the model (there’s no training set), there’s no way to do anything sensible.

In this case, large-scale accuracy is obtained. However, the API takes time, so you may have to wait a second or two for a response (remember the triangle).

Let’s import some libraries.

This is a classification API call.

And we can see that LLM does an amazing classification job.

6. Conclusion

Over the past decade, the role of the data scientist has changed as dramatically as the technology itself. While this may lead to the idea of ​​using the most powerful tools out there, that is often not the best approach.

Rather than reaching for the largest model first, we tested a single problem through the simple lens of accuracy, latency, and cost.

In particular, we will show you step by step what we did.

  • We defined our use case as tweet sentiment classification, aiming to detect love, hate, or neutral intent.. We designed three datasets of increasing difficulty: a clean dataset, a sarcastic dataset, and a zero-training dataset.
  • I used TF-IDF with logistic regression and SVM to tackle a simple case. The tweets were clear and direct, and both models worked almost perfectly.
  • We moved on to difficult cases where sarcasm, mixed tones, and subtle context made the task more complex.. We used BERT embeddings to capture meaning beyond individual words.
  • Finally, in a special hard case with no training data, we used a large-scale language model to directly classify emotions through zero-shot learning.

At each step, we showed how the appropriate tool depends on the problem. Traditional ML is fast and reliable when your data is structured. Deep learning models help when the meaning is hidden between the lines. LLM is powerful when there are no labels or when broad generalization is required.

7. Before you go!

Thank you again for your time. It means so much ❤️

My name is Piero Paialunga, and I’m the guy here.

Image created by the author

I’m from Italy and have a Ph.D. from University of CincinnatiI am working as. Trade Desk Data Scientist in New York City. I will write about AI, machine learning, and the evolving role of data scientists Both here at TDS and on LinkedIn. If you liked this article and want to learn more about machine learning and follow my research, you can:

A. Please follow me linkedinpublish all stories
B. Follow me GitHubyou can see all my code here
C. If you have any questions, please email us at: piero.paialunga@hotmail



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *