What does overfitting in machine learning mean?

Imagine you’re trying to impress your friends with a new card trick you just learned, but instead of learning a reusable trick, you’ve memorized the order of your entire deck. Bravo! You’re the star of the show, until someone shuffles the deck and you run out of tricks.

This is, in a very simplistic way, the idea behind overfitting in machine learning.

What is overfitting in machine learning?

Think of overfitting as a student who is so eager to outperform classmates that they learn every peculiarity of the teacher’s test patterns but never grasp the general principles of the subject. Similarly, overfitting in machine learning is when the algorithm tries too hard. It performs well on training data, fitting it like a glove. But when faced with new data (the real challenge!), it stumbles.

Why? Because it has also learned the noise, outliers, and random fluctuations in the training data. All of this helps the system reconstruct the training data set perfectly, but it causes problems when the model is applied to new situations and new data.
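To see that noise-memorization in action, here is a minimal sketch that assumes NumPy is available. The sine-wave "true" signal, the noise level, and the function name `overfitting_demo` are illustrative assumptions, not anything from the article. A degree-9 polynomial has enough freedom to pass through all 10 noisy training points, so its training error is essentially zero, while its error on fresh points drawn from the same source stays far above that.

```python
import numpy as np

def overfitting_demo(seed: int = 0):
    """Fit a low- and a high-degree polynomial to the same noisy sample and
    compare each model's error on its training points vs. fresh test points.

    Illustrative assumption: the underlying signal is sin(2*pi*x) plus
    Gaussian noise with standard deviation 0.2."""
    rng = np.random.default_rng(seed)

    def true_fn(x):
        return np.sin(2 * np.pi * x)

    # 10 noisy training points and 200 noisy test points from the same source
    x_train = np.linspace(0.0, 1.0, 10)
    y_train = true_fn(x_train) + rng.normal(0.0, 0.2, x_train.size)
    x_test = rng.uniform(0.0, 1.0, 200)
    y_test = true_fn(x_test) + rng.normal(0.0, 0.2, x_test.size)

    results = {}
    for degree in (3, 9):
        # degree 9 with 10 points interpolates every training point, noise included
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        results[degree] = (train_mse, test_mse)
    return results

if __name__ == "__main__":
    for degree, (train_mse, test_mse) in overfitting_demo().items():
        print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The near-zero training error of the degree-9 fit is exactly the "fits like a glove" illusion: the model has reproduced the noise rather than the signal.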

This potential problem makes its presence felt in every corner of machine learning. Developers must consider the risk of overfitting whether they are building supervised, unsupervised, or semi-supervised learning systems, or designing neural networks for deep learning. Any of these approaches can produce poor results when the training data is thin or unbalanced.

So why should we care about overfitting?

Consider overfitting in the context of investing. Just because you’ve been on a winning streak lately doesn’t mean you’re going to put all your money into one stock, right? Of course not. You understand the principles of variance and know that strong past results won’t necessarily repeat in the future.

Overfitting is the equivalent of betting everything on one stock based on its historical financial data and stock charts, without considering recessions or other changing market conditions, new innovations, or the number of competitors.

In machine learning, an overfit algorithm bets everything on its specific training data. The system ends up with a predictive model that fits that data closely but performs poorly on information from outside that particular dataset.

This matters because an overfit model is a deceptive model. As long as your tests stick to your training data, it might seem like you’re working wonders. But in practice, its predictive power is about as reliable as a Magic 8-Ball. Overfitting undermines a machine learning model’s ability to generalize the lessons it has learned so they can be applied to new inputs from never-before-seen datasets. In the world of machine learning, generalization is key.
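A quick way to expose such a deceptive model is to score it on data it never saw during training. The sketch below assumes NumPy; `holdout_split` and `one_nn_predict` are hypothetical helper names, and the random data set is an illustrative assumption. A 1-nearest-neighbour classifier simply memorizes its training set, so on labels that are pure random noise it scores 100% on the training data but only about coin-flip accuracy on the held-out part.

```python
import numpy as np

def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Hypothetical helper: shuffle, then split into training and held-out parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def one_nn_predict(X_train, y_train, X_query):
    """Hypothetical 1-nearest-neighbour 'model': it memorizes every training point."""
    # pairwise squared distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]

# Illustrative assumption: 500 random 2-D points whose labels are pure noise,
# so there is no real pattern to learn at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = rng.integers(0, 2, size=500)

X_tr, X_te, y_tr, y_te = holdout_split(X, y)
train_acc = np.mean(one_nn_predict(X_tr, y_tr, X_tr) == y_tr)  # each point finds itself
test_acc = np.mean(one_nn_predict(X_tr, y_tr, X_te) == y_te)   # close to 50%
print(f"training accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

Judged only on its training data, this model looks flawless; the held-out score reveals that it has learned nothing useful.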

How to avoid the quagmire of overfitting

Now you know why an overfit machine learning system is about as useful as a solar-powered flashlight: it works fine in broad daylight but fails exactly when you need it most. So what can be done about it? Fortunately, the inventors of modern machine learning systems have come up with some nifty techniques to get around this problem.

Cross-validation: This technique involves dividing the data set into several subsets, often called folds. The model is trained on some of these subsets and validated on the remaining ones. The process is repeated several times with different combinations to give a robust estimate of the model’s performance on unseen data.
Early stopping: This involves halting the training process before the model starts to overfit. Basically, we monitor the model’s performance on a validation set during training and stop when that performance starts to degrade. The most thoroughly trained machine learning model is not always the best one.
Dropout: This technique is used especially in neural networks. It involves randomly “dropping out,” or turning off, certain neurons during training, which forces the system to route the data flow through different paths of digital neurons. This discourages the network from leaning too heavily on potentially misleading relationships in the data.
Data augmentation: This involves applying transformations to existing data to create new synthetic training samples. For example, you might flip or rotate an image in a computer vision task. This increases the size and diversity of the training data and improves the model’s ability to generalize. If a landscape-tagging algorithm can tell mountains apart from upside-down ocean waves, you might be on the right track.
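Cross-validation, the first technique in the list above, can be sketched in a few lines. This is a minimal illustration assuming NumPy: the data is shuffled into five folds, each fold takes one turn as the validation set, and the averaged validation error is used to choose between polynomials of different degrees. The helper names and the synthetic sine-wave data are illustrative assumptions, not a specific library’s API.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Hypothetical helper: shuffle sample indices and split them into k folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_val_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a polynomial fit over k train/validate rounds."""
    folds = k_fold_indices(len(x), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # every fold except the i-th one is used for training
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        scores.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(scores))

# Illustrative assumption: 40 noisy samples of a sine wave
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

cv_scores = {d: cross_val_mse(x, y, d) for d in range(1, 10)}
best_degree = min(cv_scores, key=cv_scores.get)
print(f"degree with the lowest cross-validated error: {best_degree}")
```

Because every score comes from points the fit never saw, a degree that merely memorizes noise gets no credit for it. Libraries such as scikit-learn provide ready-made fold splitters for real projects.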
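Early stopping, as described in the list above, usually boils down to a "patience" counter. The plain-Python sketch below is illustrative only: the `EarlyStopper` class, the patience of 3 epochs, and the validation-loss numbers are hypothetical stand-ins for what a real training loop would produce. Frameworks such as Keras ship a comparable `EarlyStopping` callback.

```python
class EarlyStopper:
    """Hypothetical helper: signal a stop once validation loss has not
    improved for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # new best: remember it (a real loop would also save the weights here)
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation-loss curve: it improves, then rises as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.48, 0.46, 0.47, 0.49, 0.52, 0.56, 0.61]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    # ... one epoch of training on the training set would run here ...
    if stopper.should_stop(epoch, loss):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}; best validation loss at epoch {stopper.best_epoch}")
```

The patience window keeps a single noisy uptick from ending training too early; the weights saved at the best epoch are the ones you would keep.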
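Dropout from the list above can be sketched with NumPy as "inverted" dropout, the variant most deep learning frameworks use. During training, each unit is zeroed with probability `rate` and the survivors are scaled up so the layer’s expected output stays the same; at inference time the layer does nothing. The function name and signature are illustrative assumptions, not a library API.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Hypothetical inverted-dropout layer.

    Training: zero each unit with probability `rate`, then scale the kept
    units by 1 / (1 - rate) so the expected activation is unchanged.
    Inference (training=False): return the input untouched."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate  # True with probability 1 - rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones((4, 8))  # a batch of 4 examples with 8 hidden units each
print(dropout(h, rate=0.5, training=True, rng=rng))   # entries become 0.0 or 2.0
print(dropout(h, rate=0.5, training=False, rng=rng))  # unchanged at inference
```

Because a different random mask is drawn on every call, no single neuron can count on its neighbours being present, which is what pushes the network toward more robust features.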
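Data augmentation from the list above can be as simple as mirror flips and 90-degree rotations. The NumPy sketch below uses hypothetical helper names, and it assumes these particular transformations preserve the label, which depends on the task: a rotated digit 6 can turn into a 9.

```python
import numpy as np

def augment(image):
    """Hypothetical helper: return simple variants of an H x W (x C) image array,
    namely the original, a horizontal mirror, and 90/180/270-degree rotations."""
    variants = [image, np.fliplr(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    return variants

def augment_dataset(images, labels):
    """Expand a dataset; every synthetic image inherits its source image's label."""
    out_images, out_labels = [], []
    for img, label in zip(images, labels):
        for variant in augment(img):
            out_images.append(variant)
            out_labels.append(label)
    return out_images, out_labels

tiny = np.arange(6).reshape(2, 3)  # a toy 2 x 3 "image"
imgs, labs = augment_dataset([tiny], ["mountain"])
print(len(imgs), labs)  # 5 images, all labeled "mountain"
```

One labeled picture becomes five training samples, which makes it harder for the model to memorize the exact orientation of any single example.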

Note that each of these techniques has its place and may be more or less effective depending on the specific nature of the data set and the problem at hand. There is no silver bullet that stops every potential problem, so a successful machine learning system will probably combine several overfitting prevention techniques.

Like most things in investing (or in life), machine learning is as much an art as it is a science.

A story of overfitting from the trenches

In 2006, before Netflix (NFLX) introduced a digital video streaming catalog as a free add-on to its red-envelope DVD-by-mail service, the company launched an ambitious data mining competition. The Netflix Prize taught the company many lessons over the next three years, but not the ones it had originally set out to learn.

The reason for this unexpected result was, of course, overfitting.

Netflix didn’t ignore the risk of overfitting, mind you. Competitors had access to 100 million movie ratings for more than 17,000 movies, submitted by about 480,000 Netflix subscribers. Roughly 3 million more ratings were kept in a separate set that the programmers never saw directly. This held-out set served as the cross-validation check: each computing model was tested and scored against clean data it had never trained on. The immediate challenge was to devise a movie recommendation algorithm that could outperform Netflix’s existing system by at least 10%.
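For the curious: the Netflix Prize scored entries by root-mean-square error (RMSE) on the withheld ratings, and the 10% target meant an RMSE 10% lower than that of Netflix’s own system. The sketch below, assuming NumPy, shows how such a relative improvement is computed. The helper names are hypothetical and the rating arrays are made-up placeholders, not Netflix data.

```python
import numpy as np

def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def improvement_pct(baseline_rmse, candidate_rmse):
    """Relative RMSE reduction of a candidate model over a baseline, in percent."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse

# Made-up held-out ratings (1 to 5 stars); these are NOT Netflix Prize data.
actual = [4, 3, 5, 2, 4, 1, 3, 5]
baseline = [3.6, 3.4, 4.1, 2.9, 3.5, 2.2, 3.1, 4.2]
candidate = [3.9, 3.1, 4.6, 2.4, 3.8, 1.7, 3.0, 4.6]

base, cand = rmse(baseline, actual), rmse(candidate, actual)
gain = improvement_pct(base, cand)
print(f"baseline {base:.3f}, candidate {cand:.3f}, improvement {gain:.1f}%")
print("meets 10% target:", gain >= 10.0)
```

The key point for overfitting is where `actual` comes from: a score computed on ratings the model trained on would flatter it, which is why the contest’s deciding ratings stayed hidden.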

The winning team, BellKor’s Pragmatic Chaos, was a coalition of three elite teams that combined more than 50 radically different machine learning approaches. In second place was “The Ensemble,” an aptly named alliance of more than 30 teams that combined 48 sophisticated machine learning models.

Entries built on only a few analytical approaches were no match for these diverse giants. The top two teams tied in the final scoring, each with a 10.06% improvement over Netflix’s own movie recommendation model. BellKor submitted its final entry 22 minutes before The Ensemble and won.

But Netflix never adopted BellKor’s recommendation system. As you might guess by now, both BellKor and The Ensemble performed slightly worse in the validation round, a reminder that even the best machine learning systems cannot deliver predictive models that are truly independent of their training data.

Instead, the company was content to treat the $1 million prize as payment for a wealth of technical ideas and for a demonstration that diverse analytical approaches can outperform monolithic techniques.

“If you look at the cumulative hours, you can get a PhD for $1 an hour,” then-CEO Reed Hastings told the New York Times. But Netflix didn’t get the drop-in upgrade to its movie recommendation system that it had hoped for.

Anders Bylund has positions in Netflix. The Motley Fool has positions in and recommends Netflix. The Motley Fool has a disclosure policy.

Source link