The heart of every machine learning algorithm | Written by Mikko Lesmana | April 2024

In general, machine learning has three main drivers: a dataset, a cost function, and a "tuner".

Dataset

The dataset is by itself the simplest of the three, and you probably already know what it is: the set of data needed to train the algorithm. It may be the simplest, but it is also the one you'll spend the most money on as a data scientist.

This is the old idea of "garbage in, garbage out." If you feed an algorithm dirty, useless data, the ML model will end up about as good as a bird trying to predict the stock market.

Roughly 80% of the total time is spent preparing the dataset. Some datasets arrive as streams of data; others come already bundled in .csv format. Designing data pipelines is an art in itself. Will you clean up null values by hand? Manually preprocess (encode or label) the data? That might be fine for 100 rows, but for 1,000,000 rows? Not so much.
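As a minimal sketch of what that cleanup step looks like, here is one way to handle nulls and encode a categorical column with pandas. The column names and values are made up for illustration; they are not from the article.

import pandas as pd

# Hypothetical raw dataset; column names and values are invented for illustration.
df = pd.DataFrame({
    "city":  ["Oslo", "Bergen", None, "Oslo"],
    "sqm":   [50.0, None, 72.0, 64.0],
    "price": [300_000, 250_000, 410_000, 355_000],
})

# Handle null values: drop rows missing the target, impute the rest.
df = df.dropna(subset=["price"])
df["sqm"] = df["sqm"].fillna(df["sqm"].median())
df["city"] = df["city"].fillna("unknown")

# Encode the categorical column so the algorithm only sees numbers.
df = pd.get_dummies(df, columns=["city"])
print(df)

Now imagine doing each of those decisions by hand, row by row, at a million rows.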

Cost function

The cost function is the quantity that ML tries to minimize, which is why it is also called the loss function or error function. A simple example of a cost function is the mean squared error (MSE) in linear regression: the average of the squared differences between the predicted and actual values.
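To make that concrete, here is MSE computed by hand, a minimal sketch with made-up numbers:

# Mean squared error by hand; the numbers are invented for illustration.
y_true = [3.0, 5.0, 7.0]   # actual values
y_pred = [2.5, 5.5, 8.0]   # model predictions

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)  # (0.25 + 0.25 + 1.0) / 3 = 0.5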

You may think, "I want to minimize the error as much as possible," but then you run into the so-called bias-variance trade-off: the two pull against each other. If a predictor has a very low error on one dataset but a very large error on the next, the algorithm is overfitted; in other words, it has high variance and low bias. The reverse is underfitting: low variance and high bias. The predictor is highly generalized, but it performs poorly across datasets.
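Here is a minimal sketch of that trade-off using scikit-learn (my choice of library, not the article's). It fits polynomials of different degrees to made-up noisy quadratic data and compares the error on the training set against the error on held-out data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Invented data: a noisy quadratic, split into "this dataset" and "the next one".
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(30, 1))
y = x[:, 0] ** 2 + rng.normal(0, 1, size=30)
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(x_train)),
          mean_squared_error(y_test, model.predict(x_test)))

# degree 1 underfits: high error on both sets (high bias, low variance).
# degree 15 overfits: low training error, high test error (high variance, low bias).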

Tuner (optimizer)

I like to call these tuners, but the proper term is optimizer. These are "THE MACHINE" in machine learning. The tuner is responsible for driving down the error in the cost function. The unspoken rule in machine learning is that this is mostly numerical analysis: at each time step, the algorithm adjusts its parameters and recomputes the error.

Gradient descent is one of the most popular optimizers, and rightly so. Although simple, it covers a large number of loss functions (as long as they are convex).

The way gradient descent reduces error is by descending down the gradient (∇). The gradient is the dimensionally generalized form of the derivative of a function. If you look at the parabolic function below, which maps the linear regression intercept to the squared error, you will see that the minimum error sits at the bottom of the curve. If you remember your first calculus course, you can find the minimum by differentiating the function and finding where the derivative reaches 0; that is, df/dx = 0 or ∇f = 0.

In this example we know where the minimum is, but the algorithm cannot "see" the minimum. This is why it moves numerically, step by step, toward it (a small sketch follows below). This article does not intend to explain in detail how gradient descent works; for that, check out Khan Academy's "What is gradient descent?" at https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/optimizing-multivariable-functions/

or StatQuest (a highly recommended channel).
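Still, here is a minimal sketch of the step-by-step idea: gradient descent finding the intercept b that minimizes the MSE parabola. The data, starting point, and learning rate are all made up for illustration.

# Gradient descent on the intercept b of y = x + b, minimizing MSE.
# Data and learning rate are invented for illustration; the true intercept is 1.0.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 4.0]

b = 0.0            # starting guess; the algorithm cannot "see" the minimum from here
learning_rate = 0.1

for step in range(50):
    # dMSE/db = (2/n) * sum((x + b) - y): the derivative of the error parabola.
    grad = 2 * sum((x + b) - y for x, y in zip(xs, ys)) / len(xs)
    b -= learning_rate * grad   # take one step down the gradient

print(b)           # ≈ 1.0, the point where dMSE/db = 0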

That's the end of the article. Thank you for reading!


