Today's model is logistic regression.
If you already know this model, you have probably asked yourself the following question:
Is logistic regression a regressor or a classifier?
The question is a bit like asking: is a tomato a fruit or a vegetable?
From a botanist's point of view, a tomato is a fruit, because we look at its structure: seeds, flowers, plant biology, and so on.
From a cook's point of view, a tomato is a vegetable, because we care about its taste, how it is used in recipes, and whether it belongs in a salad or a dessert.
Two valid answers for the same object: it all depends on the perspective.
Logistic regression is exactly the same.
- From a statistics / GLM perspective, it is a regression. In this framework there is no concept of "classification" at all; gamma regression, logistic regression, and Poisson regression are all members of the same family.
- From a machine learning perspective, it is used to classify, so it is a classifier.
For now, one thing is certain:
Logistic regression fits very well when the target variable is binary, with y usually coded as 0 or 1.
But…
How can a weight-based model be a classifier?
Well, y is either 0 or 1.
And 0 and 1 are numbers, right?
So we can treat y as continuous.
Yes: y = ax + b, with y = 0 or 1.
Why not?
Now, you may be wondering: why is this question coming up now? Why didn't it come up for the earlier models?
For distance-based and tree-based models, a categorical y stays truly categorical.
Whether y takes values like red, blue, green, or simply 0 and 1:
- In k-NN, you look at the neighbors in each class and classify.
- In a centroid model, you compare with the centroid of each class.
- In a decision tree, you compute the class proportions at each node.
In all these models:
Class labels are not numbers.
They are categories.
The algorithm never treats them as values.
Therefore, classification is natural and immediate.
However, weight-based models behave differently.
Weight-based models always do the following calculations:
y = ax + b
or, later on, some more complex function with coefficients.
This means:
These models work with numbers everywhere.
The key idea is:
even though the model performs regression, that same model can be used for binary classification.
Yes, you can use linear regression for binary classification.
The binary labels are 0 and 1, and they are already numbers.
In this special case, ordinary least squares (OLS) can be applied directly to y = 0 and y = 1.
The model fits a straight line using the same closed-form formulas as before, as shown below.
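Here is a minimal sketch of that idea in Python, with NumPy standing in for the spreadsheet. The dataset is hypothetical (the article's actual values are not reproduced here); only the closed-form OLS formulas matter.

```python
import numpy as np

# Hypothetical binary dataset: x is the feature, y is the 0/1 label
# treated as an ordinary number.
x = np.array([1, 2, 3, 4, 6, 8, 10, 11, 12, 14], dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Closed-form OLS for the line y = a*x + b, exactly as in ordinary
# linear regression: slope = cov(x, y) / var(x).
a = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b = y.mean() - a * x.mean()

print(f"fitted line: y = {a:.4f} * x + {b:.4f}")
```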

The same gradient descent method can also be used, and it works perfectly.

And to get the final class prediction, we simply apply a threshold.
Typically 0.5 (or 50%), but you can choose another value depending on how strict you want the classifier to be.
- If the predicted y ≥ 0.5, predict class 1.
- Otherwise, predict class 0.
This is a classifier.
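Continuing the Python sketch above (reusing its x, a, and b), the thresholding step is a single comparison, with 0.5 as the assumed cut-off:

```python
# Predicted class: 1 where the fitted line is at or above 0.5, else 0.
y_hat = (a * x + b >= 0.5).astype(int)
print("predicted classes:", y_hat)
```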
Because the model produces a numerical output, you can also find the point where y = 0.5.
This value of x is the decision boundary.
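Since the model is just the line y = ax + b, solving for the boundary takes one line of algebra:

$$
a x^{*} + b = 0.5 \quad\Longrightarrow\quad x^{*} = \frac{0.5 - b}{a}
$$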
In the previous example, this happens at x = 9.
With this boundary, we already have 1 misclassification.
But as soon as we introduce a point with a large value of x, problems arise.
For example, suppose we add a point with x = 50 and y = 1.
Linear regression fits a straight line through all the data, so this single large x value drags the line toward it.
The decision boundary shifts from x = 9 to approximately x = 12.
And now with this new boundary we have: 2 misclassifications.
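You can reproduce this instability with the same closed-form OLS fit. A minimal sketch, again on a hypothetical dataset, so the exact boundary values will differ from the article's x = 9 and x = 12:

```python
import numpy as np

def ols_boundary(x, y):
    """Fit y = a*x + b by OLS and return the x where the line crosses 0.5."""
    a = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b = y.mean() - a * x.mean()
    return (0.5 - b) / a

# Hypothetical binary dataset.
x = np.array([1, 2, 3, 4, 6, 8, 10, 11, 12, 14], dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
print("boundary before:", ols_boundary(x, y))

# Add a single extreme point (x = 50, y = 1) and watch the boundary move.
print("boundary after: ", ols_boundary(np.append(x, 50.0), np.append(y, 1.0)))
```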

This shows the main problem.
Linear regression used as a classifier is very sensitive to extreme values of x. The decision boundary moves dramatically, and the classification becomes unstable.
This is one reason why we need a model that does not keep growing linearly forever: a model whose output stays between 0 and 1 even when x becomes very large.
And this is exactly what we get with the logistic function.
How Logistic Regression Works
As with linear regression, we start with ax + b.
Next, we apply a function called the sigmoid, or logistic, function.
As you can see below, the value of p always stays between 0 and 1, which is exactly what we need.
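The screenshot is not reproduced here, but the function itself is standard: we squash the linear output through the sigmoid.

$$
p(x) = \sigma(ax + b) = \frac{1}{1 + e^{-(ax + b)}}
$$

For any input, 0 < p(x) < 1, no matter how large x gets.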
- p(x) is the predicted probability that y = 1.
- 1 − p(x) is the predicted probability that y = 0.
For classification, we can simply say:
- If p(x) ≥ 0.5, predict class 1.
- Otherwise, predict class 0.

From probability to log loss
Recall that OLS linear regression minimizes the MSE (mean squared error).
For a binary target, logistic regression instead uses the Bernoulli likelihood. For each observation i:
- If yᵢ = 1, the probability of the data point is pᵢ.
- If yᵢ = 0, the probability of the data point is 1 − pᵢ.
For the entire data set, the likelihood is the product over all i. In practice, we take the logarithm, which turns the product into a sum.
From the GLM perspective, we try to maximize this log-likelihood.
From the machine learning perspective, we define the loss as the negative log-likelihood and minimize it. This gives the usual log loss.
The two views are equivalent; we won't prove it here.
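Written out in standard notation, with pᵢ = p(xᵢ), the likelihood and the resulting log loss are:

$$
L = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{\,1 - y_i}
$$

$$
\text{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n} \Bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\Bigr]
$$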

Gradient descent for logistic regression
Principle
As with linear regression, we can use gradient descent here. The idea is always the same.
- Start with some initial values of a and b.
- Compute the loss and its gradient (the derivatives with respect to a and b).
- Move a and b a little in the direction that reduces the loss.
- Repeat.
There's nothing mysterious about it.
It's the same mechanical process as before.
Step 1. Compute the gradient
For logistic regression, the gradient of the average log loss has a very simple structure.
It is simply an average of residuals.
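Concretely, with the residual pᵢ − yᵢ, the partial derivatives of the average log loss are:

$$
\frac{\partial\,\text{LogLoss}}{\partial a} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)\, x_i,
\qquad
\frac{\partial\,\text{LogLoss}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)
$$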
These formulas can be implemented directly in Excel. The log loss may look complicated at first glance, but the quantities we need are ultimately very simple.
Excel makes it easy to compute both of them with a SUMPRODUCT formula.

Step 2. Update parameters
Once we know the gradient, we update the parameters.
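With a learning rate η (the exact value used in the spreadsheet is not shown here), these are the usual gradient descent steps:

$$
a \leftarrow a - \eta\,\frac{\partial\,\text{LogLoss}}{\partial a},
\qquad
b \leftarrow b - \eta\,\frac{\partial\,\text{LogLoss}}{\partial b}
$$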
This update step is repeated for each iteration.
Then, with each iteration, the loss decreases and the parameters converge to their optimal values.
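For readers who want the whole loop outside the spreadsheet, here is a compact sketch in Python that mirrors the iteration table. The dataset, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical binary dataset (a stand-in for the spreadsheet columns).
x = np.array([1, 2, 3, 4, 6, 8, 10, 11, 12, 14], dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

a, b = 0.0, 0.0   # initial parameter values
eta = 0.1         # learning rate (illustrative)

for _ in range(5000):
    p = sigmoid(a * x + b)         # predicted probabilities
    grad_a = np.mean((p - y) * x)  # step 1: gradient w.r.t. a
    grad_b = np.mean(p - y)        # step 1: gradient w.r.t. b
    a -= eta * grad_a              # step 2: update the parameters
    b -= eta * grad_b

p = sigmoid(a * x + b)
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(f"a = {a:.3f}, b = {b:.3f}, log loss = {loss:.4f}")
```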

The big picture is now visible.
We've seen the model, the loss, the gradients, and the parameter updates.
Seeing the details of each iteration in Excel lets you actually play with the model: change values, watch the curve move, and watch the loss decrease step by step.
It's incredibly satisfying to see how everything fits together so clearly.

What about multi-class classification?
For distance-based and tree-based models:
no problem at all.
They do not interpret labels as numbers, so they handle multiple classes naturally.
But what about weight-based models?
Now there's a problem.
If you encode the classes as numbers, you get 1, 2, 3, and so on.
The model then interprets these numbers as real values.
This causes a problem:
- The model considers class 3 to be “bigger” than class 1.
- The midpoint between class 1 and class 3 is class 2
- The distance between classes becomes meaningful
However, none of this is true for classification.
So:
For weight-based models, you cannot simply use y = 1, 2, 3 for multiclass classification.
This encoding is incorrect.
We'll show you how to fix this later.
Conclusion
Starting with a simple binary dataset, we saw how weight-based models work as classifiers, why linear regression quickly reaches its limits, and how a logistic function solves these problems by keeping predictions between 0 and 1.
Then, expressing the model through likelihood and log loss resulted in a mathematically sound and easy-to-implement formulation.
With everything in Excel, you can visualize the entire learning process: probabilities, losses, gradients, updates, and finally parameter convergence.
The detailed iteration tables let you actually see how the model improves step by step.
You can change values, adjust the learning rate, add points, and instantly observe how the curve and loss react.
This is the real value of doing machine learning in spreadsheets. Nothing is hidden and all calculations are transparent.
Building a logistic regression this way helps you understand not only the model itself, but why it is trained the way it is.
And this intuition will stay with us as we move on to more advanced models later in the advent calendar.
