Machine Learning “Advent Calendar” Day 17: Neural Network Regressors in Excel



Neural networks are often treated as black boxes.

Layers, activations, gradients, backpropagation… it can feel overwhelming, especially when it's all hidden behind model.fit().

In this article, we will build a neural network regressor from scratch in Excel. All calculations are explicit, all intermediate values are displayed, and nothing is hidden.

By the end of this article, you will understand how neural networks perform regression, how forward propagation works, and how models can approximate nonlinear functions with just a few parameters.

If you haven't read my previous articles yet, please take a look at the implementations of linear regression and logistic regression before starting.

It turns out that neural networks are not new objects. They are a natural extension of these models.

As always, we proceed in two steps:

  • First, we look at how the neural network regressor model works. For neural networks, this step is called forward propagation.
  • Next, we train this function using gradient descent. This process is called backpropagation.

1. Forward propagation

In this part, we'll define the model, implement it in Excel, and see how the predictions work.

1.1 Simple dataset

I'll use a very simple dataset that I generated. It consists of just 12 observations and one feature.

As you can see, the target variable has a nonlinear relationship with x.

For this dataset, we will use two neurons in the hidden layer.
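The generated values themselves aren't reproduced in the text, so if you want to follow along outside Excel, here is a hypothetical stand-in with the same shape: 12 observations of one feature and a nonlinear target built from two smooth steps. The generating function and the noise level are my own assumptions, not the article's actual data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical stand-in: 12 observations, one feature, nonlinear target
x = np.linspace(-2, 2, 12)
# A "bump" built from two smooth steps, plus a little noise
y = sigmoid(4 * (x + 1)) - sigmoid(4 * (x - 1)) + 0.05 * rng.normal(size=x.size)
```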

1.2 Neural network structure

The neural network in this example includes:

  • One input layer with features x as input
  • One hidden layer and two neurons within that hidden layer. These two neurons allow us to create nonlinear relationships.
  • One output layer, which is just a linear regression of the hidden neurons' outputs

Here is a diagram representing this neural network and all the parameters that need to be estimated. There are a total of 7 parameters.

Hidden layer:

  • a11: weight from x to hidden neuron 1
  • b11: bias of hidden neuron 1
  • a12: weight from x to hidden neuron 2
  • b12: bias of hidden neuron 2

Output layer:

  • a21: weight from hidden neuron 1 to the output
  • a22: weight from hidden neuron 2 to the output
  • b2: output bias

At its core, a neural network is just a function. A composite function.

If you write it clearly, there's nothing strange about it.
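Written out explicitly, with the seven parameters above and the sigmoid activation $\sigma(z) = 1/(1 + e^{-z})$ used for the hidden neurons, the model is:

$$
h_1 = \sigma(a_{11}\,x + b_{11}), \qquad
h_2 = \sigma(a_{12}\,x + b_{12}), \qquad
\hat{y} = a_{21}\,h_1 + a_{22}\,h_2 + b_2
$$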

We usually represent this function with a diagram of “neurons.”

In my opinion, the best way to interpret this diagram is as a visual representation of a composite mathematical function; it is not a claim that it literally reproduces the workings of biological neurons.

Why does this function work?

Each sigmoid behaves like a smooth step.
Two sigmoids allow the model to increase, decrease, bend, and flatten the output curve.

By linearly combining them, the network can approximate a smooth nonlinear curve.

For this dataset, two neurons are therefore enough. But can you find datasets for which this structure is not suitable?

1.3 Implementing the function in Excel

This section assumes that the seven coefficients have already been found, so we can implement the expression we saw earlier.

To visualize the neural network, you can use new continuous values of x ranging from -2 to 2 in steps of 0.02.

Here is a screenshot; you can see that the final function fits the shape of the input data very well.
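If you want to cross-check the sheet outside Excel, here is a minimal Python sketch of the same forward pass over that grid. The seven coefficient values below are placeholders of my own, not the ones from the file; substitute the estimated values from the sheet.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder coefficients -- replace with the seven values from the Excel sheet
a11, b11 = 4.0, 4.0            # hidden neuron 1
a12, b12 = 4.0, -4.0           # hidden neuron 2
a21, a22, b2 = 1.0, -1.0, 0.0  # output layer

# Same grid as in the Excel sheet: x from -2 to 2 in steps of 0.02
x = np.arange(-2, 2 + 0.02, 0.02)

# Forward propagation: two sigmoid neurons, then a linear combination
h1 = sigmoid(a11 * x + b11)
h2 = sigmoid(a12 * x + b12)
y_hat = a21 * h1 + a22 * h2 + b2
```

Plotting y_hat against x reproduces the fitted curve shown in the screenshot.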

2. Backpropagation (gradient descent)

At this point, the model is fully defined.

Since this is a regression problem, we use MSE (Mean Squared Error), similar to linear regression.

Now we need to find the seven parameters that minimize the MSE.
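With $n = 12$ observations and $\hat{y}_i$ the model's prediction for observation $i$, the cost function is the usual mean squared error:

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
$$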

2.1 Details of the backpropagation algorithm

The principle is simple, but with so many composed functions and parameters, we need to organize the derivatives carefully.

I won't explicitly derive all seven partial derivatives here; I'll just give the results.
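For reference, here is what the chain rule yields for this model; writing $e_i = \hat{y}_i - y_i$ for the error on observation $i$ and using $\sigma' = \sigma(1 - \sigma)$, these expressions should match the results given in the article:

$$
\begin{aligned}
\frac{\partial \mathrm{MSE}}{\partial b_2} &= \frac{2}{n}\sum_i e_i &
\frac{\partial \mathrm{MSE}}{\partial a_{21}} &= \frac{2}{n}\sum_i e_i\, h_{1,i} &
\frac{\partial \mathrm{MSE}}{\partial a_{22}} &= \frac{2}{n}\sum_i e_i\, h_{2,i} \\
\frac{\partial \mathrm{MSE}}{\partial a_{11}} &= \frac{2}{n}\sum_i e_i\, a_{21}\, h_{1,i}(1 - h_{1,i})\, x_i &
\frac{\partial \mathrm{MSE}}{\partial b_{11}} &= \frac{2}{n}\sum_i e_i\, a_{21}\, h_{1,i}(1 - h_{1,i}) \\
\frac{\partial \mathrm{MSE}}{\partial a_{12}} &= \frac{2}{n}\sum_i e_i\, a_{22}\, h_{2,i}(1 - h_{2,i})\, x_i &
\frac{\partial \mathrm{MSE}}{\partial b_{12}} &= \frac{2}{n}\sum_i e_i\, a_{22}\, h_{2,i}(1 - h_{2,i})
\end{aligned}
$$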

As you can see, every gradient contains an error term. So, to implement the whole process, you need to follow this loop:

  1. Initialize the weights.
  2. Compute the output (forward propagation).
  3. Calculate the error.
  4. Compute the gradients using the partial derivatives.
  5. Update weights.
  6. Repeat until convergence.
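As a cross-check for the spreadsheet, here is a compact NumPy sketch of this loop. The dataset, initial values, learning rate, and iteration count are my own placeholders, not the ones from the Excel file:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder data -- swap in the article's 12 observations
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 12)
y = sigmoid(4 * (x + 1)) - sigmoid(4 * (x - 1)) + 0.05 * rng.normal(size=x.size)

# 1. Initialize the weights (hand-picked, as the article recommends)
a11, b11, a12, b12 = 3.0, 3.0, 3.0, -3.0
a21, a22, b2 = 1.0, -1.0, 0.0

lr = 0.1  # learning rate (placeholder)
costs = []
for step in range(5000):
    # 2. Forward propagation
    h1 = sigmoid(a11 * x + b11)
    h2 = sigmoid(a12 * x + b12)
    y_hat = a21 * h1 + a22 * h2 + b2
    # 3. Error and cost
    e = y_hat - y
    costs.append(np.mean(e ** 2))
    # 4. Gradients: the seven partial derivatives, averaged over observations
    d_b2  = 2 * np.mean(e)
    d_a21 = 2 * np.mean(e * h1)
    d_a22 = 2 * np.mean(e * h2)
    d_a11 = 2 * np.mean(e * a21 * h1 * (1 - h1) * x)
    d_b11 = 2 * np.mean(e * a21 * h1 * (1 - h1))
    d_a12 = 2 * np.mean(e * a22 * h2 * (1 - h2) * x)
    d_b12 = 2 * np.mean(e * a22 * h2 * (1 - h2))
    # 5. Update the weights
    a11 -= lr * d_a11; b11 -= lr * d_b11
    a12 -= lr * d_a12; b12 -= lr * d_b12
    a21 -= lr * d_a21; a22 -= lr * d_a22; b2 -= lr * d_b2
# 6. Repeat until convergence (here, a fixed number of iterations)

print(f"final MSE: {costs[-1]:.5f}")
```

Plotting the `costs` list against the iteration number gives the convergence curve we will build in section 2.6.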

2.2 Initialization

First, lay out the input dataset in columnar format. This makes the formulas easier to implement in Excel.

In theory, you could initialize the parameters with random values. In practice, however, full convergence can require a large number of iterations, and since the cost function is not convex, gradient descent can get stuck in local minima.

Therefore, you need to choose your initial values "wisely". I have prepared some for you; you can make small changes and see what happens.

2.3 Forward propagation

Columns AG to BP implement the forward propagation phase: first compute A1 and A2 (the two hidden activations), then the output. These are the same expressions we used earlier in forward propagation.

To keep the calculations manageable, compute each observation separately: each hidden activation (A1 and A2) and the output get 12 columns each, one per observation, instead of a single summation formula.

To make the loop of the gradient descent phase easy to run, organize the training dataset in columns and extend the formulas row by row in Excel, one row per iteration.

2.4 Error and cost function

You can now calculate the value of the cost function in columns BQ to CN.

2.5 Partial derivatives

We compute the seven partial derivatives corresponding to the neural network's weights. Each partial derivative has to be evaluated for all 12 observations, giving 84 columns in total. To keep this manageable, the sheet is organized with color coding and consistent formulas.

We start with the output-layer parameters a21, a22, and b2, found in columns CO to DX.

Then come the hidden-layer weights a11 and a12, in columns DY to EV.

The bias parameters b11 and b12 occupy columns EW to FT.

Finally, sum each partial derivative across the 12 observations. These aggregated gradients are arranged in columns Z to AF, and the parameter updates in columns R to X use these values.
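In formula form, each of these updates is the standard gradient-descent step, with learning rate $\alpha$:

$$
\theta \leftarrow \theta - \alpha \, \frac{\partial \mathrm{MSE}}{\partial \theta},
\qquad \theta \in \{a_{11},\, b_{11},\, a_{12},\, b_{12},\, a_{21},\, a_{22},\, b_2\}
$$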

2.6 Visualizing convergence

To better understand the training process, use graphs to visualize how the parameters change during gradient descent. The decrease of the cost function is tracked in column Y, so the convergence of the model becomes clearly visible.

Conclusion

Neural network regressors are not magic.

A neural network regressor is a composition of simple elementary functions, controlled by a small number of parameters and trained by minimizing a well-defined mathematical objective.

When you explicitly build a model in Excel, every step is visible. Forward propagation, error calculations, partial derivatives, and parameter updates are no longer abstract concepts, but concrete calculations that can be inspected and modified.

We now have a complete implementation of our neural network, from forward propagation to backpropagation. We recommend experimenting with changing the dataset, initial parameter values, and learning rate to observe how the model behaves during training.

Through this hands-on exercise, we saw how gradients drive learning, how parameters are iteratively updated, and how a neural network gradually shapes itself to data. This is exactly what happens inside modern machine learning libraries, just hidden behind a few lines of code.


All Excel files are available at this Kofi link. Your support means so much to me. Early supporters get the best deal as the price increases mid-month.



