Machine Learning “Advent Calendar” Day 6: Decision Tree Regressor

In this five-day machine learning “advent calendar,” we considered five models (or algorithms), all based on distance (local Euclidean distance, or global Mahalanobis distance).

It’s time to change your approach, right? We will discuss the concept of distance later.

Today we're going to talk about something completely different: decision trees.

Introduction with a simple dataset

Let's use a simple dataset with just one continuous feature.

As always, the idea is that you can visualize the results yourself. Now you have to figure out how to get your computer to do it.

Decision tree regression on a simple dataset in Excel (generated by myself) — Image by author

For the first split, you can visually infer that there are two possible values, one around 5.5 and one around 12.

Now, the question is which one to choose.

This is exactly what we are going to explore. How do we determine the value of the first split in our implementation in Excel?

Once you have determined the value for the first split, you can apply the same process to subsequent splits.

Therefore, Excel only implements the first split.

Decision tree regressor algorithm principles

I wrote an article called Always Differentiate the Three Steps of Machine Learning to Learn Machine Learning Effectively and Let's Apply Those Principles to Decision Tree Regressors.

So for the first time we have a “true” machine learning model with non-trivial steps in all three.

What is the model?

A model here is a set of rules for partitioning a dataset and assigning a value to each partition. which one? Average value y of all observations in the same group.

Therefore, a k-NN uses the mean of the nearest neighbors (observations that are similar in terms of a feature variable) to predict, whereas a decision tree regressor uses the mean of a group of observations (similar in terms of a feature variable) to make predictions.

The process of fitting or training a model

For decision trees, this process is also called growing the tree to full size. For decision tree regressors, the leaves contain only one observation, so the MSE is 0.

Growing a tree consists of recursively dividing the input data into smaller chunks or regions. You can calculate predictions for each region.

For regression, the prediction is the average of the target variable over the domain.

At each step in the construction process, the algorithm selects features and split values that maximize one criterion. For regressors, the mean squared error (MSE) between the actual value and the prediction is often used.

Tuning or pruning the model

For decision trees, the general term for model tuning is also known as pruning, which can be thought of as removing nodes and leaves from a fully grown tree.

It is also equivalent to saying that the construction process stops when criteria such as the maximum depth of each leaf node or the minimum number of samples are met. These are hyperparameters that can be optimized during the tuning process.

reasoning process

Once a decision tree regressor is constructed, it can be used to predict the target variable for new input instances by applying rules and traversing the tree from the root node to the leaf nodes corresponding to the input feature values.

The predicted target value of an input instance is the average of the target values of training samples that fall into the same leaf node.

Implementing the first split in Excel

Follow the steps below:

List all possible splits
Compute the MSE (Mean Squared Error) for each fold.
Select the split that minimizes the MSE as the best next split.

all possible divisions

First, we need to list all possible splits that are the average of two consecutive values. There is no need to test more values.

Decision tree regression in Excel with possible splits — Image by author

MSE calculation for each possible split

As a starting point, you can calculate the MSE before partitioning. This also means that the prediction is just the average value of y. And MSE is equal to the standard deviation of y.

The idea here is to find a split such that the MSE with the split is lower than before. It is possible that splitting does not significantly improve performance (or reduce MSE), in which case the final tree will be trivial, i.e., the mean value of y.

For each possible split, we can calculate the MSE (Mean Squared Error). The figure below shows the calculation for the first possible split (x = 2).

Decision tree regression in Excel MSE for all possible splits — Image by author

You can check the calculation details.

Split the dataset into two regions. For the value x=2, there are two possibilities x<2 または x>2 is determined, so the x-axis is divided into two parts.
Calculate predictions. Compute the mean of y for each part. That is the potential prediction of y.
Calculate the error. Next, compare the prediction to the actual value of y.
Calculate the squared error: You can calculate the squared error for each observation.

Decision tree regression in Excel using all possible folds — Image by author

optimal split

For each possible split, do the same to obtain the MSE. Excel allows you to copy and paste formulas. The only values that change are the divisible values of x.

Decision Tree Regression in Excel Split — Image by author

Next, if we plot the MSE on the Y-axis and the possible splits on the X-axis, we can see that there is a minimum value of MSE at x=5.5. This is exactly the result I got with my Python code.

Minimizing Decision Tree Regression MSE in Excel — Image by author

Exercises I want to try

Now you can play with Google Sheets.

As you change the dataset, MSE updates to show you the best cuts.
You can introduce category features
You can try looking for the next split
You can change the criteria instead of MSE. You can use absolute error, Poisson, or friedman_mse as shown in the DecisionTreeRegressor documentation.
You can change the target variable to a binary variable. Typically this would be a classification task, but since 0 or 1 are also numbers, the MSE criterion can still be applied. But if we want to create a good classifier, we need to apply the usual criteria Entroy or Gini. That's for the next article.

conclusion

Using Excel, you can implement a single split to gain more insight into how decision tree regressors work. Even if you didn't create a complete tree, it's still interesting because the most important part is finding the best split among all possible splits.

One more thing

Did you notice anything interesting about how features are handled between distance-based models and decision trees?

For distance-based models, everything must be numeric. Continuous features remain continuous, and categorical features must be converted to numbers. The model compares points in space, so everything must lie on the numerical axis.

Decision trees do the opposite. cut Divide features into groups. Continuous features become intervals. Category features remain categories.

Are there missing values? It's just another category. There is no need to assign it first. Trees, like other groups, can naturally handle this by sending all “missing” values down one branch.

Source link

Binance推荐代码 commented on Tell Us Your Thoughts on Saw X and The Creator: I don't think the title of your article matches th
binance Registrera dig commented on New Podcast Exploring A.I. and Business Travel: Thank you for your sharing. I am worried that I la
注册以获取100 USDT commented on Two divergent skills that matter in an AI world: Math and business development: Can you be more specific about the content of your
Linda Espey commented on Revolutionizing safety and seamless journeys: This was a fantastic and informative article! I re
skapa ett binance-konto commented on The humor of French slang: Thank you for your sharing. I am worried that I la

Machine Learning “Advent Calendar” Day 6: Decision Tree Regressor

Introduction with a simple dataset