New book unpacks the models, languages and systems that drive technology

How do patterns become predictions and data turn into decisions? Just as we learn from experience, AI models learn from examples. The main difference is that while people often need only a few examples to grasp a concept, these models often require thousands or even millions of examples to learn it effectively. Why?

AI models use mathematics to build an understanding of the world. Given examples, they help us discover patterns, make sense of data, find hidden relationships, predict future outcomes, and make informed decisions. Without these models, we would have no way to connect the dots and would be drowning in numbers, words, and images.

Here we will consider various algorithms, starting with the simplest: classical linear models. These models learn to draw lines through data points, just like you did in high school math class, except that the lines are found automatically from the examples. From there, we explore decision trees, which make yes/no choices, and clustering algorithms, which learn to group similar things together. If these terms sound complicated, don’t worry. We’ll explain in detail how each model learns and why it’s useful.

You’ll quickly gain a solid understanding of these models, their applications, and how they bring you one step closer to leveraging AI. Despite the excitement about deep learning and large language models, these basic algorithms remain invaluable. They train faster, are easier to interpret, and are more efficient for many everyday problems, such as predicting sales or detecting suspicious transactions. After all, you don’t need a flamethrower to light a candle. So, let’s roll up our sleeves and dive into the fascinating world of machine learning models.

Linear models

Imagine you are a pirate with a treasure map studded with dots. Each dot shows where a previous treasure hunter found gold. Some found more, others less. Your job is to find the pattern in where treasure tends to turn up. By drawing a line through these points, you can predict how much gold is buried anywhere on the map, even where no one has ever dug before. Congratulations! You have just visualized a simple linear model.

In the context of AI, we are looking for different kinds of patterns. The dots are data points, and rather than connecting them like a path, you draw the line that best represents the overall trend. This line is how the model understands the relationships in the data. Just as a “treasure line” can help predict the value of a treasure based on its location, a linear model uses its line to make predictions for new cases it has not yet seen. These models look for patterns that can be described by straight lines or their higher-dimensional cousins. You provide input information (the independent variable), and the model uses the line to predict an output (the dependent variable). The line acts as a mathematical summary of the relationship between input and output, and this simple idea is surprisingly powerful. To understand how these models automatically find the “best fit” line, let’s look at the first algorithm: linear regression.

Linear regression

Imagine you are a real estate agent and you need to predict the sale price of a home. You probably know that larger houses are generally more expensive than smaller ones. But how much more? There are many facts and figures about the home, including square footage, number of rooms, amenities, neighborhood, and more. Instead of guessing, you can look at recent home sales in your area and try to find patterns between these features and prices. This is exactly what linear regression helps you do. Each of these inputs, such as square footage or number of rooms, is an independent variable. The dependent variable is the sale price of the house, because this is what you are trying to predict based on the characteristics of the house.

Before we get into the main topic, let’s understand what “linear” means. A linear relationship is one in which a change in one variable leads to a proportional change in another variable. For example, if you purchase tomatoes for $10 per pound, each additional pound increases your total cost by exactly $10. This creates a straight-line pattern when graphed. However, not all relationships are linear. For example, if you double the side length of a square, the area quadruples instead of doubling. Linear regression therefore works best when the relationship between the variables is at least approximately linear.
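To make the contrast concrete, here is a minimal Python sketch that compares the two examples from the paragraph above. The function names are made up for illustration. It shows that each extra pound of tomatoes adds the same amount to the cost, while each extra unit of side length adds a growing amount to the area of a square.

```python
PRICE_PER_POUND = 10  # dollars per pound, taken from the tomato example


def tomato_cost(pounds):
    """Linear: total cost grows by the same amount for every extra pound."""
    return PRICE_PER_POUND * pounds


def square_area(side):
    """Not linear: area grows faster and faster as the side gets longer."""
    return side ** 2


# How much does the output change for each one-unit step in the input?
cost_steps = [tomato_cost(p + 1) - tomato_cost(p) for p in range(1, 5)]
area_steps = [square_area(s + 1) - square_area(s) for s in range(1, 5)]

print(cost_steps)  # [10, 10, 10, 10]  constant steps: a straight line
print(area_steps)  # [3, 5, 7, 9]      growing steps: a curve
```

Constant steps are the signature of a linear relationship. When the steps keep changing, a straight line can only approximate the data.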

How does it work? Imagine that each house is a point on a graph, with square footage on one axis and price on the other. Our goal is to find the line that best represents this data. This line has two important components: the bias (also known as the intercept) and the slope. The intercept tells you where the line crosses the price axis. Think of it as the base price you would expect for a home before its size is taken into account. The slope tells you how much the price changes for each additional square foot. Together, these values are often loosely called coefficients, and finding the right ones is the key to making good predictions.
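As a rough sketch, the line described above can be written as a one-line Python function. The intercept of $50,000 and the slope of $150 per square foot are invented values chosen only for illustration. They are not estimates from real sales.

```python
def predict_price(square_feet, intercept, slope):
    """Price predicted by a straight line: base price plus price per square foot."""
    return intercept + slope * square_feet


# Hypothetical values, chosen only to illustrate the idea.
INTERCEPT = 50_000  # dollars: the base price where the line crosses the price axis
SLOPE = 150         # dollars added for each additional square foot

print(predict_price(1_000, INTERCEPT, SLOPE))  # 200000
print(predict_price(2_000, INTERCEPT, SLOPE))  # 350000
```

Adding 1,000 square feet raises the predicted price by 1,000 times the slope, no matter where on the line you start.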

If house prices were perfectly linear with size, all these points would lie exactly on a straight line. But in the real world, data is messy. Two homes of the same size may sell for different prices due to location, upgrades, market timing, and more.

This is where the “regression” part of linear regression comes into play. Rather than trying to connect all the dots perfectly, we look for the line that best captures the overall trend. But what makes a line the best one? For each home in your data, you can measure how far the line’s predicted price is from its actual price. These differences, actual price minus predicted price, are called residuals. A positive residual means that the price predicted by the line is too low. A negative residual means the prediction is too high.
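The residual calculation can be sketched in a few lines of Python. The four house sizes and sale prices below are invented for illustration, as are the intercept and slope of the candidate line.

```python
# Hypothetical data: (square feet, actual sale price in dollars).
houses = [
    (1_000, 200_000),
    (1_500, 290_000),
    (2_000, 340_000),
    (2_500, 410_000),
]

# A candidate line with made-up intercept and slope.
INTERCEPT = 50_000
SLOPE = 150


def residual(square_feet, actual_price):
    """Actual price minus the price predicted by the candidate line."""
    predicted_price = INTERCEPT + SLOPE * square_feet
    return actual_price - predicted_price


residuals = [residual(size, price) for size, price in houses]
print(residuals)  # [0, 15000, -10000, -15000]
```

The second house sold for $15,000 more than the line predicted, so its residual is positive. The last two sold for less than predicted, so their residuals are negative.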

We want these residuals to be as small as possible, but there is a catch. If we simply sum all the residuals, the positive and negative values cancel each other out, fooling us into thinking we have a good line when in fact we don’t. Instead, we square each residual before adding. This approach, called “least squares,” has two important effects. It ensures that no residual contributes a negative amount (so they can no longer cancel), and it penalizes large errors more than small ones. After all, being $100,000 off in your home price prediction is worse than being $10,000 off.
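The cancellation problem is easy to see in a small Python sketch. The data and both candidate lines are invented for illustration. The "flat" line ignores size entirely and always predicts the average price, which makes its plain residuals sum to zero even though its individual predictions are poor.

```python
# Hypothetical data: (square feet, actual sale price in dollars).
houses = [
    (1_000, 200_000),
    (1_500, 290_000),
    (2_000, 340_000),
    (2_500, 410_000),
]


def residuals_for(intercept, slope):
    """Residuals (actual minus predicted) of every house for a given line."""
    return [price - (intercept + slope * size) for size, price in houses]


def sum_of_squares(values):
    """Square each value first, then add them up."""
    return sum(v ** 2 for v in values)


flat = residuals_for(310_000, 0)     # always predicts the average price
sloped = residuals_for(50_000, 150)  # follows the upward trend

print(sum(flat), sum(sloped))                        # 0 -10000
print(sum_of_squares(flat), sum_of_squares(sloped))  # 23400000000 550000000
```

Judged by the plain sum, the flat line looks perfect. Judged by the sum of squares, it is far worse than the sloped line, which matches what the individual residuals say.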

Many different lines can be drawn through the points, but only one of them minimizes the total squared error. That line is the least squares solution. There are various ways to find the optimal slope and bias. Solving the least squares problem directly is one approach. Gradient descent, another important method that can be used for this task, is discussed in the next chapter. Both methods aim to minimize prediction errors, just in different ways.
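For a single input variable, the least squares line can be computed directly with the standard textbook formulas: the slope is the sum of products of the deviations of size and price from their averages, divided by the sum of squared deviations of size, and the intercept makes the line pass through the point of averages. The sketch below applies those formulas to invented data. The function name is made up for this example.

```python
def fit_line(sizes, prices):
    """Closed-form least squares fit for one input variable.

    Returns (intercept, slope) of the line that minimizes the sum of
    squared residuals.
    """
    n = len(sizes)
    mean_size = sum(sizes) / n
    mean_price = sum(prices) / n
    # How size and price vary together, and how size varies on its own.
    covariation = sum((x - mean_size) * (y - mean_price)
                      for x, y in zip(sizes, prices))
    size_variation = sum((x - mean_size) ** 2 for x in sizes)
    slope = covariation / size_variation
    intercept = mean_price - slope * mean_size
    return intercept, slope


# Hypothetical data, invented for illustration.
sizes = [1_000, 1_500, 2_000, 2_500]            # square feet
prices = [200_000, 290_000, 340_000, 410_000]   # dollars

intercept, slope = fit_line(sizes, prices)
squared_error = sum((y - (intercept + slope * x)) ** 2
                    for x, y in zip(sizes, prices))

print(intercept, slope)  # 72000.0 136.0
print(squared_error)     # 280000000.0
```

On this toy data, the fitted line has a smaller sum of squared residuals than any other straight line you could draw through the same four points.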

In reality, predicting home prices is about more than just square footage. You may also consider the number of bedrooms, the age of the home, its location, and many other factors. Each feature becomes a separate dimension in the analysis. Although these higher dimensions cannot be easily visualized, the same principles apply. We are still looking for the best way to combine all these characteristics to predict prices.

Once you find the optimal values, you can use them to make predictions. For a new home, plug its characteristics (size, number of rooms, etc.) into the equation to predict the price.
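With several features, "plugging in" means multiplying each characteristic by its own coefficient and adding the bias. The sketch below assumes a model has already been fitted. The feature names, coefficients, and bias are all invented for illustration and do not come from real data.

```python
# Hypothetical fitted values, invented for illustration.
BIAS = 40_000  # dollars: base price
COEFFICIENTS = {
    "square_feet": 120,   # dollars per square foot
    "bedrooms": 10_000,   # dollars per bedroom
    "age_years": -500,    # dollars per year of age (older homes sell for less)
}


def predict_price(home):
    """Bias plus each feature value multiplied by its coefficient."""
    return BIAS + sum(COEFFICIENTS[name] * value for name, value in home.items())


new_home = {"square_feet": 1_800, "bedrooms": 3, "age_years": 20}
print(predict_price(new_home))  # 276000
```

Each coefficient plays the same role as the slope in the one-variable case: it says how much the predicted price changes when that feature increases by one unit and the others stay the same.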

But how do we know that our model is good? To understand whether the predictions are reliable, we need a way to measure the model’s performance. This is where evaluation metrics come into play.