Learn about gradient descent and cost minimization.
- [Instructor] In the last video, we wrote a program that estimated the value of a house by multiplying each of its attributes by a fixed weight, and adding them up to get a total value. The function we wrote did the exact same math as this equation. The house's value = 50,000 + square feet times the first weight and bedrooms times the second weight. But how do we know what value to use for each of these weights so that the predictions generated by the function are accurate? The trick is to turn finding the weights into an optimization problem that the computer can solve on its own. When we are training our machine learning algorithm, we're really asking it to find the best weights that most closely reproduce the answers in the training data set.
First, we need some training data. Here's our data for three houses that we can use to train our algorithm. For each house, we have the number of bedrooms, the size in square feet, and the home's actual value. Let's write out our home value equation for each of these three known houses. I've just substituted in all the values we know from our training data. The only unknown values in each equation are the two weights. Our goal is to find the two weights that work as best as possible for all of these equations at the same time. We have to start somewhere. First, let's choose totally random guesses for each weight.
We'll set both to 10. Now we'll see how well that works by evaluating each equation. Let's substitute 10 in for each weight, and then evaluate each equation to get our initial price estimate. Looking at the real price for each house compared to what we calculated, we can see that our estimates are pretty far off with these weights. Let's quantify how bad the current estimates are. Let's subtract our prediction for each house from the real price of that house, and add them all up. Now, let's square each term in our equation. We square the error for each house to penalize large errors more.
We'd rather have each house's estimate be a little wrong than have one house be really wrong. Finally, let's divide this whole thing by the number of houses in our data set. We have three houses so we'll divide by three. It gives us an average squared error for a single house in the data set. By convention we're going to call this total error calculation our cost function. It tells us how wrong we are with the current weights, in other words, the total cost of the current model. Our goal is to find the weights that minimize the costs for all houses in our data set. If we can make the cost equal zero, then our prediction algorithm is perfect for every single house.
The higher the value of the cost function the more wrong our predictions will be. Let's rewrite the exact same cost equation in a more general way. This equation just says that the total cost is the summation of the squared difference between each guess and the real value of each house. Then the whole thing is divided by the number of houses. Now that we have a cost function to optimize, we can use a standard mathematical optimization algorithm to find the weights to give the lowest possible value for the function. A very common optimization algorithm that we can use to solve this problem is called gradient descent.
Gradient descent is an iterative optimization algorithm that we can use to find the best weights. It works by tweaking each of the weights a tiny bit at a time in the direction that will make the cost go down. Here's how it works. First we pass in the cost function, and the random starting weights to gradient descent. Then, by repeatedly tweaking the weights by small amounts, thousands of times, gradient descent will zero in on the values for the weights that get the cost function as close to zero as possible. And, finally, when gradient descent can't reduce the cost any more it will return the best weights it found.
In our case, it would return 92.1 and 10,001. Let's calculate the home prices again with the weights that gradient descent found. Sub in 92.1 and 10,001 as our weights. Now let's do the math to find our system's prediction for each house's price. We can see that all three predictions look really good. None of them are perfect, they're all really close to the real value of each respective house. Let's review how we found the best weights for our simple home value estimator. First, we created an equation to model the problem, estimating a home's value.
Second, we created the cost function to quantify the error in the model. Finally, we used the gradient descent optimization algorithm to find the model parameters that best minimized the cost function. The machine learning libraries we'll use in this course will handle all of these calculations for us, including running gradient descent. But it's important to have a basic idea of what's happening behind the scenes so that you can better understand what kinds of problems are possible to solve with machine learning.
- Setting up the development environment
- Building a simple home value estimator
- Finding the best weights automatically
- Working with large data sets efficiently
- Training a supervised machine learning model
- Exploring a home value data set
- Deciding how much data is needed
- Preparing the features
- Training the value estimator
- Measuring accuracy with mean absolute error
- Improving a system
- Using the machine learning model to make predictions