Learn how to generate a linear model that predicts the result from a set of inputs with the least possible error.
- [Instructor] When you analyze data, you will often have a set of inputs or independent variables that combine to produce an output or dependent variable. The latter values are called dependent variables because they depend on the inputs for their value. In this movie, I'll show you how to generate a linear model that predicts the results from a set of inputs with the least possible error. My sample file is the Linear.nb notebook. You can find it in the Chapter03 folder of the exercise files collection. The data that I'm going to assign to my clist variable are value pairs.
The first value in each pair is the number of square feet for a house. For the first pair, that's 1,400. The second value is the sale price of the house, which for the first pair is 225,000. Before I start working with my data, I need to evaluate the notebook. So I'll go to the Evaluation menu and click Evaluate Notebook. Now my list has been assigned to the clist variable. I want to create a linear model, and that will take the form of A times X plus B equals Y.
So some multiplier A will be multiplied by the input value, in this case 1,400. Adding B, which is another constant value, gives the output Y. The values won't be exact. That's because there is some variation in the relationship between the square footage of a house and its sale price. Even so, the linear model should give us a good indication, or a good prediction, of the sale price to expect for a given square footage. I'll create my linear model.
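Written as equations, the model and the "least possible error" criterion look like this. Here (x_i, y_i) are the square-footage and price pairs in clist and n is the number of pairs. This assumes ordinary least squares, which is what LinearModelFit uses when no weights are given. The closed form for A and B is the standard single-variable result.

```latex
\hat{y} = A\,x + B, \qquad
(A, B) = \operatorname*{arg\,min}_{A,\,B} \sum_{i=1}^{n} \bigl(y_i - (A\,x_i + B)\bigr)^2

A = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
B = \bar{y} - A\,\bar{x}
```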
I'll assign it to a variable named lm, then an equal sign. The function I'll use is LinearModelFit. There we go. Then a left square bracket. I want to use my data variable, so that's clist, comma. And then x, comma, x. The first x describes the form of the model: a single linear term in x, with LinearModelFit adding the constant term B by default. The second x names the input variable, which here stands for the square footage of the house.
I'll type a right square bracket. Everything looks good. And Shift + Enter. And I get my model as an output. The model says that there is a base price of 4,776.17. In other words, if a house had zero square feet, that's what you would expect the value of the land to be, I guess. And then the x value, which is the square footage of the house, is multiplied by 141.473.
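LinearModelFit does the fitting for you. To show the arithmetic, here is a minimal Python sketch of the ordinary least-squares line that a fit like LinearModelFit[clist, x, x] returns. It assumes one input variable plus a constant term, and Mathematica's internal numerical method may differ. The names fit_line and sse and the toy data are hypothetical. The real clist values from Linear.nb are not reproduced here.

```python
def fit_line(pairs):
    """Ordinary least squares for y ≈ a*x + b with one input variable.

    Returns (a, b): the slope and intercept that minimize the sum of
    squared differences between each y and a*x + b.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sxy / sxx             # multiplier A
    b = mean_y - a * mean_x   # constant B
    return a, b


def sse(pairs, a, b):
    """Sum of squared errors of the line a*x + b over the data."""
    return sum((y - (a * x + b)) ** 2 for x, y in pairs)


# Hypothetical (square feet, sale price) pairs, NOT the clist data from Linear.nb.
toy = [(1000, 150000), (1500, 240000), (2000, 300000)]
a, b = fit_line(toy)
print(a, b)            # → 150.0 5000.0
print(sse(toy, a, b))  # → 150000000.0
```

No other slope and intercept give this toy data a smaller sum of squared errors. For example, `sse(toy, a, b + 1000)` is larger than `sse(toy, a, b)`.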
And when you add 4,776.17 to the result of that multiplication, you get the predicted value of the house. So let's see how that works. I will type lm, which again is the variable that I assigned the model to. And then I'll give it a square footage to estimate. So I'll say 2,000. Right square bracket to close and Shift + Enter. So the predicted value of the house is 287,723.
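Evaluating lm at a square footage plugs that number into the fitted line. Here is a minimal Python sketch using the rounded coefficients displayed above. The constant names and predict_price are hypothetical.

```python
# Coefficients as displayed (rounded) for the model in this example.
INTERCEPT = 4776.17  # B: predicted price at zero square feet
SLOPE = 141.473      # A: added price per square foot


def predict_price(square_feet):
    """Evaluate the fitted line A*x + B, like calling lm[x] in the notebook."""
    return SLOPE * square_feet + INTERCEPT


print(round(predict_price(2000)))  # → 287722
```

The rounded coefficients give 287,722.17, about one dollar below the 287,723 shown in the notebook. Most likely this is because the notebook works with unrounded coefficients.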
And based on the admittedly limited dataset that I have, that seems like a reasonable value. Now, to check the model, let's put in a value from the dataset and see what the model predicts. Let's go with a 1,900-square-foot house. So I'll type lm, left square bracket, 1,900, right square bracket, and Shift + Enter. We get a value of 273,576, which is about 38,600 higher than the actual sale price of 235,000.
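The gap between the actual and predicted price is called the residual. Here is a minimal sketch for this one house, using the prediction and sale price quoted above. The residual helper is hypothetical.

```python
def residual(actual, predicted):
    """Actual minus predicted; a negative value means the model overestimates."""
    return actual - predicted


# The 1,900-square-foot example: the model's prediction and the actual sale price.
predicted = 273576
actual = 235000
r = residual(actual, predicted)
print(r)                         # → -38576
print(f"{abs(r) / actual:.1%}")  # → 16.4%
```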
There can be a lot of reasons for that error. The first, and most likely, is that we don't have enough data to make a really educated guess. Second, we don't have any indication of the condition or the location of the house. All we're looking at is a simple one-to-one relationship: square footage versus sale price. With a model that simple, a substantial amount of error is not surprising.