Learn how to apply feature engineering techniques to real data.
- [Instructor] Now we're ready to code our machine learning system. Let's open up train_model_pt1.py. On the first line, we've loaded the dataset into a pandas dataframe using the read_csv function. Now that the data is loaded, our first task is to do any feature engineering work that's required. There are four address fields that we want to remove from our dataset because they aren't useful to us: the house number, unit number, street name, and zip code. We can do that by using the del statement to delete the columns from our data table.
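As a rough sketch of that deletion step, the example below builds a tiny in-memory dataframe in place of the one loaded with read_csv. The column names (house_number, unit_number, street_name, zip_code, garage_type, city, sale_price) and the sample rows are assumptions made for illustration; the real dataset may name its columns differently.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe loaded with read_csv.
# Column names and values are assumptions, not the real dataset.
df = pd.DataFrame({
    "house_number": [12, 48],
    "unit_number": [0, 3],
    "street_name": ["Oak St", "Elm Ave"],
    "zip_code": ["10001", "10002"],
    "garage_type": ["attached", "none"],
    "city": ["Springfield", "Shelbyville"],
    "sale_price": [250000, 180000],
})

# Remove the four address fields, one del statement per column.
del df["house_number"]
del df["unit_number"]
del df["street_name"]
del df["zip_code"]

remaining_columns = list(df.columns)
print(remaining_columns)
```

Each del statement modifies the dataframe in place, so after these four lines only the non-address columns are left.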
First, let's delete the house number, the unit number, the street name, and the zip code. We also need to apply one-hot encoding to two columns that contain categorical data, the garage type and city fields. Luckily, pandas provides a get_dummies function that performs one-hot encoding. All we have to do is tell it which fields to encode. So here we'll call the get_dummies function. We'll pass in the dataframe name and the columns to encode.
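Here is a minimal sketch of that get_dummies call, assuming the two categorical columns are named garage_type and city. Those names, the features_df variable, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions.
df = pd.DataFrame({
    "garage_type": ["attached", "none", "detached"],
    "city": ["Springfield", "Shelbyville", "Springfield"],
    "sale_price": [250000, 180000, 210000],
})

# Replace each categorical column with one indicator column per category.
# get_dummies returns a new dataframe and leaves df unchanged.
features_df = pd.get_dummies(df, columns=["garage_type", "city"])

print(sorted(features_df.columns))
```

The original garage_type and city columns are dropped from the result, and each new column is named after the original column and the category value, for example city_Springfield.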
Next, we'll delete the sale price column from our features because we don't want to let the machine learning model see the sale price in the input data. We can do that with the del statement as well. And finally, we'll create the X and y arrays. Remember, X is the standard name for the input features and y is the standard name for the expected output to predict. The X array will be the contents of our features dataframe. The only difference is that we'll call the as_matrix function to make sure the data is a NumPy matrix data type and not a pandas dataframe.
And the y array will be the sale price column from our original dataset. We'll also use the as_matrix function to convert this to a NumPy matrix data type. Great! We now have our features ready for our machine learning system to use.
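The last two steps can be sketched as follows. One caveat: as_matrix was removed in pandas 1.0, so this sketch uses to_numpy() instead, which returns a NumPy array in the same way. The column names (sq_feet, num_bedrooms, sale_price) and the sample values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumptions.
df = pd.DataFrame({
    "sq_feet": [1200, 950, 1600],
    "num_bedrooms": [3, 2, 4],
    "sale_price": [250000, 180000, 310000],
})

# Work on a separate features dataframe so the original keeps sale_price.
features_df = df.copy()

# The model must not see the value it is supposed to predict.
del features_df["sale_price"]

# X holds the input features, y holds the expected output.
# to_numpy() is used here because as_matrix no longer exists in pandas 1.0+.
X = features_df.to_numpy()
y = df["sale_price"].to_numpy()

print(X.shape, y.shape)
```

On an older pandas release that still ships as_matrix, the same two lines could call as_matrix() as described in the video.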
- Setting up the development environment
- Building a simple home value estimator
- Finding the best weights automatically
- Working with large data sets efficiently
- Training a supervised machine learning model
- Exploring a home value data set
- Deciding how much data is needed
- Preparing the features
- Training the value estimator
- Measuring accuracy with mean absolute error
- Improving a system
- Using the machine learning model to make predictions