Join Lillian Pierson, P.E. for an in-depth discussion in this video Content-based recommender systems, part of Introduction to Python Recommendation Systems for Machine Learning.
- [Instructor] The last type of recommender I want to cover is content-based recommendation systems. These type of recommenders are not collaborative filtering systems because user preferences and attitudes do not weigh into the evaluation. Instead, content-based recommenders recommend an item based on its features and how similar those are to features of other items in a dataset. In the demo, we're going to use the nearest neighbor algorithm to build a content-based recommender. The nearest neighbor algorithm is an unsupervised classifier.
It's also known as a memory-based system, because it memorizes instances and then recommends an item, or a single instance, based on how quantitatively similar it is to a new incoming instance. To conceptualize how to use nearest neighbor algorithm in this capacity, imagine you're a car dealer. You get a customer that comes in and tells you that he wants a car that gets 25 miles per gallon and has a 4.7 liter, 425 horsepower engine.
You have a dataset that describes all of these specs on the cars in your inventory. So you could use the nearest neighbor algorithm to identify the single best matching car from among all cars in your inventory database. To do this, you'd create a single test point that represents your customer's desired specifications. When you run the model, it will memorize all of the data points in your inventory. It will then calculate the single data point that is quantitatively most similar to your test point.
The car that this data point represents would be the one you'd recommend to your customer. Let me show you in the coding demonstration. Let's make a content-based recommender in Python. We'll use the nearest neighbor algorithm in this example. The first thing we need to do, as usual, is import our libraries, so we'll import numpy as np and import pandas as pp. Then we want to bring in scikit learn, so we'll say import sklearn, and then from sklearn.neighbors we'll import nearest neighbors.
When I run this then we have the libraries for the analysis, and in this analysis we're going to use the mtcars dataset, so you can see this citation here, but it's going to come with a download for the course. As an example of content-based recommenders, we're going to recommend an item based on its features and how similar they are to features of other items in the dataset. We're going to use the nearest neighbor algorithm for this. So imagine you're an antique car dealer.
You sell cars from the 70's. You have 32 cars on your lot, and you keep a small database to document key characteristics of each. Let's bring the dataset in, we'll call it cars, and we'll use the read csv function. So pd.read_csv, and then we want to bring in mtcars.csv, and then let's name the columns of this dataframe.
So we're going to say cars.columns, and then create a list with those names. The first name will be car_names, the second column is going to be mpg, the third column is going to be cyl for cylinders, and then disp, hp, drat, weight, qsec, vs, am, gear, and lastly, carb, which stands for carburetors.
Alright, now that we've named the columns let's take a look at the dataset. We'll call the head method off of the cars dataframe, and we have a small sample here, and it looks pretty good. So imagine that a customer walks in and tells you that he's looking for a car that weighs 3.2 tons, gets at least 15 miles per gallon, has an engine with a displacement size of 300 cubic inches, and a power of 160 horsepower.
Let's make a test point to represent the shopper's specifications. So I say, we'll call it t, and we'll set it equal to a list, and we want 15 for 15 miles per gallon, 300 for 300 cubic inches. The next value will be 160 for 160 horsepower. And lastly we'll have 3.2, which is for 3.2 tons.
Now let's see how the nearest neighbor algorithm can be used to recommend a car for this shopper based on his requirements. The first thing we need to do is define our dataset. So we'll say x is equal to cars.ix, and we use a special indexer to select only the variables we need here. The first variable here is miles per gallon, and cubic inches is a measure of displacement, so we'll select the variable at position three.
The next thing is we need 160 horsepower, so that's hp, variable four. And lastly we're looking at the weight, how many tons, so that's variable six. And then of these we just want the values. So let's say .values. And then let's look at the first five records here. So we'll select and return only those.
Run this, and here we have a subset that we've created from our original cars dataframe. Now let's build the nearest neighbor model. We'll start by instantiating a nearest neighbor object by calling nearest neighbor function and passing in n_neighbors equal to one. This tells the algorithm to search the dataset and find a single point p that is nearest to the test point t. So let's start by first saying nearest neighbors, call the nearest neighbors function, and then we pass in the parameter n_neighbors equal to one.
And then we need to fit the model to the data, so we say .fit, pass in our x dataset, and we'll call this object nbrs, for neighbors, and we can run this whole thing. To find the nearest neighbor, we'll call the k neighbors function on the test point, and we can just print out that result. So what I'm going to do is I'm going to access the k neighbors function from within our nearest neighbor model.
So I'm going to say nbrs, and then .kneighbors, and then we just need to pass in our object t, which is our test point. And we'll print this whole thing out. The k neighbors function returns here an array that represents the length to point p from the test point t, and here, another array that contains the index value of the nearest point or the most similar instance in dataset x.
So that's going to be the record that's indexed at position 22. So let's look at our cars dataset. Just print this out and see what car that is. So according to our nearest neighbor model, you should recommend the shopper to take a closer look at this AMC Javelin car, because it's the most similar car to the shopper's specifications of all cars that the car dealer has on his lot.
- Working with recommendation systems
- Evaluating similarity based on correlation
- Building a popularity-based recommender
- Classification-based recommendations
- Making a collaborative filtering system
- Content-based recommender systems
- Evaluating recommenders