Join Lillian Pierson, P.E. for an in-depth discussion in this video Popularity-based recommenders, part of Building a Recommendation System with Python Machine Learning & AI.
- [Narrator] Turning now to popularity-based recommendation systems. Popularity-based recommenders offer a very primitive form of collaborative filtering, where items are recommended to users based on how popular those items are among other users. So in the drawing here, Place represents the item we are recommending, and we are going to take a count of the number of ratings that were given to each Place. The assumption is that the places that have the largest number of ratings or reviews are the most popular. Hence, we make the popularity-based recommendation that Place 1 is preferable to users over Place 2.
You can see the logic more clearly here. Based on the number of users or guests that rated Place 1 and Place 2, we'd say that Place 1 is more popular than Place 2. So, based on popularity, Place 1 would be recommended over Place 2. Let's look at the types of data that popularity recommenders use. Popularity-based recommenders typically rely on purchase history. They're also often used by news sites, like Bloomberg or The New York Times. In that environment, the data that the recommender works off of is not purchase history per se, but rather a website user activity data set.
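The counting logic described above can be sketched in a few lines. This is only an illustration: the user names and the ratings list are invented, and only the idea of ranking places by how many ratings they received comes from the narration.

```python
from collections import Counter

# Hypothetical ratings log: one (user, place) pair per rating given.
ratings = [
    ("u1", "Place 1"), ("u2", "Place 1"), ("u3", "Place 1"),
    ("u1", "Place 2"), ("u4", "Place 2"),
]

# Popularity is taken to be the number of ratings each place received.
rating_counts = Counter(place for _, place in ratings)

# Rank places from most rated to least rated.
ranked = [place for place, _ in rating_counts.most_common()]
print(ranked)
```

Note that the rating values themselves never enter the calculation. Only the number of ratings matters, which is why every user gets the same recommendation.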
So, recommendations are made based on counts of most read or most shared articles. One drawback of popularity-based recommenders is that they can't make recommendations that are personalized to users. That's because they don't take user data into account. The first type of recommender system that I'm going to show you is the popularity-based recommender. Popularity-based recommendation systems offer a very simple form of collaborative filtering, where items are recommended based on how popular they appear to be among users.
The first thing we need to do is import our libraries, so we'll import pandas as pd and import NumPy as np. We run this and we've got our libraries. Now, the data set we're going to use for this demonstration is hosted at UC Irvine, but it was originally published by a university, in a workshop on context-aware recommender systems. Here's the citation from the source of this data set.
We're going to first read in our data sets. We'll call the first data frame, frame, and we'll say pd.read_csv, call that function and pass in a string with the name of our file. That's rating_final.csv. Our second data frame's going to be called cuisine and we'll use the read_csv function to bring that data set in, as well.
This one has kind of a funny name, it's chefmozcuisine.csv... And we run that. Now, let's take a look at these data sets real quick. We'll look at the frame data set first and we'll just look at the head. In this demonstration, users are restaurant reviewers and the items are restaurants or, as they're called in the data set, Places.
Each place gets a rating of zero, one or two, where two is the best and zero is the worst rating. And as you can see here, the frame data frame provides a record for each place that each user has reviewed, as well as the review scores. You can see that there are multiple records per user ID. You'll see here that user 1077 has four entries. That's because the data set provides a record for each place that each user has reviewed, as well as their review scores.
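The loading and inspection steps above might look roughly like the sketch below. Because the real files are not bundled here, it reads from in-memory text instead of rating_final.csv and chefmozcuisine.csv. Every row is invented, and the userID column name is an assumption; placeID, rating, and Rcuisine are the names used in the narration.

```python
import io
import pandas as pd

# Invented rows standing in for rating_final.csv (not the real data).
ratings_csv = io.StringIO(
    "userID,placeID,rating\n"
    "U1,101,2\n"
    "U1,102,1\n"
    "U2,101,0\n"
    "U3,103,2\n"
)

# Invented rows standing in for chefmozcuisine.csv (not the real data).
cuisine_csv = io.StringIO(
    "placeID,Rcuisine\n"
    "101,Mexican\n"
    "102,Italian\n"
    "103,Mexican\n"
)

# read_csv accepts a file path or a file-like object.
# With the real files you would pass the file name as a string instead.
frame = pd.read_csv(ratings_csv)
cuisine = pd.read_csv(cuisine_csv)

# One row per (user, place) review, so a user can appear several times.
print(frame.head())
print(cuisine.head())
```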
Let's also look at the cuisine data set. As you can see, this data set just identifies each place by the place ID and then it lists the type of cuisine that is served at each place. Now, let's look at how to make recommendations based on simple counting. To find the most popular place, we'll count up the number of ratings each place has gotten and convert that result to a data frame. So, to do that, we're going to say frame.groupby.
We're going to group frame by the place ID and for each unique place ID, we want to look at the rating column and take a count of how many ratings there are. Because we want this to be its own data frame, we're going to use the pandas DataFrame constructor, so it's pd.DataFrame. This function just converts the output of our groupby call into its own data frame.
Let's call this new data frame rating_count. Let's also sort the places in descending order, according to the number of reviews they received. To do that, we just take the rating_count data frame and call the sort_values method on it. We pass in rating, because we want it to sort by the rating count, and we want it to sort in descending order, so we pass in the argument ascending=False.
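Here is a minimal sketch of the grouping and sorting steps just described. The rows are invented so that the example runs on its own, and the userID column name is an assumption. The groupby, pd.DataFrame, and sort_values calls follow the narration.

```python
import pandas as pd

# Invented ratings: place 101 has three ratings, 102 has two, 103 has one.
frame = pd.DataFrame({
    "userID": ["U1", "U2", "U3", "U1", "U2", "U3"],
    "placeID": [101, 101, 101, 102, 102, 103],
    "rating": [2, 1, 0, 2, 2, 1],
})

# Count how many ratings each place received and wrap the result in a DataFrame.
rating_count = pd.DataFrame(frame.groupby("placeID")["rating"].count())

# Sort so the most often rated places come first.
rating_count = rating_count.sort_values("rating", ascending=False)

print(rating_count.head())
```

After this step the rating column holds the number of ratings per place, not the rating scores, and placeID is the index of the new data frame.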
Let's just look at the first few records. Now, it looks like the most reviewed place is the one with ID number 135085, and it's got a total of 36 ratings. Now, let's take the top five most often rated places and see if they have any similarities between the cuisines that they serve. To do that, we'll first make a data frame of the place IDs of the most often rated places, then we'll merge that data frame with the cuisine data frame.
So let's create the data frame. We're just going to name the place IDs for each of the most reviewed places in the data set, so that's 135... 085... 132825... 135032... 135052...
132834. We're just going to set the index to a series of numbers from zero to four, so we say index=np.arange and pass the number five. Then, let's just name the column, so we'll say columns=['placeID']. Now, let's call this whole thing most_rated_places...
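Put together, the data frame described above could be built as follows. The five place IDs are the ones read out in the narration. Note that np.arange(5) produces the index values 0 through 4.

```python
import numpy as np
import pandas as pd

# Place IDs of the five most often rated places, as listed in the video.
most_rated_places = pd.DataFrame(
    [135085, 132825, 135032, 135052, 132834],
    index=np.arange(5),   # index values 0, 1, 2, 3, 4
    columns=["placeID"],
)

print(most_rated_places)
```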
And then we want to merge this data set, most_rated_places, with the cuisine data set and see if there are any similarities between the cuisines that are served at the most popular places in town. So to do that, we're going to use the pandas merge function and we're going to say that, on the left, we want most_rated_places and then on the right, we want cuisine, and we want it to be merged on the field called placeID.
Now, let's call the output of this summary, and then print it out. So what we have here is a list of the most popular places in town and the types of cuisines that are served at each of them. Let's see how many types of cuisines are available from places in this data set, in total. To do that, we just say cuisine and then we select the Rcuisine variable and call the describe method on it.
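The merge and the describe call can be sketched as below. To keep the example self-contained, the place IDs and cuisine labels are invented and do not reflect the real data set. Only the column names placeID and Rcuisine and the calls themselves follow the narration.

```python
import pandas as pd

# Invented stand-in for the most often rated places.
most_rated_places = pd.DataFrame({"placeID": [101, 102, 103]})

# Invented stand-in for the cuisine data: a place may list several cuisines,
# and some places (103 here) may have no cuisine record at all.
cuisine = pd.DataFrame({
    "placeID": [101, 102, 102, 104, 105],
    "Rcuisine": ["Mexican", "Bar", "Mexican", "Italian", "Mexican"],
})

# Inner join on placeID (the default for pd.merge).
summary = pd.merge(most_rated_places, cuisine, on="placeID")
print(summary)

# For a text column, describe() reports count, unique, top, and freq.
cuisine_stats = cuisine["Rcuisine"].describe()
print(cuisine_stats)
```

Because the default merge is an inner join, a place with no cuisine record drops out of the summary, and a place with several cuisines appears once per cuisine. Both effects are worth keeping in mind when reading the summary table.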
So what you can see here is that there are 59 unique types of cuisines that are represented in our data. Also notice that the most frequently occurring type of cuisine in the data set is Mexican food. Now, let's look back at our summary table. You can see that two of the most often rated places in town serve Mexican food. The recommender is suggesting that Mexican food is popular and that places that serve it are good candidates for recommending.
From the description of our cuisine data frame, we see that Mexican food is the most frequently served type of cuisine in the data set. Our recommender is basically saying that places that serve the most popular types of cuisine are more likely to be appreciated by the average restaurant goer in the city. Makes sense, right?
- Working with recommendation systems
- Evaluating similarity based on correlation
- Building a popularity-based recommender
- Classification-based recommendations
- Making a collaborative filtering system
- Content-based recommender systems
- Evaluating recommenders