Understand the cold start problem and how to address it with mean-normalized data.
- [Narrator] Recommendation systems work great when the user has already entered lots of reviews, but for first-time users we don't know enough about the user yet to make personalized recommendations. There are three ways we can try to work around this problem. First, we could just not make any recommendations for new users. For some applications it might be okay to wait until the user has reviewed products before making recommendations. A second approach is to use product similarity to suggest similar products to users who haven't rated anything, instead of making personalized recommendations. A third option is to use the average rating of products to make recommendations.
In other words, we'll just recommend the products that have the best overall ratings to new users. This can be helpful because some movies are just generally considered better than other movies. If a movie has a 5-star average rating across all users, that's probably a better movie to recommend to a brand new user than a movie that has a 1-star average rating across all users. To take average ratings into account we just need a small tweak to our recommendation algorithm. Here's how that will work. Here we have five ratings for the same movie from five different users who've reviewed the movie.
First we'll calculate the average rating for the movie across all users. In this case, the average rating for the movie is 4.2 out of 5. Next we'll subtract the average rating from each user's rating. For user number one, instead of recording the rating as 4, we'll subtract 4.2 and record it as -0.2. The idea is that this user rated the movie 0.2 stars under the average rating. These adjusted ratings are what we'll use to do matrix factorization and to make recommendations. Let's see how that changes things. Let's assume that our system predicts an adjusted rating of 0.8 for a specific user.
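The subtraction step can be sketched in a few lines of NumPy. Only the first user's rating (4) and the 4.2 average come from the walkthrough; the other four ratings are made-up values chosen so that the average works out to 4.2.

```python
import numpy as np

# Hypothetical ratings from five users for one movie. Only the first
# value (4) and the resulting 4.2 average are taken from the walkthrough.
ratings = np.array([4.0, 5.0, 4.0, 3.0, 5.0])

# Step 1: average rating for the movie across all users who rated it.
movie_mean = ratings.mean()

# Step 2: subtract the average from each user's rating.
normalized = ratings - movie_mean

print(round(movie_mean, 2))
print(np.round(normalized, 2))
```

A negative value means the user rated the movie below its average, and a positive value means above.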
We know that the movie has an average rating of 4.2 so we just need to add back in the average to get the final rating for the user. So the predicted rating for this user is 5 stars. But the cool part is how this works out for brand new users. We can assume that brand new users who haven't reviewed anything yet will get a predicted rating of zero for every movie. But now we'll add back in the average rating and the predicted rating for the movie ends up being 4.2 stars. So even though this user hasn't reviewed anything yet, we can recommend this movie based on how popular it is with other users.
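The arithmetic above is small enough to check directly. This is only a sketch; `final_rating` is a hypothetical helper name, not something from the course files.

```python
def final_rating(normalized_prediction, movie_mean):
    """Add the movie's average back onto a mean-normalized prediction."""
    return normalized_prediction + movie_mean

movie_mean = 4.2  # average rating for the movie, from the example above

# A user with review history: the model predicts 0.8 above the average.
existing_user = final_rating(0.8, movie_mean)

# A brand new user: the normalized prediction is assumed to be zero,
# so the final prediction falls back to the movie's average.
new_user = final_rating(0.0, movie_mean)

print(round(existing_user, 2), round(new_user, 2))
```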
Let's open up train_recommender_cold_start.py and see how to do this in code. This file contains the code to factor our review data set. We read the data set using the read_csv function and then we create the ratings matrix using the pivot_table function. Now that we have a review matrix that covers every movie, we want to calculate the average rating of each movie. We can do this using the matrix_factorization_utilities.normalized_ratings function. This function takes in an array of ratings to average. So we'll pass in the ratings_df data set. We call the as_matrix function to make sure the ratings data frame is passed in as a NumPy array data type.
This function also returns two results. First it returns the means, or average ratings for each movie. And second it returns a new copy of the ratings matrix called normalized_ratings. This copy has the average rating subtracted from every user review. Next we factor the matrix to create U and M, and then multiply U and M to get the predicted ratings. Then here, after we predict ratings for all users, we need to add back in the average rating for each movie. Finally at the bottom, we use the pickle.dump function to save a copy of the means to a file called means.dat.
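The normalization step in that script might look roughly like the sketch below. The course helper's implementation isn't shown in the narration, so `normalize_ratings` here is a hypothetical stand-in that uses `np.nanmean` to ignore missing reviews. The tiny data set and its column names (`user_id`, `movie_id`, `value`) are also assumptions. On current pandas versions, `to_numpy()` is used because `as_matrix` has been removed.

```python
import pickle

import numpy as np
import pandas as pd

# Made-up review data; the column names are assumptions for illustration.
raw = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3],
    "movie_id": [10, 20, 10, 30, 20],
    "value":    [4, 2, 5, 3, 4],
})

# One row per user, one column per movie; unrated cells become NaN.
ratings_df = pd.pivot_table(raw, index="user_id", columns="movie_id",
                            values="value", aggfunc="max")

def normalize_ratings(ratings):
    """Hypothetical stand-in for the course helper: per-movie means and
    a copy of the matrix with those means subtracted."""
    means = np.nanmean(ratings, axis=0)  # column-wise, skipping NaN
    return ratings - means, means

normalized, means = normalize_ratings(ratings_df.to_numpy())

# After factoring `normalized` into U and M (not shown here), the means
# would be added back: predicted = U @ M + means

# Serialize the means the way the script saves them for later use.
restored = pickle.loads(pickle.dumps(means))
```

Subtracting a one-dimensional `means` array from the two-dimensional matrix relies on NumPy broadcasting, which applies each movie's mean down its column.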
Let's run the program. Right-click, choose Run, and we can see it created the file here. Now let's switch over to cold_start_recommendations.py. Our goal in this file is to recommend movies to a brand new user. First we use pickle.load to load the means.dat file. Then we load the movie list CSV file so we can print out movie titles. Next we use the mean ratings as the user's predicted ratings. And then finally we recommend the movies to the user by returning the movies in order of their average rating.
Let's run it and see the result. Right-click, choose Run. The user is recommended the five products with the highest average ratings. Always recommending the highest rated movies to new users might not be perfect, but it's a good place to start until the user reviews some products.
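The ranking step for a new user can be sketched as follows. The movie titles, IDs, and mean values are invented for illustration, and `recommend_for_new_user` is a hypothetical function name; the real script loads the means from means.dat and the titles from the movie list file.

```python
import numpy as np
import pandas as pd

# Hypothetical movie list and per-movie average ratings.
movies_df = pd.DataFrame(
    {"title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E", "Movie F"]},
    index=pd.Index([10, 20, 30, 40, 50, 60], name="movie_id"),
)
means = np.array([3.1, 4.6, 2.4, 4.9, 3.8, 4.2])

def recommend_for_new_user(movies, mean_ratings, n=5):
    """Use each movie's average rating as the new user's predicted rating
    and return the n highest rated movies."""
    ranked = movies.copy()
    ranked["rating"] = mean_ratings
    return ranked.sort_values(by="rating", ascending=False).head(n)

top = recommend_for_new_user(movies_df, means)
print(top[["title", "rating"]])
```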
Recommendation systems are a key part of almost every modern consumer website. The systems help drive customer interaction and sales by helping customers discover products and services they might not ever find themselves. The course uses the free, open source tools Python 3.5, pandas, and numpy. By the end of the course, you'll be equipped to use machine learning yourself to solve recommendation problems. What you learn can then be directly applied to your own projects.
- Building a machine learning system
- Training a machine learning system
- Refining the accuracy of the machine learning system
- Evaluating the recommendations received