Understand matrix completion using dot products.
- [Instructor] Let's write the main code for our recommendation system. Open up factor_review_matrix.py. First, I'll load the review dataset into a data frame called raw_dataset_df by using pandas' read_csv function. Then we use pandas' pivot_table function to build the review matrix. At this point, ratings_df contains a sparse array of reviews. Next, we want to factor the array to find the user attributes matrix and the movie attributes matrix that we can multiply back together to recreate the ratings data.
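As a rough sketch of this loading step, assuming the review data has one row per review with columns named user_id, movie_id, and value (hypothetical names, since the file layout isn't shown here). A small in-memory table stands in for the result of read_csv.

```python
import pandas as pd

# Stand-in for the data frame that pd.read_csv would return.
# The column names are assumptions made for this illustration.
raw_dataset_df = pd.DataFrame({
    "user_id":  [1, 1, 2, 3, 3],
    "movie_id": [10, 20, 10, 20, 30],
    "value":    [5, 3, 4, 2, 5],
})

# Rows become users and columns become movies.
# Any movie a user has not reviewed is left as NaN.
ratings_df = pd.pivot_table(
    raw_dataset_df,
    index="user_id",
    columns="movie_id",
    values="value",
    aggfunc="max",
)

print(ratings_df)
```

The NaN cells are the holes that the factorization step is meant to fill in.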
To do this, we'll use the low rank matrix factorization algorithm. I've included an implementation of this in matrix_factorization_utilities.py. We'll talk more about how it works in the next video, but let's go ahead and use it. First, we pass in the ratings data, but we'll call pandas' as_matrix function to make sure we pass it in as a numpy matrix data type. Next, this method takes a parameter called num_features. num_features controls how many latent features to generate for each user and each movie.
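A small sketch of the conversion step on its own. Note that as_matrix was removed in pandas 1.0, so on a newer pandas the equivalent call is to_numpy. The tiny ratings table here is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up 2-user, 2-movie review matrix with two missing reviews.
ratings_df = pd.DataFrame(
    [[5.0, np.nan], [np.nan, 3.0]],
    index=[1, 2],
    columns=[10, 20],
)

# On pandas 1.0 and later, to_numpy() replaces the removed as_matrix().
# Row and column labels are dropped; only the values are kept.
ratings = ratings_df.to_numpy()

print(type(ratings), ratings.shape)
```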
We'll pass in 15 as a starting point. This function also takes a regularization amount. Let's pass in 0.1 for now. We'll discuss how to tune this parameter in a later video. The result of the function is a U matrix and an N matrix, which hold 15 attributes for each user and each movie, respectively. Now we can get the ratings for every movie by multiplying U and N together. But instead of using the regular multiplication operator, which multiplies arrays element by element, we'll use numpy's matmul function so it knows we want to do matrix multiplication.
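To make the idea concrete, here is a minimal sketch of one way such a factorization could work, using plain gradient descent on the known ratings. This is not the implementation in matrix_factorization_utilities.py; the function name factor_ratings, the learning_rate and num_steps parameters, and the sample ratings are all assumptions for illustration.

```python
import numpy as np

def factor_ratings(ratings, num_features=15, regularization_amount=0.1,
                   learning_rate=0.01, num_steps=2000, seed=0):
    """Hypothetical stand-in for the course's factorization helper."""
    rng = np.random.default_rng(seed)
    num_users, num_movies = ratings.shape
    rated = ~np.isnan(ratings)        # True where a real review exists
    filled = np.nan_to_num(ratings)   # NaN -> 0 so the arithmetic is safe
    U = rng.normal(scale=0.1, size=(num_users, num_features))
    N = rng.normal(scale=0.1, size=(num_features, num_movies))
    for _ in range(num_steps):
        # Error is counted only on cells that have a real review.
        error = (np.matmul(U, N) - filled) * rated
        U_grad = np.matmul(error, N.T) + regularization_amount * U
        N_grad = np.matmul(U.T, error) + regularization_amount * N
        U -= learning_rate * U_grad
        N -= learning_rate * N_grad
    return U, N

# Made-up review matrix: 3 users, 4 movies, NaN means "not reviewed".
ratings = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [np.nan, 1.0, 5.0, 4.0],
])

U, N = factor_ratings(ratings, num_features=15, regularization_amount=0.1)

# Matrix product, not element-wise multiplication.
predicted_ratings = np.matmul(U, N)
print(predicted_ratings.shape)
```

In this sketch U has one row of 15 attributes per user and N has one column of 15 attributes per movie, so their matrix product has one predicted rating for every user and movie pair.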
The result is stored in an array called predicted_ratings. Finally, let's save the predicted_ratings to a csv file. First, we'll create a new pandas data frame to hold the data. For this data frame, we'll tell pandas to use the same row and column names as we have in the ratings_df data frame. Then we'll use pandas' to_csv function to save the data to a file. All right, let's run the program. Right click, choose Run. All right, it created a new file called predicted_ratings.csv. We can open that file with any spreadsheet application.
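A sketch of the saving step, using made-up numbers in place of real predictions. The index and column names (user_id, movie_id) are assumptions, and the file is written to a temporary folder here so the example doesn't touch the working directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up original review matrix and made-up predictions of the same shape.
ratings_df = pd.DataFrame(
    [[5.0, np.nan], [np.nan, 3.0]],
    index=pd.Index([1, 2], name="user_id"),
    columns=pd.Index([10, 20], name="movie_id"),
)
predicted_ratings = np.array([[4.9, 3.2], [4.1, 3.0]])

# Reuse the row and column labels from ratings_df so each
# prediction lines up with the right user and movie.
predicted_ratings_df = pd.DataFrame(
    predicted_ratings,
    index=ratings_df.index,
    columns=ratings_df.columns,
)

output_path = os.path.join(tempfile.mkdtemp(), "predicted_ratings.csv")
predicted_ratings_df.to_csv(output_path)
print(output_path)
```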
I've already opened the file here in Numbers. This data looks just like our original review data, except now every cell is filled in. We now have an estimate for how every single user would rate every single movie. For example, we can see that user three would give movie four a rating of about four stars. Now that we know all these ratings, we can start recommending movies to users in the order of their score. Let's look at user number one and see which movie we'd recommend to them. Of all these movies, if we exclude the ones the user had previously rated, movie number 34, way on the right, is the one with the highest score, so that's the first movie we should recommend to this user.
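The same lookup can be sketched in code instead of a spreadsheet. The numbers below are made up, and the variable names user_id, already_rated, and recommendations are only for illustration.

```python
import numpy as np
import pandas as pd

movie_ids = [10, 20, 30, 40]

# Made-up data for a single user: two real reviews, two gaps.
ratings_df = pd.DataFrame(
    [[5.0, np.nan, np.nan, 2.0]], index=[1], columns=movie_ids
)
predicted_ratings_df = pd.DataFrame(
    [[4.8, 3.1, 4.4, 2.2]], index=[1], columns=movie_ids
)

user_id = 1

# Movies this user has already reviewed should not be recommended again.
already_rated = ratings_df.loc[user_id].notna()
candidates = predicted_ratings_df.loc[user_id][~already_rated]

# Highest predicted rating first.
recommendations = candidates.sort_values(ascending=False)
print(recommendations)
```

With this sample data, movies 10 and 40 are excluded because the user already reviewed them, which leaves movie 30 as the first recommendation, followed by movie 20.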
When the user watches this movie, we'll ask them to rate it. If their rating disagrees with what we predicted, we'll add that new rating in and recalculate this matrix. That will help us improve our overall ratings. The more ratings we have to work from, the fewer holes we'll have in our ratings array and the better chance we'll have of coming up with accurate values for the U and N matrices.
Recommendation systems are a key part of almost every modern consumer website. The systems help drive customer interaction and sales by helping customers discover products and services they might not ever find themselves. The course uses the free, open source tools Python 3.5, pandas, and numpy. By the end of the course, you'll be equipped to use machine learning yourself to solve recommendation problems. What you learn can then be directly applied to your own projects.
- Building a machine learning system
- Training a machine learning system
- Refining the accuracy of the machine learning system
- Evaluating the recommendations received