Understanding how to find products similar to a given products using user recommendations.
- [Instructor] Search engines are a common way that users discover new websites. When a first time user visits your website from a search engine, you don't know enough about the user yet to make personalized recommendations, until that user enters some product reviews our recommendation system can't recommend them anything yet. In this case, we can show the user similar products to the one they're already looking at. The goal is to keep them on the website and get them to look at more products. You've probably seen this feature on online shopping websites where it says, if you like this product you might also like these other products.
With the product attributes calculated using matrix factorization, we can calculate product similarity. Let's look at find some more products dot PY. First, we'll load the movie ratings data set using panda's read CSV function. We'll also load the movie titles using read CSV into a data frame called movies DF. Then, we'll use the panda's pivot table function to create the ratings matrix and we'll use matrix factorization to calculate the U and M matrices. Right now, each movie is represented by one column in the in matrix.
First, let's use numpy's transpose function to flip flop the matrix so each column becomes a row. This just makes the data easier to work with, it doesn't change the data itself. The in matrix has 15 unique values for each movie that represent the characteristics of that movie. This means that other movies with nearly the same numbers should be very similar. To find other movies similar to this one, we just have to find the other movies whose numbers are closest to this movie's numbers. It's just a subtraction problem. Let's choose the main movie the user is looking at, let's pick movie ID five.
You could choose any other movie instead if you like. Now, let's look at the title and genre for movie ID five. We can do this by looking in the movies DF dataframe and using the panda's loc function to find the row by its index. And let's print out the title and genre for that movie. Next, let's grab the movie attributes for movie ID five from the in matrix. We have to subtract one here because M is zero indexed, but the movie IDs start at one. Now, let's print out those movie attributes so we can see what they look like, with these attributes we are ready to find similar movies.
Step one is to subtract this movie's attributes from every other movie's. This one line of code subtracts current movie features separately from every row of the in matrix. This gives us the difference in scores between the current movie and every other movie in the database. You could also use a four loop to do the subtraction one movie at a time, but using numpy we can do this in one line of code. Step two is to take the absolute value of the difference we calculated in step one, numpy's ABS function gives us the absolute value, this just makes sure that any negative numbers come out as positive.
Next, we'll add up the 15 separate attribute differences for each movie into one total difference score for each movie. The numpy's sum function will do this. We'll also pass in access equals one to tell numpy to sum up all the numbers in each row and produce a separate sum for each row. At this point, we are finished with the calculation. We just save the calculated scores back to the movie list so that we'll be able to print out the name of each movie. In step five, we sort the movie list by the difference score we calculated so that the least different movies show up first in the list.
Here pandas provides a convenient sort value function. And finally, in step six, we print out the first five movies in the sorted list. These are the movies that are most similar to the current movie. Okay, let's run the program. Right click and choose run. The movie the user is looking at is called The Big City Judge two, it's a legal drama and we can see the 15 attributes we calculated for this movie. Here are the five most similar movies that we found.
The first movie is the movie the user is already looking at. It makes sense that the movie would be most similar to itself, so we can ignore the first row. The next four movies are the ones that we would show to the user as similar projects. Based on their titles, these movies seem like they are probably very similar. They all seem to be movies about crime and investigation. The sequel, The Big City Judge three, is even in the list. That's a movie the user would probably also be interested in. You can change the movie ID and run the program again to see what's similar to a different movie.
Recommendation systems are a key part of almost every modern consumer website. The systems help drive customer interaction and sales by helping customers discover products and services they might not ever find themselves. The course uses the free, open source tools Python 3.5, pandas, and numpy. By the end of the course, you'll be equipped to use machine learning yourself to solve recommendation problems. What you learn can then be directly applied to your own projects.
- Building a machine learning system
- Training a machine learning system
- Refining the accuracy of the machine learning system
- Evaluating the recommendations received