Understanding how to find similar products in a real program
- [Instructor] In the previous video, we created these four data files. If you'd like to create them now, run train_recommender_cold_start final.py before you continue. Now let's open up product_similarity_from_data_files.py. When you show related products in a real application, you don't want to have to re-factor the matrix each time because it's too slow. Instead, you can use the product_features.dat file to calculate product similarity quickly. First we'll load the product_features.dat file using Python's pickle.load function.
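As a rough illustration of the loading step, here is a minimal sketch. It assumes the feature matrix was saved with pickle.dump as a numpy array with one column per movie. The matrix values and the temporary file location are made up so the example runs on its own; they are not the course's data.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for the saved feature matrix (assumption, made-up values):
# 3 latent features (rows) x 4 movies (columns).
M = np.array([
    [0.1, 0.9, 0.2, 0.8],
    [0.5, 0.4, 0.6, 0.3],
    [0.7, 0.1, 0.8, 0.2],
])

# Write it to a temporary directory so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "product_features.dat")
with open(path, "wb") as f:
    pickle.dump(M, f)

# At request time, loading the precomputed features is one pickle.load call.
# Pickle files must be opened in binary mode ("rb").
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded.shape)
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.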
The M matrix that we just loaded has one column for each movie. Let's transpose the matrix so each column becomes a row. This just makes the data easier to work with, but it doesn't change the data. Next, we'll load the movie list using read_csv so we'll have access to the movie titles. And we'll pick a movie to find similar movies to. I've chosen movie_id = 5. Next we'll look up this movie in the movies_df dataframe and then we'll print out the movie's title and genre. Now we're ready to calculate movie similarity. The first step is to subtract this movie's features from every other movie's features.
Next we'll take the absolute value of that difference to make sure all the numbers are positive. Then we sum the separate feature differences for each movie into one total difference score for that movie. And then we save those difference scores to the movies_df dataframe. And then we sort the movie list so the least different movies are first in the list. Finally we can print out the first five movies in the list. Let's run this. Right-click, choose Run. Okay, our movie is called The Big City Judge 2. The first movie in this list is the movie itself. That's because a movie is most similar to itself.
Let's ignore that one. The other four movies look pretty similar to our movie. They all look like crime or legal dramas. We even have the sequel, The Big City Judge 3, in the list. But notice that this ran nearly instantaneously because all the hard work of calculating movie features was done ahead of time. Now that we have the features available in a data file, we can use them to find similar products on the fly without any delays for the user.
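The steps walked through above can be sketched roughly as follows. This is not the course's actual file: the feature values, the titles, the genres, and the difference_score column name are all made up for illustration, and the sketch assumes movie ids run from 1 to N in the same order as the matrix columns.

```python
import numpy as np
import pandas as pd

# Made-up feature matrix: 2 latent features (rows) x 5 movies (columns).
M = np.array([
    [0.9, 0.8, 0.1, 0.85, 0.2],
    [0.1, 0.2, 0.9, 0.15, 0.7],
])

# Made-up movie list standing in for the one loaded with read_csv.
movies_df = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
        "genre": ["crime", "crime", "comedy", "crime", "comedy"],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="movie_id"),
)

# Transpose so each movie is a row instead of a column.
M = np.transpose(M)

# Pick a movie and grab its feature row (assumes ids 1..N in row order).
movie_id = 2
current_movie_features = M[movie_id - 1]

# 1. Subtract this movie's features from every movie's features.
difference = M - current_movie_features
# 2. Take the absolute value so all differences are positive.
absolute_difference = np.abs(difference)
# 3. Sum the per-feature differences into one score per movie.
total_difference = np.sum(absolute_difference, axis=1)

# Save the scores and sort so the least different movies come first.
movies_df["difference_score"] = total_difference
sorted_movie_list = movies_df.sort_values("difference_score")

# The first row is the chosen movie itself, with a score of 0.
print(sorted_movie_list[["title", "genre", "difference_score"]].head(3))
```

Summing absolute differences is one simple way to score distance between feature vectors. Because the subtraction is broadcast across all rows at once, there is no explicit loop over movies.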
Recommendation systems are a key part of almost every modern consumer website. The systems help drive customer interaction and sales by helping customers discover products and services they might not ever find themselves. The course uses the free, open source tools Python 3.5, pandas, and numpy. By the end of the course, you'll be equipped to use machine learning yourself to solve recommendation problems. What you learn can then be directly applied to your own projects.
- Building a machine learning system
- Training a machine learning system
- Refining the accuracy of the machine learning system
- Evaluating the recommendations received