Understand that user attributes can be useful even if you don't know what they mean, as long as you can discover them.
- [Instructor] We can estimate how much a user will like a movie by assigning attributes to each user and each movie, multiplying the matching attributes together, and adding up the results. The same calculation can be represented as a matrix multiplication problem. First, we put the user attributes in a matrix called U; in this case, they are five, negative two, one, negative five, and five. Then we put the movie attributes in a matrix called M and use matrix multiplication to calculate the user's ratings. But to do this, we already have to know both the user attributes and the movie attributes.
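The multiply-and-add calculation and its matrix form can be sketched in NumPy as follows. The user attributes are the five values from the example; the movie attribute values in `M` are hypothetical, made up only to show that the hand calculation and the matrix product agree.

```python
import numpy as np

# User attributes from the example, stored as a 1 x 5 row (the U matrix).
U = np.array([[5, -2, 1, -5, 5]])

# Hypothetical movie attributes: each column is one movie, each row one attribute.
M = np.array([
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
    [1, 1],
])

# Score for the first movie by hand: multiply matching attributes, then add them up.
manual_score = sum(u * m for u, m in zip(U[0], M[:, 0]))

# The same calculation for every movie at once, as a matrix multiplication.
ratings = U @ M

print(manual_score)  # 11
print(ratings)       # [[11 -2]]
```

Each entry of `U @ M` is exactly the multiply-and-add score for one user and one movie, which is why the whole table of estimates can be produced with a single matrix product.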
It's not easy to come up with attribute values for each user and each movie by hand. We need a way to come up with them automatically. Let's look at the movie rating matrix that shows how all the users in our data set have rated movies so far. This matrix is very sparse, because each user has rated only a few movies, but it still gives us a lot of information. For example, we know that user ID two gave five stars to movie number one. Based on that, we can guess that this user's attributes are probably similar to that movie's attributes, since the user liked it so much. In other words, we have some clues to work with.
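A sparse rating matrix like this can be built from a list of reviews with pandas. The review data below is hypothetical (apart from user 2 giving movie 1 five stars, mirroring the example), and the column names `user_id`, `movie_id`, and `value` are assumptions for illustration. Any user/movie pair without a review becomes `NaN`, which is what makes the matrix sparse.

```python
import pandas as pd

# Hypothetical reviews: one row per (user, movie, star rating).
reviews = pd.DataFrame({
    "user_id":  [1, 2, 2, 3, 4],
    "movie_id": [2, 1, 3, 2, 3],
    "value":    [4, 5, 2, 3, 1],
})

# Pivot into a user x movie matrix; pairs nobody reviewed become NaN.
ratings = pd.pivot_table(reviews, index="user_id", columns="movie_id",
                         values="value", aggfunc="mean")
print(ratings)

# Count how many cells actually hold a known rating.
known = ratings.notna().sum().sum()
print(known, "of", ratings.size, "cells are known")
```

Even in this tiny example, fewer than half of the cells are filled in; real rating matrices are usually far sparser.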
Let's see how we can take advantage of these clues to learn about each movie and each user. In the equation we just saw, U times M equals movie ratings, we already know some of the users' actual ratings. The movie rating matrix we already have is part of the solution to that equation. There are still lots of holes in it, but it's enough for us to work with. We can use the ratings we know so far to work backwards and find a U matrix and an M matrix that satisfy this equation. But here's the really cool part. When we multiply U and M back together, we get a completed matrix with a predicted rating in every cell, and we can use that completed matrix to recommend movies.
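One common way to work backwards from the known ratings is gradient descent on the squared error of the known cells only. The sketch below assumes that approach; it is not necessarily the exact method this course uses. The function name `factor_known_ratings`, its parameters (number of latent features, step count, learning rate, regularization strength), and the rating matrix are all hypothetical. Missing ratings are `np.nan` and contribute nothing to the error, yet `U @ M` still produces a value for every cell.

```python
import numpy as np

def factor_known_ratings(R, num_features=2, steps=20000,
                         learning_rate=0.01, reg=0.01, seed=0):
    """Hypothetical helper: learn U (users x features) and M (features x movies)
    so that U @ M matches the known (non-NaN) entries of R."""
    rng = np.random.RandomState(seed)
    num_users, num_movies = R.shape
    U = rng.normal(scale=0.5, size=(num_users, num_features))
    M = rng.normal(scale=0.5, size=(num_features, num_movies))
    known = ~np.isnan(R)
    target = np.where(known, R, 0.0)
    for _ in range(steps):
        # Prediction error only where a real rating exists; unknown cells count as zero error.
        error = np.where(known, U @ M - target, 0.0)
        # Gradients of 0.5 * squared error plus a small L2 penalty on both factors.
        grad_U = error @ M.T + reg * U
        grad_M = U.T @ error + reg * M
        U -= learning_rate * grad_U
        M -= learning_rate * grad_M
    return U, M

# Hypothetical ratings for 4 users and 5 movies; np.nan marks a hole.
R = np.array([
    [5, 4, np.nan, 1, np.nan],
    [np.nan, 5, 4, np.nan, 1],
    [1, np.nan, 2, 5, 4],
    [2, 1, np.nan, 4, np.nan],
], dtype=float)

U, M = factor_known_ratings(R)
completed = U @ M  # every cell now has a predicted rating

known = ~np.isnan(R)
max_error = np.abs(completed[known] - R[known]).max()
print(np.round(completed, 1))
print("largest error on known ratings:", round(max_error, 3))
```

The small regularization term keeps the learned values from growing without bound; the number of latent features is a choice you make, not something the data dictates.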
Let's review how we are going to build a recommendation system. First, we create a matrix of all the user reviews in our data set. Next, we factor the known reviews into a U matrix and an M matrix. Finally, we multiply the U and M matrices we found back together to get review scores for every user and every movie. But there's still one catch. Previously, when we created attributes by hand for each user and each movie, we knew what each of those attributes meant. We knew that the first attribute represented action, the second represented drama, and so on.
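The final step, multiplying U and M back together and turning the completed matrix into recommendations, might look like the sketch below. The U and M values are made up rather than learned, and the helper `recommend` is hypothetical; it simply skips movies the user has already reviewed and returns the highest-scoring remaining ones.

```python
import numpy as np

# Step 1 (hypothetical data): known reviews, np.nan where a user hasn't rated a movie.
R = np.array([
    [5.0, np.nan, np.nan],
    [np.nan, np.nan, 5.0],
    [np.nan, 3.0, np.nan],
])

# Step 2 stand-in: U and M as they might come out of the factoring step.
# These numbers are invented for illustration, not learned from R.
U = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [1.0, 0.5]])
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0]])

# Step 3: multiply back together to get a score for every user and every movie.
predicted = U @ M

def recommend(user_index, known_ratings, predicted_ratings, n=1):
    """Hypothetical helper: indices of the n highest-scoring movies the user hasn't reviewed."""
    scores = predicted_ratings[user_index].copy()
    # Hide movies the user has already reviewed so they are never recommended.
    scores[~np.isnan(known_ratings[user_index])] = -np.inf
    # Sort by descending score and keep the top n.
    return np.argsort(-scores)[:n].tolist()

print(predicted)
print(recommend(0, R, predicted, n=2))  # [1, 2]
```

Filtering out already-reviewed movies is a design choice; some systems instead show the predicted score next to every movie and let the user decide.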
But when we use matrix factoring to come up with U and M, we have no idea what each value means. All we know is that each value represents some characteristic that draws users to certain movies. We don't know how to describe those characteristics in words. Because of this, the values in U and M are called latent features, and each user's row in U and each movie's column in M is called a latent vector. The word latent just means hidden. In other words, these vectors are hidden information that we found by looking at review data and working backwards.
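One way to see why the learned features have no fixed meaning: if you rotate the latent feature space, U and M change, but their product, and therefore every predicted rating, stays exactly the same. The random U and M below are hypothetical stand-ins for learned factors, and the angle is arbitrary.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical learned factors: 3 users x 2 latent features, 2 latent features x 4 movies.
U = rng.normal(size=(3, 2))
M = rng.normal(size=(2, 4))

# Rotate the latent feature space by an arbitrary angle.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
U_rotated = U @ rotation
M_rotated = rotation.T @ M  # a rotation's transpose is its inverse

print(np.allclose(U @ M, U_rotated @ M_rotated))  # True: identical predicted ratings
print(np.allclose(U, U_rotated))                  # False: different feature values
```

Because many different U and M pairs explain the reviews equally well, nothing forces the first column to mean "action" or the second to mean "drama"; the features are only meaningful in combination.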
Recommendation systems are a key part of almost every modern consumer website. These systems help drive customer engagement and sales by helping customers discover products and services they might never find on their own. The course uses the free, open-source tools Python 3.5, pandas, and NumPy. By the end of the course, you'll be equipped to use machine learning yourself to solve recommendation problems. What you learn can then be applied directly to your own projects.
- Building a machine learning system
- Training a machine learning system
- Refining the accuracy of the machine learning system
- Evaluating the recommendations received