From the course: Building Recommender Systems with Machine Learning and AI

Recommendations with RBMs, part 2

- [Instructor] Let's peek next at the RBMAlgorithm.py file. It's basically a wrapper around the RBM module we just looked at, which makes it easier to use and ties it into our recommendation framework. The initializer just takes the same hyperparameters the underlying RBM module needs, and stores them for when that RBM class gets instantiated. Next we have a little utility function called softmax. If you remember from RBM.py, our GetRecommendations function returns raw results from the backward pass of the RBM in the visible layer. While you can sort of think of these as probabilities for each rating classification for each item, they aren't really probabilities. The softmax function lets us convert them into values that all add up to one and can be treated mathematically like probabilities. This is sort of the normal thing to do when dealing with classifications in neural networks. But as you'll see in a bit, it allows us to do another handy trick as well.

The fit function is what our framework calls when it wants to train our recommendation model, and it takes in our training data. After calling the base class, we extract the number of users and the number of movies in our training data, and then we use that to build up a new matrix called trainingMatrix that is in the shape our RBM class expects. It's a 3D array of users, items, and arrays of 10 binary values that represent a specific rating classification. We initialize the trainingMatrix to contain all zeroes, so we can identify missing ratings later on. Again, the reason we have 10 rating types is because our ratings data is on a five-star scale with half-star increments. So those 10 binary fields that represent each rating correspond to 0.5 stars, one star, 1.5 stars, et cetera. A five-star rating would be represented by the values 0000000001, for example; that's nine zeroes and a one at the end, in the slot that corresponds to 5.0 stars. So now we need to populate this trainingMatrix with the training data we actually have. Again, most of this matrix will remain unfilled, and when dealing with very large-scale data, you'd probably want to investigate ways of storing this matrix in a sparse representation to save space. We go through every rating in our training set, which consists of a user ID, item ID, and actual rating score. We then convert the rating into an index into our binary array; this math just translates ratings on our half-star scale, from 0.5 up to 5.0 stars, into integers that range from zero to nine. The converted rating is then used as an index into the final dimension of our trainingMatrix, where we set a one to indicate that a particular rating score for this particular user-item pair exists.

Our RBM, however, wants to work with 2D data, so we flatten the items and ratings out into a single dimension, with users left as the first dimension. That's what this reshape call does on line 38. Negative one is a special value in reshape; it just tells NumPy to infer the size of that dimension from however many elements are left over. Next, we're ready to create our underlying RBM object on line 41. The size of that flattened second dimension determines the number of visible nodes we need to process each individual user, and we pass along the hyperparameters we captured when initializing the RBM algorithm. Now to train our RBM, we just call the Train function, which does all the complicated work of setting up the RBM's graph in TensorFlow and running training on it over however many batches and epochs we specified.
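To make that matrix construction and reshape a bit more concrete, here's a minimal sketch of the idea. The variable names and the small made-up counts here are just for illustration; they aren't the exact code in RBMAlgorithm.py:

    import numpy as np

    # Hypothetical counts and ratings, just for illustration
    numUsers, numItems = 100, 500
    ratings = [(0, 3, 4.5), (0, 7, 5.0), (1, 3, 2.0)]  # (userID, itemID, stars)

    # 3D matrix: users x items x 10 one-hot rating slots, all zeroes to start,
    # so anything still zero later means no rating exists for that pair
    trainingMatrix = np.zeros([numUsers, numItems, 10], dtype=np.float32)

    for (userID, itemID, stars) in ratings:
        ratingIndex = int(stars * 2) - 1  # maps 0.5..5.0 stars to indices 0..9
        trainingMatrix[userID, itemID, ratingIndex] = 1

    # Flatten items and rating slots into one row per user; -1 tells reshape
    # to infer that size (numItems * 10) from the remaining elements
    trainingMatrix = trainingMatrix.reshape([numUsers, -1])
    print(trainingMatrix.shape)  # (100, 5000)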
Next, we need to build up rating predictions for all of the missing user-movie pairs using the RBM we have trained, or more specifically, the weights and biases our RBM has learned from the training data. That will let us very quickly access those rating predictions when the framework calls the estimate function like a gazillion times. We start by creating the predictedRatings array, which is just a 2D array that maps user-item combinations to their predicted rating scores. We iterate through every user in our training set on line 45, and print out a status update every 50 users we've processed. As you might recall, we wrote the GetRecommendations function in RBM.py to produce recommendations for an entire user at a time. This is what the RBM does naturally. We trained it using ratings from individual users as inputs on the visible nodes, and now we are using that trained RBM by feeding it the ratings of a user we're interested in as the new inputs on its visible nodes. Our trainingMatrix is already organized the way our RBM needs it, so we just pull off the ratings for the user in question and pass them in on line 48. The RBM then runs a forward pass using these ratings, and a backward pass where it reconstructs the missing ones using the weights and biases it learned during training. We now want to unflatten the results so we have our binary rating category data arranged nicely by item and rating. That's all this reshape call is doing on line 49.

Now, things get a little bit interesting. The raw reconstruction results in those rating categories aren't nice zeroes and ones like we want. They are numbers that are all over the place, and we need to convert them into an actual usable rating. We could just pick the rating category with the highest value in it, and that would be a perfectly reasonable thing to do. The only problem is that this restricts you to the specific rating values that actually exist as categories. You can end up with a rating prediction of 4.0, 4.5, or 5.0, for example, but there's no way to get a prediction of, say, 4.92 with that approach. By just picking the category with the biggest score, we're losing some nuance about just how confident the RBM is in a given rating category, and as a result, we end up with a huge multi-way tie for movies where its best guess is a 5.0 rating. We then have to arbitrarily pick our top-end results from that tie, which doesn't work well. It's yet another example of where an algorithm designed to maximize prediction accuracy runs into real-world trouble when you apply it to top-end recommendations. What we do instead, in the loop between lines 51 and 59, is an alternative approach suggested in the paper. We first normalize the values in all the rating categories from the reconstructed data, and since this isn't 2007, we'll use the softmax function for this, which is sort of the standard way to do that with categorical data these days. So now we can treat this set of 10 rating categories as probabilities for each rating score. Given that, we can compute the expectation from that probability distribution. Expectation is a statistical term, and it's equivalent to a weighted average of those rating scores, weighted by their probabilities. That's what's going on on line 58. We're using NumPy's average function, which lets us compute a weighted average in one line. This approach has its own problem, in that it tends to guess rather low ratings overall.
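Here's a minimal sketch of that softmax-plus-expectation idea for a single item's 10 reconstructed rating categories. The helper name, the example numbers, and the final index-to-stars conversion are illustrative assumptions, not the exact code from RBMAlgorithm.py:

    import numpy as np

    def softmax(x):
        # Shift by the max for numerical stability, then normalize so the ten
        # category scores sum to one and can be treated as probabilities
        e = np.exp(x - np.max(x))
        return e / e.sum()

    # Raw reconstructed scores for one item's 10 rating categories (made up)
    recScores = np.array([0.1, 0.3, 0.2, 0.4, 0.9, 1.2, 1.5, 2.0, 2.6, 3.1])

    probs = softmax(recScores)

    # Expectation: a weighted average of the rating indices 0..9,
    # weighted by their probabilities
    ratingIndex = np.average(np.arange(10), weights=probs)

    # One plausible way to map an index in 0..9 back onto the 0.5-5.0 star scale
    predictedRating = (ratingIndex + 1) * 0.5
    print(predictedRating)  # roughly 4.1 here, not forced to a half-star value

Notice that the expectation lands in between the fixed half-star values, which is exactly the nuance we lose if we just pick the single highest-scoring category.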
But we're more interested in the top-end results than prediction accuracy here; it's the rankings that matter for us. Finally, we convert the zero-to-nine rating index back into an actual rating score on our five-star scale on line 59, and store it in our predictedRatings array for use in the estimate function, which is next. We want estimate to be as fast as possible, so not much is going on here. We just check that the user and item for which a rating estimate is being requested actually exist in our data, look up the predicted rating we computed and stored within the fit function, and check that it's not some ridiculously low value that should just be discarded. And that's it. So, we're about ready to run this now, and see what happens.
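For reference, that estimate lookup might look roughly like the following sketch, where predictedRatings is the 2D array filled in during fit. The function signature, the way unknown users and items are handled, and the 0.001 cutoff are illustrative assumptions rather than the course's exact code:

    import numpy as np

    def estimate(predictedRatings, userIndex, itemIndex):
        numUsers, numItems = predictedRatings.shape

        # Make sure the user and item were actually seen during training;
        # a real framework implementation might signal "prediction impossible"
        # here instead of returning None
        if not (0 <= userIndex < numUsers and 0 <= itemIndex < numItems):
            return None

        rating = predictedRatings[userIndex, itemIndex]

        # Throw away ridiculously low values rather than treating them as real predictions
        if rating < 0.001:
            return None

        return rating

    # Hypothetical usage: a small all-zero array stands in for the one fit() would fill
    predictedRatings = np.zeros([100, 500])
    print(estimate(predictedRatings, 0, 3))  # None, since nothing exceeds the cutoff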
