From the course: Building Recommender Systems with Machine Learning and AI

Improving on SVD

- [Narrator] Given the success of SVD with the Netflix Prize, it's not surprising that a lot of research since then has focused on building upon SVD for specific tasks, or trying to improve it further. You can find recent papers on all of these variants out there; I'll call out a few of the more interesting ones. Factorization machines are worth a look, as they are well suited to predicting ratings or predicting clicks in recommender systems. It's the same general idea as SVD, but a bit more general purpose, I'd say. It can handle sparse data without trying to shoehorn itself into the problem like SVD does. I'm calling attention to it because Amazon's SageMaker service in AWS offers factorization machines as a built-in algorithm, so it's easy to experiment with on very large data sets in the cloud. The only real downside is that it only works with categorical data, so you have to work a little bit harder at preparing your data for it. We'll revisit factorization machines a couple of times later in the course.

There are also variants built specifically for recommending series of events, like the next things you're likely to watch or click given your immediate history. In the neighborhood-based methods section, we talked about translation-based recommendations doing this, but here in the model-based world we have tools such as timeSVD++ or Factorized Personalized Markov Chains that can tackle the same sorts of problems. Probabilistic Latent Semantic Analysis, or PLSA, is something a team I once ran experimented with, and the early results were promising. You can use it to extract latent features from the content itself; for example, you can apply PLSA to movie titles or descriptions and match them up with users in much the same way SVD works. Content-based methods like this aren't likely to do very well on their own, but combining them with models built on user behavior data could be a good idea.

As we've discussed before, though, once you start getting into complex algorithms, often just tuning the parameters of the algorithm can produce markedly better results, and it turns out SVD has several such parameters. This gets into what we call hyperparameter tuning; it's a pretty big problem in machine learning in general. A lot of algorithms are very sensitive to parameters such as learning rates, and often different settings for these parameters make sense for different data sets. For example, with SVD we can adjust how many latent factors we try to extract, that is, how many dimensions we want to boil things down to. There's no right answer for that; it depends on the nature of the data you're dealing with. In the Surprise library's implementation of SVD, this value is passed into the constructor of the SVD model as a parameter named n_factors, and you can set it to whatever you want. Similarly, you can set your own learning rate for the SGD phase with lr_all, and how many epochs, or steps, you want SGD to take with the n_epochs parameter. Hyperparameter tuning is usually a matter of just trying different values and iterating until you find the best one. Generally it makes sense to start with the default setting, whatever that is, and then start guessing. Try doubling it; did that make it worse? Try halving it; did that make it better? Well, then at least we know it should be lower. Is halving too much? Does 3/4 look better? Basically, you just keep narrowing down until you stop seeing significant gains in the quality of your results.
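To make that concrete, here is a minimal sketch (not from the course materials) of trying out SVD hyperparameters in Surprise. It assumes the built-in MovieLens 100k data set and uses Surprise's cross_validate helper to score each guess:

```python
from surprise import SVD, Dataset
from surprise.model_selection import cross_validate

# Illustrative data set; swap in whatever data you're actually working with.
data = Dataset.load_builtin('ml-100k')

# Surprise's defaults are n_factors=100, n_epochs=20, lr_all=0.005.
# Try doubling or halving one value at a time and compare the resulting scores.
algo = SVD(n_factors=50, n_epochs=20, lr_all=0.005)
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=3, verbose=True)
```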
Fortunately, the Surprise library contains a GridSearchCV package that helps you with hyperparameter tuning. It allows you to define a grid of parameters you want to try different values for, and it will automatically try every possible combination of those parameters and tell you which one performs best. Let's look at this code snippet. If you look at the param_grid dictionary we're setting up, you'll see it maps parameter names to lists of values we want to try. Remember, we have to run the algorithm for every possible combination, so you don't want to try too many values at once. We then set up a GridSearchCV with the algorithm we want to test, the dictionary of parameters we want to try out, how we're going to measure success, in this case by both RMSE and MAE, and how many folds of cross-validation we want to run each time. Then we run fit on the GridSearchCV with our training data, and it does the rest to find the combination of parameters that works best with that particular set of training data. When it's done, the best RMSE and MAE scores will be in the best_score member of the GridSearchCV, and the actual parameters that won will be in the best_params dictionary. You can look up the best parameters for whatever accuracy metric you want to focus on, RMSE or MAE. So then we can create a new SVD model using the best parameters and do more interesting things with it.
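Since the snippet itself isn't reproduced in the transcript, here is a rough reconstruction of what that grid search might look like, assuming the MovieLens 100k data set and a small, made-up set of parameter values:

```python
from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

data = Dataset.load_builtin('ml-100k')  # stand-in data set for illustration

# Maps parameter names to the lists of values we want to try.
# Every combination gets evaluated: 2 x 2 x 2 = 8 fits, each with 3-fold CV.
param_grid = {'n_epochs': [20, 30],
              'lr_all': [0.005, 0.010],
              'n_factors': [50, 100]}

gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)

print(gs.best_score['rmse'])   # best RMSE achieved across the grid
print(gs.best_params['rmse'])  # the parameter combination that achieved it

# Build a fresh SVD model with the winning parameters for further use.
algo = SVD(**gs.best_params['rmse'])
```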