From the course: Spark for Machine Learning & AI
Collaborative filtering - Apache Spark Tutorial
Collaborative filtering
- [Instructor] Collaborative filtering follows the same pattern we've used repeatedly in this course. First, we start with preprocessing. We're going to use the alternating least squares (ALS) method provided by Spark MLlib, and to use it, we import the ALS class from the pyspark.ml.recommendation package. Then we build a DataFrame of user-item ratings. When it comes to modeling, we create an ALS object, and when we do that, we have to specify the user, item, and rating columns in our DataFrame. Then we train the model by calling fit, which is a method of the ALS object. When it's time to evaluate, we create predictions by calling the transform method of the fitted ALS model on our test data. We then create a RegressionEvaluator object and use its evaluate function to calculate the root mean squared error, which tells us how closely the predicted ratings match the actual ones, and so how well our collaborative filtering is making recommendations.