From the course: Building Recommender Systems with Machine Learning and AI


Recommendations from 20 million ratings with Spark


- [Instructor] We promised we'd scale things up in this section, so let's see what Apache Spark can do, even on a single PC. We're going to go straight from the 100,000-rating dataset we've been using so far to the 20-million-rating dataset from MovieLens. Let's see if Spark can handle that. MovieLens' license terms don't allow me to redistribute their data, so you'll have to head over to grouplens.org. Select the datasets page, and download the ml-20m.zip file from the 20M Dataset section there. Once it's downloaded, decompress it and place the ml-20m folder inside your course materials folder. Now let's go back to Spyder and open up the SparkALS-20m.py file. There are only a couple of things we changed here. First, I wrote a new load movie names function instead of relying on our MovieLens module, because the movie IDs in the 20 million dataset are different from the ones in the 100K dataset. This is really just copied and pasted from the code in our…
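
A movie-name loader like the one described above might look something like the sketch below. This is not the course's actual code, which isn't shown in the transcript. The function name `load_movie_names` and the default path are assumptions made for illustration. The sketch relies on the `movies.csv` file in the ml-20m folder having a header row followed by `movieId,title,genres` columns.

```python
import csv


def load_movie_names(path="ml-20m/movies.csv"):
    """Return a dict mapping each movie ID (int) to its title (str).

    Hypothetical helper: the name and default path are assumptions,
    not taken from the course's SparkALS-20m.py file.
    """
    movie_names = {}
    # The csv module handles quoted titles that contain commas,
    # such as "American President, The (1995)".
    with open(path, newline="", encoding="utf-8") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row (movieId,title,genres)
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or malformed lines
            movie_names[int(row[0])] = row[1]
    return movie_names
```

Calling `load_movie_names()` from the course materials folder would then give a dictionary you can use to turn the movie IDs in Spark's recommendations back into readable titles.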
