From the course: Building Recommender Systems with Machine Learning and AI
Recommendations from 20 million ratings with Spark - Python Tutorial
- [Instructor] We promised we'd scale things up in this section, so let's see what Apache Spark can do, even on a single PC. We're going to go straight from the 100,000-rating dataset we've been using so far to the 20-million-rating dataset from MovieLens. Let's see if Spark can handle that. MovieLens' license terms don't allow me to redistribute their data, so you'll have to head over to grouplens.org, select the datasets page, and download the ml-20m.zip file from the 20M Dataset section there. Once it's downloaded, decompress it (mouse clicks) and place the ml-20m folder inside your course materials folder. (Mouse clicks)

Now let's go back to Spyder and open up the SparkALS-20m.py file. There are only a couple of things we changed here. First, I wrote a new function to load movie names instead of relying on our MovieLens module, because the movie IDs in the 20-million-rating dataset are different from the ones in the 100K dataset. This is really just copied and pasted from the code in our…
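Because the 20M dataset's movie IDs don't match the 100K set's, the titles need their own lookup. Below is a minimal sketch of such a loader, not the course's actual code. It assumes that ml-20m's movies.csv has a `movieId,title,genres` header row and wraps titles containing commas in double quotes. The function name `load_movie_names`, the file path, and the UTF-8 encoding are illustrative assumptions. The sample rows exist only to exercise the parser.

```python
import csv
import io
from typing import Dict, Iterable


def load_movie_names(lines: Iterable[str]) -> Dict[int, str]:
    """Map movieId -> title from MovieLens movies.csv-style content.

    Assumption: the first row is a "movieId,title,genres" header and
    titles containing commas are double-quoted (standard CSV quoting).
    """
    names: Dict[int, str] = {}
    reader = csv.reader(lines)  # csv handles quoted commas inside titles
    next(reader, None)          # skip the header row
    for row in reader:
        if len(row) < 2:
            continue            # ignore blank or malformed lines
        names[int(row[0])] = row[1]
    return names


# Hypothetical usage against the extracted folder (path and encoding are assumptions):
# with open("ml-20m/movies.csv", newline="", encoding="utf-8") as f:
#     movie_names = load_movie_names(f)

# Illustrative rows in the same CSV layout, used here to exercise the parser.
sample = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    '11,"American President, The (1995)",Comedy|Drama|Romance\n'
)
movie_names = load_movie_names(sample)
print(movie_names[11])  # American President, The (1995)
```

Using the `csv` module instead of `line.split(",")` matters here. Titles like the second sample row contain commas, and a naive split would break them apart.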
Contents
- Introduction and installation of Apache Spark (5m 49s)
- Apache Spark architecture (5m 13s)
- Movie recommendations with Spark, matrix factorization, and ALS (6m 2s)
- Recommendations from 20 million ratings with Spark (4m 57s)
- Amazon DSSTNE (4m 41s)
- DSSTNE in action (9m 25s)
- Scaling up DSSTNE (2m 14s)
- AWS SageMaker and factorization machines (4m 24s)
- SageMaker in action: Factorization machines on one million ratings, in the cloud (7m 39s)