From the course: Building Recommender Systems with Machine Learning and AI

Exercise solution: Outlier removal

- [Instructor] To see how I went about filtering outliers in the MovieLens dataset, open the MovieLens3.py file in the Challenges folder of your course materials. The changes are in the loadMovieLensSmall function. I've re-implemented it so that it uses pandas to load the raw ratings data, and then uses pandas again to filter out the outliers. The dataset for our recommender framework is then built from the filtered pandas data frame instead of directly from the ratings CSV file. We start by loading the ratings data into a pandas data frame called ratings. We print its first few rows and its shape, so we can see what the starting data looks like and how much of it there is. Next, we need to identify any outlier users, which first means counting how many ratings each user has. On line 34, we use the groupby command together with the aggregate command to build a new data frame called ratingsByUser that maps user IDs to their…
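
The load, inspect, count, and filter steps described above can be sketched in pandas roughly as follows. This is a minimal sketch under stated assumptions, not the code in MovieLens3.py. The column names match the MovieLens ratings.csv file, but the tiny in-memory CSV is synthetic sample input. The helper names make_sample_csv and filter_outlier_users, their z_threshold parameter, and the z-score rule used to flag outlier users are all hypothetical, because the transcript cuts off before it says how outliers are defined.

```python
import io

import pandas as pd


def make_sample_csv() -> str:
    """Build a small synthetic stand-in for MovieLens ratings.csv.

    Hypothetical data: users 1-9 rate two movies each, while user 10
    rates twenty movies, so user 10 should stand out as an outlier.
    """
    lines = ["userId,movieId,rating,timestamp"]
    for user_id in range(1, 10):
        for movie_id in (1, 2):
            lines.append(f"{user_id},{movie_id},4.0,0")
    for movie_id in range(1, 21):
        lines.append(f"10,{movie_id},3.0,0")
    return "\n".join(lines) + "\n"


def filter_outlier_users(ratings: pd.DataFrame, z_threshold: float = 2.0) -> pd.DataFrame:
    """Drop every rating from users whose rating count is an outlier.

    Assumption: "outlier" means the user's rating count lies at least
    z_threshold standard deviations from the mean count. This cutoff is
    illustrative, not necessarily the rule used in MovieLens3.py.
    """
    # Count ratings per user, analogous to the ratingsByUser frame.
    ratings_by_user = ratings.groupby("userId", as_index=False).agg(
        rating_count=("rating", "count")
    )
    counts = ratings_by_user["rating_count"]
    # Standardize each user's count (pandas .std() uses ddof=1 by default).
    ratings_by_user["z"] = (counts - counts.mean()) / counts.std()
    # Keep only users inside the cutoff, then keep only their ratings.
    keep_ids = ratings_by_user.loc[ratings_by_user["z"].abs() < z_threshold, "userId"]
    return ratings[ratings["userId"].isin(keep_ids)]


# In practice you would pass the path to the real ratings.csv instead.
ratings = pd.read_csv(io.StringIO(make_sample_csv()))
print(ratings.head())
print(ratings.shape)  # (38, 4)

filtered = filter_outlier_users(ratings)
print(filtered.shape)  # (18, 4): user 10's twenty ratings are gone
```

If your recommender framework is built on the Surprise library, a filtered frame like this could then be passed to Dataset.load_from_df, together with a Reader whose rating_scale matches the data, instead of loading the CSV file directly.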
