Getting familiar with the product recommendation data set you are using.
- [Narrator] The user review data for our project is in a file called movie_ratings_data_set.csv. We also have a list of movie titles in movies.csv. Both of these files are simple comma-separated data that you can open in any spreadsheet program, like Microsoft Excel. But data sets are often too large to open comfortably in a spreadsheet, so we'll use Python to preview the data. Let's look at view_data.py. First, we use pandas' read_csv function to load the data set into a data table. A data table is similar to a spreadsheet.
It has columns and rows just like a spreadsheet, and you can perform many of the same operations on the data as you can with a spreadsheet. Next, we're going to grab the first 100 rows of data, then we'll use pandas' to_html function to convert that data into a webpage. Pandas provides lots of cool helper functions like this to make it easy to view your data. Next, we'll write out the HTML to a file, and then we'll open it in our web browser using Python's built-in webbrowser module. This just makes it easy for us to view the data. Let's run this and look at the first 100 rows.
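The steps just described can be sketched roughly as follows. This is a minimal illustration, not the course's actual view_data.py: the function name preview_csv, its parameters, and the output file name are hypothetical, and only read_csv, head, to_html, and webbrowser.open are real library calls.

```python
from pathlib import Path
import webbrowser

import pandas as pd


def preview_csv(csv_path, html_path, n_rows=100, open_browser=True):
    """Save the first n_rows of a CSV file as an HTML table (hypothetical helper)."""
    # Load the data set into a pandas data table (a DataFrame)
    df = pd.read_csv(csv_path)

    # Grab the first n_rows rows and convert them to an HTML table
    html = df.head(n_rows).to_html()

    # Write the HTML out to a file
    Path(html_path).write_text(html, encoding="utf-8")

    # Open the saved page in the default web browser
    if open_browser:
        webbrowser.open(Path(html_path).resolve().as_uri())

    return df
```

Under these assumptions, calling `preview_csv("movie_ratings_data_set.csv", "data.html")` would load the ratings file and open its first 100 rows in the browser.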
Right-click, choose Run. Each row in this data set is one movie rating entered by a single user. The first column is the ID of the user who made the rating. The second column is the ID of the movie that the user rated. And the third column is the rating that the user gave the movie. Each rating is a number from one to five, one being the worst and five being the best. We also have a second data file that lists the name of each movie, movies.csv. Let's look at the code in view_movie_list.py. This code is almost identical to the code we just saw.
The only difference is we're passing in the index_col parameter to pandas' read_csv function. This tells pandas to use the movie ID field that's already in the data as the index, instead of adding its own index column. Let's run the script and look at the movie list. Again, right-click, choose Run. This is a list of all the movies in the data set. Each movie has both an ID and a title. In addition, we also have genres listed for each movie. We won't actually use the genre information in the recommendations, but it's helpful to see this information to get a better idea of the kinds of movies a recommendation system is recommending.
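To make the index_col behavior concrete, here is a small sketch. The function name load_movies and the column name "movie_id" are assumptions for illustration; the real column name in movies.csv may differ.

```python
import pandas as pd


def load_movies(csv_path, id_column="movie_id"):
    """Load the movie list, indexed by its own ID column (hypothetical helper)."""
    # index_col tells pandas to use the existing ID column as the row index
    # instead of adding a new 0, 1, 2, ... index column of its own.
    # "movie_id" is an assumed column name.
    return pd.read_csv(csv_path, index_col=id_column)
```

With the index set this way, a single movie can be looked up by its ID, for example `load_movies("movies.csv").loc[some_id]`, where `some_id` is an ID present in the file.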
When you start a new recommendation project, it's a great idea to take a look at the data visually, like this, to make sure you understand exactly what data you have to work with.
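Alongside the visual check, a few summary numbers can confirm that the ratings match the description above (one rating per row, values from one to five). The sketch below is only an illustration: the function name summarize_ratings and the column names "user_id", "movie_id", and "value" are assumptions, not taken from the course files.

```python
import pandas as pd


def summarize_ratings(df, user_col="user_id", movie_col="movie_id", rating_col="value"):
    """Return basic counts and the rating range for a ratings table (hypothetical helper)."""
    return {
        # Total number of ratings (one per row)
        "n_ratings": len(df),
        # Number of distinct users and movies that appear in the data
        "n_users": df[user_col].nunique(),
        "n_movies": df[movie_col].nunique(),
        # Lowest and highest rating found
        "min_rating": df[rating_col].min(),
        "max_rating": df[rating_col].max(),
        # True only if every rating falls between 1 and 5 inclusive
        "all_in_range": bool(df[rating_col].between(1, 5).all()),
    }
```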
Recommendation systems are a key part of almost every modern consumer website. The systems help drive customer interaction and sales by helping customers discover products and services they might not ever find themselves. The course uses the free, open-source tools Python 3.5, pandas, and NumPy. By the end of the course, you'll be equipped to use machine learning yourself to solve recommendation problems. What you learn can then be directly applied to your own projects.
- Building a machine learning system
- Training a machine learning system
- Refining the accuracy of the machine learning system
- Evaluating the recommendations received