Learn how to perform analysis for prediction using Python and how to interpret the results.
- [Instructor] Let's jump right in. We're going to set up our decision tree in Python. I've already declared our package statements in the first cell: we'll be bringing in pandas and numpy, two packages that you'll encounter often if you do a lot in Python. We're also bringing in pydotplus, which offers some additional functionality for graphing, and sklearn to help split our data and create our tree. My machine already has some of these pre-installed, and you'll want to install them as well. To do so, visit the link on the screen and follow the instructions found there.
Let's take a moment and I'll show you how to install pydotplus. The other installation is a standard dmg, but if you want to install packages like pydotplus, you can do so with the command pip install and then the name of the package itself, pydotplus. Run that, and it will do the installation for you for that particular package. So, let's run this line, and next let's connect to our data source. We'll make sure that this cell is selected and press shift enter. Let's also print out our column names, just for easy reference; it's always nice to have these available when we might need them. I'm going to type in the variable for our data frame, which is my prediction data, and then the columns command from there, and run that. This simply gives us a readout of the column names right there.
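As a minimal sketch of the column readout described above: the course data file is not shown in this transcript, so the block below builds a small synthetic data frame as a stand-in. The variable name `my_prediction_data` and the column name `sales_class` are assumptions for illustration; the predictor names follow the ones mentioned later in the video.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course data set; the real file is not shown here.
rng = np.random.default_rng(0)
n_rows = 20
my_prediction_data = pd.DataFrame({
    "capita": rng.integers(1_000, 50_000, size=n_rows),
    "competition": rng.integers(0, 10, size=n_rows),
    "weather": rng.integers(0, 3, size=n_rows),
    "var1": rng.random(n_rows),
    "var2": rng.random(n_rows),
    "var3": rng.random(n_rows),
    "sales_class": rng.integers(0, 2, size=n_rows),  # assumed name for the outcome column
})

# .columns holds the column labels; converting to a list gives a compact readout.
column_names = list(my_prediction_data.columns)
print(column_names)
```

In pandas, `columns` is an attribute of the data frame rather than a function, so it is accessed without parentheses.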
Now, our cross validation function, loaded in at the beginning of our notebook file, provides the algorithm that we need to manage our data for this example. If you recall, I mentioned in our overview video that a decision tree takes a set of data and splits that data continuously until it has a predicted outcome. Further, we can split our data to help fit our model, and we'll do so with what is known as testing data and training data, because this way we can run a few tests to assess whether our model holds.
So this is a good chunk of code, and I've already loaded it in there. You can see that we've assigned our predictors to a variable called feature underscore cols; that's where you see capita, competition, weather, and var1, 2 and 3 all being assigned. Then we assign our sales classification column to the y. So let's go ahead and shift enter. If you'd like to use this code for your own data, you'll just want to replace the column names to line up with your own.
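The predictor assignment and the training/testing split described above might look roughly like the following sketch. The data is synthetic, the outcome column name `sales_class` and the 25% test share are assumptions, and `train_test_split` is imported from `sklearn.model_selection` (older scikit-learn releases exposed it under `sklearn.cross_validation`, which was removed in version 0.20).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the course data set is not available here.
rng = np.random.default_rng(0)
n_rows = 200
data = pd.DataFrame({
    "capita": rng.integers(1_000, 50_000, size=n_rows),
    "competition": rng.integers(0, 10, size=n_rows),
    "weather": rng.integers(0, 3, size=n_rows),
    "var1": rng.random(n_rows),
    "var2": rng.random(n_rows),
    "var3": rng.random(n_rows),
})
# Assumed outcome column: 1 when capita is above an arbitrary threshold.
data["sales_class"] = (data["capita"] > 25_000).astype(int)

# Predictors go into feature_cols; the classification column becomes y.
feature_cols = ["capita", "competition", "weather", "var1", "var2", "var3"]
X = data[feature_cols]
y = data["sales_class"]

# Hold out a quarter of the rows for testing (the share is an assumption).
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(train_x.shape, test_x.shape)
```

To adapt this to other data, only `feature_cols` and the outcome column name should need to change.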
So shift enter. Next, we'll assign a list of different values our algorithm can use to model the different branches of our tree, and that's where you'll see the numbers two through eight. This means we can model the output of our tree to show anywhere from two branches to eight branches; in other words, this will show up to eight possible branches to predict our outcomes. I'll run this with a shift enter. Now we're going to want to specify the number of branches for our tree, so we're going to declare this with the name clf tree, and that's going to equal our decision tree classifier, and we're going to assign that a max depth of eight.
Now this is a number we can update if we want to see anything less than eight. So let's run that. Next we're going to fit our training data, the x and the y. We take the value that we just assigned to clf underscore tree, and we fit it to both our train data for the x and our train data for the y, which we declared above, and run that.
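A minimal sketch of declaring and fitting the classifier, again on synthetic data with assumed names. One detail worth noting: in scikit-learn, `max_depth` limits how many levels of splits the tree may have, so a depth of eight can produce far more than eight leaves.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with a noisy outcome so the tree has something to learn.
rng = np.random.default_rng(0)
n_rows = 200
data = pd.DataFrame({
    "capita": rng.integers(1_000, 50_000, size=n_rows),
    "competition": rng.integers(0, 10, size=n_rows),
    "weather": rng.integers(0, 3, size=n_rows),
    "var1": rng.random(n_rows),
    "var2": rng.random(n_rows),
    "var3": rng.random(n_rows),
})
noise = rng.normal(0, 8_000, size=n_rows)
data["sales_class"] = ((data["capita"] + noise) > 25_000).astype(int)  # assumed column

feature_cols = ["capita", "competition", "weather", "var1", "var2", "var3"]
train_x, test_x, train_y, test_y = train_test_split(
    data[feature_cols], data["sales_class"], test_size=0.25, random_state=42
)

# Declare the classifier with a depth cap of eight, then fit it to the training data.
clf_tree = DecisionTreeClassifier(max_depth=8, random_state=0)
clf_tree.fit(train_x, train_y)

# Inspect how large the fitted tree actually is.
print("depth:", clf_tree.get_depth(), "leaves:", clf_tree.get_n_leaves())
```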
That just ran the algorithm for us right there, and we get some output. Next, we're going to apply our test data to our model. We'll do that by naming this tree predict and setting it to our clf tree, which now houses the output of our algorithm, calling the predict function with test underscore x, and running that. Now we can output our tree. This code leverages our graphing package to generate an image, and then shows that image inline right here in our notebook.
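The prediction and graphing steps could be sketched as follows, on synthetic data with assumed names. The block stops at the Graphviz DOT text, because turning it into an image needs pydotplus plus a local Graphviz installation; that rendering step is shown only as a comment. The accuracy check is an addition for illustration and is not mentioned in the video.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data; column names other than the predictors are assumptions.
rng = np.random.default_rng(0)
n_rows = 200
data = pd.DataFrame({
    "capita": rng.integers(1_000, 50_000, size=n_rows),
    "competition": rng.integers(0, 10, size=n_rows),
    "weather": rng.integers(0, 3, size=n_rows),
    "var1": rng.random(n_rows),
    "var2": rng.random(n_rows),
    "var3": rng.random(n_rows),
})
noise = rng.normal(0, 8_000, size=n_rows)
data["sales_class"] = ((data["capita"] + noise) > 25_000).astype(int)

feature_cols = ["capita", "competition", "weather", "var1", "var2", "var3"]
train_x, test_x, train_y, test_y = train_test_split(
    data[feature_cols], data["sales_class"], test_size=0.25, random_state=42
)

clf_tree = DecisionTreeClassifier(max_depth=8, random_state=0)
clf_tree.fit(train_x, train_y)

# Apply the held-out test data to the fitted model.
tree_predict = clf_tree.predict(test_x)
accuracy = accuracy_score(test_y, tree_predict)
print("test accuracy:", round(accuracy, 3))

# Export the tree as Graphviz DOT text (out_file=None returns a string).
dot_data = export_graphviz(
    clf_tree, out_file=None, feature_names=feature_cols, filled=True
)

# With pydotplus and Graphviz installed, the image could then be produced with:
#   graph = pydotplus.graph_from_dot_data(dot_data)
#   png_bytes = graph.create_png()
```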
So we're going to go ahead and run this block of code. If you recall from a second ago, we assigned eight branches, so that generates quite a few options; so many, really, that it may be difficult for us to provide a clear recommendation. So let's have our algorithm narrow this down for us. If we move back to that max depth declaration, I'm going to change it from an eight to a two, re-run that, and then re-run our data visualization.
And now we can see that we have something that provides us with a little bit more clarity. We have a tree here that looks at overall capita and at our sales output: anything less than or equal to this capita number generates the sales classification that we're trying to drive. Then it does another branch here that looks specifically at the next level of capita, and that helps us to really identify what our prediction is.
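To illustrate how a depth-two tree can be read, the sketch below fits one on synthetic data where the outcome depends only on two capita cut-offs, then prints the rules as text with scikit-learn's `export_text`. The cut-offs, the class labels, and the use of `export_text` in place of the image from the video are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: the outcome is driven purely by two arbitrary capita cut-offs.
rng = np.random.default_rng(0)
n_rows = 200
data = pd.DataFrame({
    "capita": rng.integers(1_000, 50_000, size=n_rows),
    "competition": rng.integers(0, 10, size=n_rows),
    "weather": rng.integers(0, 3, size=n_rows),
})
data["sales_class"] = np.select(
    [data["capita"] < 15_000, data["capita"] < 30_000],
    ["low", "mid"],
    default="high",
)

feature_cols = ["capita", "competition", "weather"]

# Capping the depth at two keeps the tree small enough to read at a glance.
clf_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
clf_tree.fit(data[feature_cols], data["sales_class"])

# Each printed line is one rule, e.g. a "capita <= threshold" test or a leaf class.
rules = export_text(clf_tree, feature_names=feature_cols)
print(rules)
```

Each indented level in the printed output corresponds to one branch of the tree, and each leaf line gives the predicted class for rows that satisfy the conditions above it.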
Again, this approach will allow us to run multiple tests over time to gain further confidence that our prediction is accurate.
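One way to run such repeated tests is cross-validation over the depth values two through eight mentioned earlier. This is a hedged sketch rather than the notebook's own code: the data is synthetic, the names `depths` and `scores_by_depth` are hypothetical, and five folds is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with a noisy outcome (assumed column names).
rng = np.random.default_rng(0)
n_rows = 200
data = pd.DataFrame({
    "capita": rng.integers(1_000, 50_000, size=n_rows),
    "competition": rng.integers(0, 10, size=n_rows),
    "weather": rng.integers(0, 3, size=n_rows),
})
noise = rng.normal(0, 8_000, size=n_rows)
data["sales_class"] = ((data["capita"] + noise) > 25_000).astype(int)

feature_cols = ["capita", "competition", "weather"]
X = data[feature_cols]
y = data["sales_class"]

# Candidate depth limits, two through eight.
depths = range(2, 9)

# For each depth, average the accuracy over five train/test folds.
scores_by_depth = {}
for depth in depths:
    clf_tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    fold_scores = cross_val_score(clf_tree, X, y, cv=5)
    scores_by_depth[depth] = fold_scores.mean()

for depth, score in scores_by_depth.items():
    print(f"max_depth={depth}: mean accuracy {score:.3f}")
```

Comparing the averaged scores gives a rough sense of whether a deeper tree actually predicts better or merely fits the training data more closely.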
In this course, discover how to gain valuable insights from large data sets using specific languages and tools. Follow Chris DallaVilla as he walks through how to use R, Python, and Tableau to perform data modeling and assess performance. As Chris dives into these concepts, he shares specific case studies that come directly from his own work with clients. Plus, he shares three essential—and practical—best practices for data-driven marketing that you can use to bolster your organization's marketing performance.
- Installing R, Python, and Tableau
- Navigating the UI for R, Python, and Tableau
- Using R, Python, and Tableau
- Exploratory analysis
- Performing regression analysis
- Performing a cluster analysis
- Performing a conjoint assessment
- Stakeholder alignment